Our research in data wrangling focuses on methods, algorithms, and tools for both individual data wrangling steps (e.g., data selection, data cleaning, data integration, data provisioning) and full data wrangling pipelines that process raw data end to end into data ready for data-driven analysis.
- Data integration and data cleaning
Our data integration research focuses on effective methods and scalable algorithms that facilitate the (semi-)automatic combination of heterogeneous data from various sources. This serves the overarching goal of providing unified access to data in which each entity is represented completely, uniquely, and correctly. In this context, we have particular expertise in the problems known as entity resolution and data fusion. Entity resolution identifies the different representations of the same real-world entity (e.g., records coming from different sources), whereas data fusion combines these representations into a single, consistent representation of that entity.
The problems of entity resolution and data fusion apply equally to data cleaning, where dedicated solutions help identify and eliminate redundant, incomplete, or incorrect data within a single data set.
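To make the distinction between the two problems concrete, the following is a minimal Python sketch, not a description of our actual methods. It assumes a handful of invented toy records, uses name similarity from the standard library's `difflib.SequenceMatcher` as the only matching signal, and fuses each group with a naive "longest non-empty value wins" rule. The record fields, the 0.85 threshold, and the function names `resolve_entities` and `fuse` are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Invented toy records from two hypothetical sources; None marks a missing value.
RECORDS = [
    {"id": "a1", "name": "Jon Smith", "city": "Stuttgart", "phone": None},
    {"id": "b7", "name": "John Smith", "city": None, "phone": "0711-123456"},
    {"id": "a2", "name": "Maria Garcia", "city": "Madrid", "phone": None},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_entities(records, threshold=0.85):
    """Entity resolution: group records whose names are similar enough.

    Compares all pairs (quadratic, fine for a toy example) and merges
    matching pairs into clusters with a small union-find structure.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i]["name"], records[j]["name"]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


def fuse(cluster):
    """Data fusion: merge one cluster into a single record.

    Assumed conflict-resolution rule: per attribute, keep the longest
    non-empty value, or None if every record lacks the attribute.
    """
    fused = {"sources": [record["id"] for record in cluster]}
    for attr in ("name", "city", "phone"):
        values = [record[attr] for record in cluster if record[attr]]
        fused[attr] = max(values, key=len) if values else None
    return fused


clusters = resolve_entities(RECORDS)
entities = [fuse(cluster) for cluster in clusters]

for entity in entities:
    print(entity)
```

In this sketch the two "Smith" records end up in one cluster and are fused into a single record that carries the city from one source and the phone number from the other, while the third record stays on its own. Realistic settings typically need blocking or indexing to avoid comparing all pairs, similarity over several attributes, and more careful conflict-resolution strategies than the one assumed here.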