Complex Data Processing

As the amount and variety of available data constantly increase, processing and managing that data require novel techniques and systems. In recent years, various data management systems have emerged to cope with such data. In the context of these systems, we contribute both to methods for the scalable execution of complex data processing and to methods that facilitate the design and development of complex data processing programs.

Runtime Optimization in DISC Systems

Data-intensive scalable computing (DISC) systems, such as Apache Hadoop or Spark, make it possible to process large amounts of heterogeneous data. In the context of such DISC systems, we research how to reduce the overall runtime of data processing under a variety of system characteristics (e.g., systems with multiple concurrent jobs or systems with provenance capture enabled).

Debugging Declarative Data Processing

When using declarative languages such as SQL to specify data processing, developers often face the problem that they cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg: the result data, once it has been computed. If the result does not match the developers' expectations, they usually go through one or more tedious and mostly manual analyze-fix-test cycles until the expected result is obtained. The goal of our research is to support developers in these cycles by providing a suite of algorithms and tools.
