Complex Data Processing

As the amount and variety of available data constantly increase, their processing and management require novel data processing techniques and systems. In recent years, various data management systems have emerged to cope with such data. In the context of such systems, we contribute both to methods for the scalable execution of complex data processing and to methods that facilitate the design and development of complex data processing programs.

Execution-Plan Optimization for Simulations

The systematic cost estimation of alternative query execution plans has a long tradition in the query optimizers of database management systems. In simulations that involve solving partial differential equations (PDEs), we observe a similar situation: there are alternative schemes and implementations for solving a PDE, each composed of common steps that need to be well chosen and properly parameterized for a particular setting (defined by the available hardware, time constraints, etc.). We explore how concepts of query optimization can be brought to simulations to enable a more systematic selection of an adequate execution plan and good parameters, compared to the current approach, which typically relies on expert knowledge and experience.
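To make the analogy with query optimization concrete, the following minimal sketch enumerates candidate "execution plans" for a simulation, discards infeasible ones, and picks the cheapest according to a cost estimate. It is only an illustration under stated assumptions, not our actual method: the setting (a 1D heat equation), the names `Plan`, `estimated_cost`, `is_feasible`, and `choose_plan`, the candidate parameter values, and the cost factors are all hypothetical. The only borrowed fact is the well-known stability condition of the explicit forward-time, centered-space scheme, dt <= dx^2 / (2 * diffusivity).

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Plan:
    """One candidate execution plan (all fields are illustrative)."""
    scheme: str        # "explicit" or "implicit" time stepping
    grid_points: int   # spatial resolution
    steps: int         # number of time steps


# Hypothetical cost per grid point and time step, in arbitrary units.
# In a real system these would be calibrated for the target hardware.
COST_PER_POINT_STEP = {"explicit": 1.0, "implicit": 6.0}


def estimated_cost(plan: Plan) -> float:
    """Toy cost model: work grows with grid size times number of steps."""
    return COST_PER_POINT_STEP[plan.scheme] * plan.grid_points * plan.steps


def is_feasible(plan: Plan, diffusivity: float = 1.0,
                length: float = 1.0, end_time: float = 1.0) -> bool:
    """Reject explicit plans that violate dt <= dx^2 / (2 * diffusivity)."""
    if plan.scheme == "explicit":
        dx = length / (plan.grid_points - 1)
        dt = end_time / plan.steps
        return dt <= dx * dx / (2.0 * diffusivity)
    return True  # the implicit scheme is treated as unconditionally stable


def enumerate_plans() -> List[Plan]:
    """Build the (small, hypothetical) search space of alternative plans."""
    schemes = ["explicit", "implicit"]
    grids = [11, 21, 51]
    step_counts = [100, 1000, 10000]
    return [Plan(s, g, n) for s, g, n in product(schemes, grids, step_counts)]


def choose_plan(candidates: Iterable[Plan], budget: float) -> Optional[Plan]:
    """Return the cheapest feasible plan within the budget, or None."""
    feasible = [p for p in candidates
                if is_feasible(p) and estimated_cost(p) <= budget]
    return min(feasible, key=estimated_cost, default=None)


if __name__ == "__main__":
    print(choose_plan(enumerate_plans(), budget=1e6))
```

This toy model deliberately ignores accuracy, so it favors coarse grids and few time steps. A realistic optimizer would add an error estimate as a further constraint or as part of the cost function.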

Runtime Optimization in DISC Systems

Data-intensive scalable computing (DISC) systems, such as Apache Hadoop or Spark, make it possible to process large amounts of heterogeneous data. In the context of such DISC systems, we research how to reduce the overall runtime of data processing under a variety of system characteristics (e.g., systems with multiple concurrent jobs or systems with provenance capture enabled).

Debugging Declarative Data Processing

When developers use declarative languages such as SQL to specify data processing, they often face the problem that they cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg: the result data, once it has been computed. If the result does not meet the developers' expectations, they usually perform one or more tedious and mostly manual analyze-fix-test cycles until the expected result occurs. The goal of our research is to support developers in this process by providing a suite of algorithms and tools that accompany it.
