- Runtime Optimization in DISC Systems
Data-intensive scalable computing (DISC) systems, such as Apache Hadoop or Spark, make it possible to process large amounts of heterogeneous data. In the context of such DISC systems, we research how to reduce the overall runtime of data processing under a variety of system characteristics (e.g., systems with multiple concurrent jobs or systems with provenance capture enabled).
- Debugging Declarative Data Processing
When developers use declarative languages such as SQL to specify data processing, they often cannot properly inspect or debug their query or transformation code. All they see is the tip of the iceberg: the result data, once it has been computed. If the result does not match their expectations, they usually go through one or more tedious and mostly manual analyze-fix-test cycles until the expected result appears. The goal of our research is to support developers in this process with a suite of algorithms and tools.