Data Engineering

Institute for Parallel and Distributed Systems (IPVS)

Prof. Dr. rer. nat. Melanie Herschel

Members of the Data Engineering group at IPVS, February 2020.

Latest news

  • 05/2024: Paper accepted at ICML 2024
  • 05/2024: Paper accepted at Datenbank-Spektrum
  • 02/2024: Demonstration accepted at ICDE 2024
  • 03/2024: Larissa C. Shimomura joins our team
  • 02/2024: Paper accepted at PAKDD 2024

The field of Data Engineering encompasses technologies for processing and transforming any kind of data into a format useful for further analysis. These data may be, for instance, structured data from enterprise databases, semi-structured or unstructured Web data, or streaming data in the context of the Internet of Things (IoT). The Data Engineering group in Stuttgart works on various steps of data engineering with the overarching goal of automatically, transparently, and responsibly refining data from its raw state into a state ready for use in data analytics and data exploration applications.

Currently, we are particularly interested in algorithms and tools for data annotation, data cleaning, and data integration, as well as in the foundations and practical implementations of provenance management to trace complex data engineering processes. Another research focus of the group is on languages, algorithms, and tools that support users in complex data processing through data exploration or process analysis solutions. Finally, we study data management techniques to empower fair, accountable, and transparent data analysis.

Below is a list of selected recent publications involving at least one author of the IPVS DE group. A full list is available on our publications page. 

2024

Exploiting Negative Samples: A Catalyst for Cohort Discovery in Healthcare Analytics
Kaiping Zheng, Horng-Ruey Chua, Melanie Herschel, H. V. Jagadish, Beng Chin Ooi, James Wei Luen Yip
International Conference on Machine Learning (ICML), Vienna, Austria, 2024 (accepted)

FairCR - an evaluation and recommendation system for fair classification algorithms
Nico Lässig, Melanie Herschel
IEEE International Conference on Data Engineering (ICDE), Utrecht, Netherlands, 2024 (accepted)

Knowledge-Infused Optimization for Parameter Selection in Numerical Simulations
Julia Meißner, Dominik Göddeke, Melanie Herschel
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, 2024

FALCC: Efficiently performing locally fair and accurate classifications
Nico Lässig, Melanie Herschel
International Conference on Extending Database Technology (EDBT), Paestum, Italy, 2024

2023

Progressive Entity Resolution over Incremental Data
Leonardo Gazzarri, Melanie Herschel
International Conference on Extending Database Technology (EDBT), Ioannina, Greece, 2023

Towards an AutoML System for Fair Classifications
Nico Lässig
IEEE International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 2023

2022

DyHealth: Making Neural Networks Dynamic for Effective Healthcare Analytics
Kaiping Zheng, Shaofeng Cai, Horng Ruey Chua, Melanie Herschel, Meihui Zhang, Beng Chin Ooi
Proceedings of the VLDB Endowment (PVLDB), 15(12), 2022

Metrics and Algorithms for Locally Fair and Accurate Classifications using Ensembles
Nico Lässig, Sarah Oppold, Melanie Herschel
Datenbank-Spektrum 22(1), 2022

2021

To Not Miss the Forest for the Trees - A Holistic Approach for Explaining Missing Answers over Nested Data
Ralf Diestelkämper, Seokki Lee, Melanie Herschel, Boris Glavic
ACM International Conference on the Management of Data (SIGMOD), Xi'an, Shaanxi, China, 2021

PACE: Learning Effective Task Decomposition for Human-in-the-loop Healthcare Delivery
Kaiping Zheng, Gang Chen, Melanie Herschel, Kee Yuan Ngiam, Beng Chin Ooi, Jinyang Gao
ACM International Conference on the Management of Data (SIGMOD), Xi'an, Shaanxi, China, 2021

End-to-end Task Based Parallelization for Entity Resolution on Dynamic Data
Leonardo Gazzarri, Melanie Herschel
IEEE International Conference on Data Engineering (ICDE), Chania, Crete, Greece, 2021

Using FALCES against bias in automated decisions by integrating fairness in dynamic model ensembles
Nico Lässig, Sarah Oppold, Melanie Herschel
Database Systems for Business, Technology, and Web (BTW), 2021

Collaborative filtering over evolution provenance data for interactive visual data exploration
Houssem Ben Lahmar, Melanie Herschel
Information Systems, 95, 101620, 2021

2020

Distributed Tree-Pattern Matching in Big Data Analytics Systems
Ralf Diestelkämper, Melanie Herschel
European Conference on Advances in Databases and Information Systems (ADBIS), Lyon, France, 2020

Towards task-based parallelization for entity resolution
Leonardo Gazzarri, Melanie Herschel 
SICS Software-Intensive Cyber-Physical Systems, 35(1), 2020

Accountable Data Analytics Start with Accountable Data: The LiQuID Metadata Model
Sarah Oppold, Melanie Herschel
ER Forum, Demo and Posters 2020 Co-Located with International Conference on Conceptual Modeling (ER), 2020

A System Framework for Personalized and Transparent Data-Driven Decisions
Sarah Oppold, Melanie Herschel
International Conference on Advanced Information Systems Engineering (CAISE), Grenoble, France, 2020

Tracing nested data with structural provenance for big data analytics
Ralf Diestelkämper, Melanie Herschel
International Conference on Extending Database Technology (EDBT), Copenhagen, Denmark, 2020

Boosting Blocking Performance in Entity Resolution Pipelines: Comparison Cleaning using Bloom Filters
Leonardo Gazzarri, Melanie Herschel
International Conference on Extending Database Technology (EDBT), Copenhagen, Denmark, 2020

2019

LuPe: A System for Personalized and Transparent Data-driven Decisions
Sarah Oppold, Melanie Herschel
International Conference on Information and Knowledge Management (CIKM), Beijing, China, 2019

Towards Integrating Collaborative Filtering in Visual Data Exploration Systems
Houssem Ben Lahmar, Melanie Herschel
European Conference on Advances in Databases and Information Systems (ADBIS), Bled, Slovenia, 2019

Capturing and querying structural provenance in Spark with Pebble
Ralf Diestelkämper, Melanie Herschel
ACM International Conference on the Management of Data (SIGMOD), Amsterdam, The Netherlands, 2019

Volume-based large dynamic graph analysis supported by evolution provenance
Valentin Bruder, Houssem Ben Lahmar, Marcel Hlawatsch, Steffen Frey, Michael Burch, Daniel Weiskopf, Melanie Herschel, Thomas Ertl
Multimedia Tools and Applications, Vol. 78, No. 23, 2019

Query-based Why-not Explanations for Nested Data
Ralf Diestelkämper, Boris Glavic, Melanie Herschel, Seokki Lee
Workshop on Theory and Practice of Provenance (TaPP), Philadelphia, PA, USA, 2019

Structural summaries for visual provenance analysis
Houssem Ben Lahmar, Melanie Herschel
Workshop on Theory and Practice of Provenance (TaPP), Philadelphia, PA, USA, 2019

Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, March 26-29, 2019, Proceedings
Melanie Herschel, Helena Galhardas, Berthold Reinwald, Irini Fundulaki, Carsten Binnig, Zoi Kaoudi
OpenProceedings.org 2019, ISBN 978-3-89318-081-3

Prediction of air pollution with machine learning
Christian Schmitz, Dhiren Devinder Serai, Tatiane Escobar Gava
Datenbanksysteme für Business, Technologie und Web (BTW 2019), 18. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 4.-8. März 2019, Rostock, Germany, Workshops

Our group contributes to the curricula of the bachelor's and master's study programs of the Department of Computer Science with lectures, seminars, projects, and thesis topics in the broad area of data management, data engineering, and data science.

Contact

Melanie Herschel

Prof. Dr. rer. nat.

Head of Institute

Eva Strähle

M.A.

Secretary
