
Practicals - Data Engineering (SS19)

Data Engineering
Lecturers: Prof. Dr. rer. nat. Melanie Herschel, M.Sc. Sarah Oppold
Scope: 2V + 2Ü (2 SWS lectures + 2 SWS practicals)
Language: English
Degree programs: Computer Science, Informatik, Softwaretechnik, and others
Target group: Master, Diplom
Dates: Detailed schedule announced during the first lecture on April 10, 2019, 9:45-11:15 a.m., room 0.463

In the era of big data, it is more important than ever to manage the available data so that it can be used meaningfully. One major use case, for instance, is data analysis with the overall goal of gaining new insights. Indeed, business analytics, data mining, and machine learning all rely on a large data basis. However, the results of an analysis typically depend heavily on the data they are based on, so it is crucial that these data are correct, meaningful, accessible, up to date, etc. The goal of data engineering is to ensure these qualities of the data. To this end, data engineering focuses on technologies, software, algorithms, and tools that help turn raw data into information useful for further analysis.

This lecture covers both foundations and algorithms on selected topics of data engineering. These include:

  • Data collection: how do we find relevant data sources?
  • Big Data integration: Given the unique properties of big data, how can data from multiple data sources be combined to get a more global perspective on a subject to be analyzed?.
  • Data cleaning:How can important properties and errors of data be assessed and corrected?
  • Data distribution:What modern technologies support the wide dissemination of data?
  • Provenance: How can the whole data engineering process be documented, controlled, and improved leveraging so-called provenance meta-data?

Course material

During the semester, lecture slides and supplemental material will be made available for download in ILIAS.
Please register for the practicals via Campus.


The practicals include both exam-like exercises on paper as well as programming exercises on selected data engineering tasks. Details will be presented during the first practical.

Course Details and Schedule

  • Language: English.
  • Format: 2 SWS lectures and 2 SWS practicals. The time slots and locations are as announced on Campus. However, the distribution of lectures and practicals over these slots will vary. The exact schedule will be announced during the first lecture.

The course requires prior knowledge equivalent to an introduction to databases (SQL, the relational model, functional dependencies; e.g., from the course Modellierung); knowledge from the course Advanced Information Management is a plus.
There is no single book covering all aspects of data engineering. However, the lecture is largely based on selected chapters of the following books.

Xin Luna Dong and Divesh Srivastava. Big Data Integration.
Synthesis Lectures on Data Management, Morgan & Claypool, 2015.

Wenfei Fan and Floris Geerts. Foundations of Data Quality Management.
Synthesis Lectures on Data Management, Morgan & Claypool, 2012.

AnHai Doan, Alon Halevy, and Zachary Ives. Principles of Data Integration.
Morgan Kaufmann, 2012.

James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in Databases: Why, How, and Where.
Foundations and Trends in Databases, Vol. 1, No. 4, 2007.