SeaMap

Research project as part of Software Campus

Semantically Assisted Management of Analytical Data Platforms

Project Description

With growing possibilities for data collection and current advances in artificial intelligence and data analysis, data is becoming an increasingly important asset for companies in almost all sectors of the economy. Using data-driven analysis techniques, such as data mining and machine learning, companies can derive insights and knowledge from their data, which can then be used to optimize business processes and products. For this purpose, companies must collect, process, and manage large amounts of data in a structured manner, and various types of data platforms have been established to do so. Among the most recent are so-called data lakehouses, which promise to combine the advantages of data warehouses and data lakes and have already found widespread use in industrial practice. However, the developments that have accompanied the advent of data lakehouses are largely limited to technical aspects, such as ensuring ACID properties on highly scalable storage systems, while comprehensive concepts and methods for supporting the operation and management of these data platforms are still lacking. Given the increasing scope and complexity of these platforms, particularly in terms of the number and heterogeneity of the datasets to be managed, the data processing pipelines, the technologies used, and the user groups involved with their different tasks and knowledge, this gap poses major challenges for companies.

The goal of this project is to develop and implement concepts that support the administration and operation of modern data platforms. To this end, a semantic approach is pursued in which information about the data platform, such as its data and technology architecture, the domain in which it is operated, and the datasets managed on it and their use, is collected in a common, holistic knowledge graph. Such a knowledge graph can then serve as a central access point for supporting and handling various activities on the data platform. This project investigates which actors and activities play a role in the management and operation of data platforms, how the knowledge graph can support these different user groups, and how the graph must be structured for this purpose. Since activities on data platforms can vary greatly depending on their domain and application context, and therefore must operate with different types of information, the developed concepts should emphasize extensibility and modularity.
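As an illustration only, the idea of a modular knowledge graph that acts as a central access point could be sketched as follows. This is a minimal sketch and not the project's actual design: the `KnowledgeGraph` class, the module functions, and all entity and relation names (`sales_orders`, `storedIn`, `reads`, and so on) are hypothetical, and a plain in-memory set of triples stands in for a real graph store.

```python
class KnowledgeGraph:
    """Tiny in-memory triple store (hypothetical stand-in for a real graph database)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one (subject, predicate, object) statement about the platform."""
        self._triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Each module contributes one kind of information, so new kinds of
# information can be added without touching existing modules.
def technology_module(kg):
    # Hypothetical facts about the technology architecture.
    kg.add("sales_orders", "storedIn", "object_store")
    kg.add("sales_orders", "hasFormat", "parquet")


def usage_module(kg):
    # Hypothetical facts about how datasets are used and by whom.
    kg.add("churn_pipeline", "reads", "sales_orders")
    kg.add("data_scientist", "runs", "churn_pipeline")


def build_graph(modules):
    """Assemble one common graph from independent modules."""
    kg = KnowledgeGraph()
    for module in modules:
        module(kg)
    return kg


kg = build_graph([technology_module, usage_module])

# Example activity: find everything that reads a given dataset.
consumers = [s for s, _, _ in kg.query(predicate="reads", obj="sales_orders")]
print(consumers)
```

In this sketch, extensibility comes from passing a different list of module functions to `build_graph`; a domain-specific deployment would add its own modules rather than change the core class.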

Jan Schneider

M.Sc.

Researcher

Holger Schwarz

Prof. Dr. rer. nat.

Apl. Professor

Bernhard Mitschang

Prof. Dr.-Ing. habil.

Head of Institute
