COMBINE

Research project as part of Software Campus

Investigation of the influence of complex data characteristics on the creation and optimization of classifier ensembles.

Description

For companies today, data analysis is the basis for making reliable decisions in their business processes. Classification models, a type of machine learning (ML) model, are often used for this purpose. These models predict future events based on existing data.

In practice, many use cases have complex data characteristics that make training machine learning models difficult. One example is small datasets, on which classification models often achieve low prediction accuracy, meaning that the models make a relatively high number of incorrect predictions. The problem is exacerbated by further data characteristics, such as outliers or noise in the data. One way to address this challenge is to combine several different classification models. Such a combination of classification models is generally referred to as a classifier ensemble. The individual models are combined by a decision fusion method, which fuses their individual predictions into a joint prediction. However, an approach that optimizes these ensembles specifically on the basis of the characteristics of the available data is still missing.
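To make the idea of decision fusion concrete, the sketch below uses plain majority voting, one common and simple fusion method. The project description does not say which fusion methods COMBINE uses, so majority voting is an illustrative assumption, and the function name `majority_vote` and the example labels are hypothetical. Each base model contributes one predicted label per sample, and the label with the most votes becomes the joint prediction.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> list[Hashable]:
    """Fuse per-model predictions into one joint prediction per sample.

    predictions[m][i] is the label that model m predicts for sample i.
    Ties go to the label encountered first among the models' votes,
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        fused.append(votes.most_common(1)[0][0])
    return fused


def accuracy(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical example: three base models, each wrong on a different sample.
truth = ["spam", "ham", "ham", "spam"]
model_a = ["spam", "ham", "spam", "spam"]  # error on sample 2
model_b = ["spam", "spam", "ham", "spam"]  # error on sample 1
model_c = ["ham", "ham", "ham", "spam"]    # error on sample 0

fused = majority_vote([model_a, model_b, model_c])
for name, preds in [("a", model_a), ("b", model_b), ("c", model_c)]:
    print(f"model {name}: accuracy {accuracy(preds, truth):.2f}")
print(f"ensemble: {fused}, accuracy {accuracy(fused, truth):.2f}")
```

In this constructed example each model is correct on three of four samples, while the fused prediction is correct on all four, because every individual error is outvoted by the other two models. Whether fusion helps this much on real data depends on how independent the models' errors are, which is one reason the choice of ensemble components matters.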

The aim of the microproject is to develop such an approach in order to increase the predictive accuracy of classifier ensembles. To this end, the project will investigate which data characteristics strongly influence the creation of ensembles and how these characteristics affect the selection of the individual components of an ensemble (preprocessing, classification models, fusion). Based on these findings, the approach for creating optimized ensembles will then be developed.

The microproject COMBINE has been carried out within the Software Campus since 1 March 2024.

Industry Partner: Software-AG
