Data Preprocessing for Fair Machine Learning

May 6, 2023

Paper accepted at AAAI/ACM Conference on AI, Ethics, and Society 2023 

This research paper discusses the implications of different strategies for encoding protected categorical attributes, the most frequent type of sensitive information about people. In particular, data preprocessing strategies that encode intersectional attributes are often the most effective for machine learning, but those intersectional attributes are also particularly prone to unfair discrimination.

Intersectional attributes combine several categories, e.g., gender and ethnicity, or nationality and occupation. While not all women, and not all people with a foreign passport, are necessarily discriminated against, women of color or foreign nationals with blue-collar jobs might be treated unfairly at higher rates.
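To make the idea concrete, here is a minimal sketch of how an intersectional attribute could be built and encoded during preprocessing. The toy records, the attribute names, the helper functions `intersect` and `one_hot`, and the choice of one-hot encoding are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy records; attribute names and values are illustrative only.
records = [
    {"gender": "female", "ethnicity": "A"},
    {"gender": "female", "ethnicity": "A"},
    {"gender": "female", "ethnicity": "B"},
    {"gender": "male", "ethnicity": "A"},
    {"gender": "male", "ethnicity": "A"},
    {"gender": "male", "ethnicity": "A"},
    {"gender": "male", "ethnicity": "B"},
    {"gender": "male", "ethnicity": "B"},
]


def intersect(record, attributes):
    """Join several categorical attributes into one intersectional category."""
    return "|".join(record[name] for name in attributes)


def one_hot(values):
    """One-hot encode categorical values; returns (sorted categories, rows)."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


# Build the intersectional attribute, encode it, and count each group.
combined = [intersect(r, ("gender", "ethnicity")) for r in records]
categories, encoded = one_hot(combined)
group_sizes = Counter(combined)

for category in categories:
    print(category, group_sizes[category])
```

In this toy data the smallest intersectional group contains a single record. Combining attributes splits the data into smaller groups than either attribute does alone, so whatever is learned about a small group rests on very few examples.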

This paper analyzes these situations and suggests remedies at the data preprocessing stage, so that most successful machine learning algorithms can still be applied in their default form.

This research was carried out as part of the NoBias project, which comprises researchers from the University of Southampton, the University of Pisa, and the University of Stuttgart.

[1] Carlos Mougan, Jose Alvarez, Salvatore Ruggieri and Steffen Staab. Fairness implications of encoding protected categorical attributes. In: AAAI/ACM Conference on AI, Ethics, and Society 2023. August 8-10, 2023.  

Find a preprint here:

Please find a blog entry about the paper here: 
