Zurück zur Übersicht

Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques

full text: PDF
author/s: Manh Khoi Duong, Stefan Conrad
type:Inproceedings
booktitle:Big Data Analytics and Knowledge Discovery: 26th International Conference, DaWaK 2024, Naples, Italy, August 26-28, 2024, Proceedings
publisher:Springer Nature
month:August
year:2024
location:Naples, Italy
Abstract

In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective optimization problem that considers fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application.

Heinrich Heine Universität

Datenbanken und Informationssysteme

Lehrstuhlinhaber

Prof. Dr. Stefan Conrad


Universitätsstr. 1
40225 Düsseldorf
Gebäude: 25.12
Etage/Raum: 02.24
Tel.: +49 211 81-14088

Sekretariat

Lisa Lorenz



Universitätsstr. 1
40225 Düsseldorf
Gebäude: 25.12
Etage/Raum: 02.22
Tel.: +49 211 81-11312
Verantwortlich für den Inhalt:  E-Mail senden Datenbanken & Informationssysteme