Darkness can not drive out darkness: investigating bias in hate speech detection models

Fatma Elsafoury

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

It has become crucial to develop tools for automated hate speech and abuse detection. Such tools would help stop bullying and hate and provide a safer environment in which individuals, especially those from marginalized groups, can express themselves freely. However, recent research shows that machine learning models are biased and may make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are subject not only to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of social and intersectional bias on the performance and unfairness of hate speech detection models.
Original language: English
Title of host publication: The 60th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Proceedings of the Student Research Workshop, May 22-27, 2022
Editors: Samuel Louvan, Andrea Madotto, Brielen Madureira
Publisher: The Association for Computational Linguistics
Pages: 31-43
Number of pages: 13
ISBN (Print): 9781955917230
Publication status: Published - 22 May 2022
