Darkness can not drive out darkness: investigating bias in hate speech detection models

  • Fatma Elsafoury

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

    Abstract

    It has become crucial to develop tools for automated hate speech and abuse detection. These tools would help to stop the bullies and the haters and provide a safer environment in which individuals, especially those from marginalized groups, can freely express themselves. However, recent research shows that machine learning models are biased and might make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are subject not only to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of social and intersectional bias on the performance and unfairness of hate speech detection models.
    Original language: English
    Title of host publication: The 60th Annual Meeting of the Association for Computational Linguistics
    Subtitle of host publication: Proceedings of the Student Research Workshop, May 22-27, 2022
    Editors: Samuel Louvan, Andrea Madotto, Brielen Madureira
    Publisher: The Association for Computational Linguistics
    Pages: 31-43
    Number of pages: 13
    ISBN (Print): 9781955917230
    Publication status: Published - 22 May 2022
