A comparative study on word embeddings in social NLP tasks

Fatma Elsafoury, Steven R. Wilson, Naeem Ramzan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In recent years, gray social media platforms, those with a loose moderation policy on cyberbullying, have been attracting more users. Recently, data collected from these types of platforms have been used to pre-train word embeddings (social-media-based), yet these word embeddings have not been investigated for social-NLP-related tasks. In this paper, we carried out a comparative study between social-media-based and non-social-media-based word embeddings on two social NLP tasks: detecting cyberbullying and measuring social bias. Our results show that using social-media-based word embeddings as input features, rather than non-social-media-based embeddings, leads to better cyberbullying detection performance. We also show that some word embeddings are more useful than others for categorizing offensive words. However, we do not find strong evidence that certain word embeddings will necessarily work best when identifying certain categories of cyberbullying within our datasets. Finally, we show that even though most of the state-of-the-art bias metrics ranked social-media-based word embeddings as the most socially biased, these results remain inconclusive and further research is required.

Original language: English
Title of host publication: Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media
Editors: Lun-Wei Ku, Cheng-Te Li, Yu-Che Tsai, Wei-Yao Wang
Publisher: The Association for Computational Linguistics
Pages: 55-64
Number of pages: 10
ISBN (Electronic): 9781955917889
DOIs
Publication status: Published - 14 Jul 2022
Event: 10th International Workshop on Natural Language Processing for Social Media, SocialNLP 2022 - Seattle, United States
Duration: 14 Jul 2022 to 15 Jul 2022

Conference

Conference: 10th International Workshop on Natural Language Processing for Social Media, SocialNLP 2022
Country/Territory: United States
City: Seattle
Period: 14/07/22 to 15/07/22
