Using crowdsourcing in the rating of emotional speech assets

Sarah-Jane Delany, Alexey Tarasov, John Snel, Charlie Cullen

Research output: Contribution to conference › Paper


The automatic recognition of emotion from speech recordings uses supervised machine learning techniques, which require labelled
training data in order to operate effectively. The performance of these techniques depends on the quality of the training data
and therefore on the quality of the labels or ratings. In this domain the ratings are typically estimated from the subjective
opinion of a small number of experts.

Recently, the availability of crowdsourcing services has made it inexpensive to acquire labels from multiple non-expert
annotators, which has led to the use of crowdsourcing for labelling training data in a variety of domains. It can be argued that
emotional expertise does not necessarily correlate with emotional experience, suggesting that a wider pool of non-expert raters
can provide equally valid ratings in the domain of emotion recognition from speech.

There are a number of challenges with using crowdsourcing to label speech assets, including how to select which assets are
presented for rating, how to estimate the reliability or bias of the annotators, how to derive the ground truth for each asset, and
how to maintain the balance between data coverage and data quality. Our work in this area considers these issues in crowdsourcing
ratings for a high-quality emotional speech corpus that has been generated using Mood Induction Procedures. We are developing
an online rating tool that will use active learning techniques to select assets for rating and present them to the raters
identified as the best performing up to that point in the process.
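
The challenges listed above (asset selection, rater reliability, ground truth) can be illustrated with a minimal Python sketch. It is not the authors' tool or method, which the abstract does not specify. It assumes categorical emotion labels, majority vote as a stand-in for ground truth, agreement with the majority as a stand-in for rater reliability, and rater disagreement as a simple proxy for the uncertainty that an active learning strategy would use. All function names, clip names, and rater IDs are hypothetical.

```python
from collections import Counter

def consensus(labels):
    """Return the majority label and the fraction of raters who chose it."""
    counts = Counter(labels.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

def rater_reliability(ratings):
    """Fraction of each rater's labels that match the majority label."""
    agree, total = Counter(), Counter()
    for labels in ratings.values():
        majority, _ = consensus(labels)
        for rater, label in labels.items():
            total[rater] += 1
            agree[rater] += int(label == majority)
    return {rater: agree[rater] / total[rater] for rater in total}

def next_asset(ratings):
    """Pick the asset whose raters agree least (a proxy for uncertainty)."""
    return min(ratings, key=lambda asset: consensus(ratings[asset])[1])

def best_raters(ratings, k):
    """Return the k raters with the highest agreement score (ties by name)."""
    rel = rater_reliability(ratings)
    return sorted(rel, key=lambda rater: (-rel[rater], rater))[:k]

# Hypothetical ratings: asset -> {rater: label}
ratings = {
    "clip_a": {"r1": "happy", "r2": "happy", "r3": "happy"},
    "clip_b": {"r1": "sad", "r2": "sad", "r3": "angry"},
    "clip_c": {"r1": "angry", "r2": "angry", "r3": "sad", "r4": "happy"},
}

print(next_asset(ratings))      # asset to send out for more ratings
print(best_raters(ratings, 2))  # raters to send it to
```

With this toy data, `clip_c` has the lowest agreement and would be presented next, to `r1` and `r2`. A real system would more likely weight votes by estimated reliability and use a trained classifier's uncertainty for selection; majority vote is used here only to keep the sketch short.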

Original language: English
Publication status: Published - 2011
Externally published: Yes
Event: International Classification Conference 2011 - University of St Andrews, St Andrews, United Kingdom
Duration: 11 Jul 2011 to 15 Jul 2011


Conference: International Classification Conference 2011
Country/Territory: United Kingdom
City: St Andrews

