Abstract
The automatic recognition of emotion from speech recordings relies on supervised machine learning techniques, which require labelled training data in order to operate effectively. The performance of these supervised learning techniques depends on the quality of the training data and therefore on the quality of the labels or ratings. In this domain the ratings are typically estimated from the subjective opinion of a small number of experts.

Recently, with the availability of crowdsourcing services, it has become inexpensive to acquire labels from multiple non-expert annotators, which has led to the use of crowdsourcing for labelling training data in a variety of domains. It can be argued that emotional expertise does not necessarily correlate with emotional experience, suggesting that a wider pool of non-expert raters can provide equally valid ratings in the domain of emotion recognition from speech.

There are a number of challenges in using crowdsourcing to label speech assets, including how to select which assets are presented for rating, how to estimate the reliability or bias of the annotators, how to derive the ground truth for each asset, and how to maintain the balance between data coverage and data quality. Our work considers these issues for crowdsourcing ratings for a high-quality emotional speech corpus generated using Mood Induction Procedures. We are developing an online rating tool that will use active learning techniques to select assets and present them for rating to the raters identified as the best performing up to that point in the process.
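To illustrate how reliability estimation, ground-truth derivation and asset selection might fit together, the Python sketch below implements a simple reliability-weighted scheme for continuous ratings. It is not the procedure described in the paper: the (asset_id, rater_id, value) data layout, the inverse-mean-squared-error reliability score and the variance-based selection heuristic are all assumptions chosen for illustration.

```python
from collections import defaultdict

import numpy as np


def estimate_ground_truth(ratings, n_iter=10):
    """Iteratively estimate a ground-truth rating per asset and a reliability
    score per rater from continuous ratings.

    ratings: iterable of (asset_id, rater_id, value) tuples.
    Returns (truth, reliability) dictionaries.
    """
    by_asset = defaultdict(list)
    by_rater = defaultdict(list)
    for asset, rater, value in ratings:
        by_asset[asset].append((rater, value))
        by_rater[rater].append((asset, value))

    reliability = {r: 1.0 for r in by_rater}  # start by trusting every rater equally
    truth = {}
    for _ in range(n_iter):
        # 1. Asset estimate = reliability-weighted mean of its ratings.
        for asset, entries in by_asset.items():
            weights = np.array([reliability[r] for r, _ in entries])
            values = np.array([v for _, v in entries])
            truth[asset] = float(np.average(values, weights=weights))
        # 2. Rater reliability = inverse mean squared deviation from the
        #    current estimates (small constant avoids division by zero).
        for rater, entries in by_rater.items():
            errors = np.array([truth[a] - v for a, v in entries])
            reliability[rater] = 1.0 / (float(np.mean(errors ** 2)) + 1e-6)
    return truth, reliability


def select_next_assets(ratings, k=10):
    """Pick the k assets whose current ratings disagree most (highest variance),
    a simple stand-in for the active-learning selection step."""
    by_asset = defaultdict(list)
    for asset, _, value in ratings:
        by_asset[asset].append(value)
    disagreement = {a: float(np.var(v)) for a, v in by_asset.items()}
    return sorted(disagreement, key=disagreement.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy example: three raters rating two speech assets on a 1-5 activation scale.
    toy = [("clip1", "r1", 4), ("clip1", "r2", 4), ("clip1", "r3", 1),
           ("clip2", "r1", 2), ("clip2", "r2", 3), ("clip2", "r3", 2)]
    truth, reliability = estimate_ground_truth(toy)
    print(truth, reliability)
    print(select_next_assets(toy, k=1))
```

In this sketch a rater whose ratings sit far from the current consensus receives a low reliability weight, so their future ratings contribute less to the derived ground truth, while assets with high disagreement are prioritised for further rating.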
Original language | English
---|---
DOIs |
Publication status | Published - 2011
Externally published | Yes
Event | International Classification Conference 2011 - University of St Andrews, St Andrews, United Kingdom. Duration: 11 Jul 2011 → 15 Jul 2011
Conference
Conference | International Classification Conference 2011
---|---
Country/Territory | United Kingdom
City | St Andrews
Period | 11/07/11 → 15/07/11