The Whole Is Bigger Than The Sum Of Its Parts: Modeling Individual Annotators To Capture Emotional Variability
2024 · James Tavernor, Yara El-Tawil, Emily Mower Provost
Abstract
Emotion expression and perception are nuanced, complex, and highly subjective processes. When multiple annotators label emotional data, the resulting labels contain high variability. Most speech emotion recognition tasks address this by averaging annotator labels as ground truth. However, this process omits the nuance of emotion and inter-annotator variability, which are important signals to capture. Previous work has attempted to learn distributions to capture emotion variability, but these methods also lose information about the individual annotators. We address these limitations by learning to predict individual annotators and by introducing a novel method to create distributions from continuous model outputs that permit the learning of emotion distributions during model training. We show that this combined approach can result in emotion distributions that are more accurate than those seen in prior work, in both within- and cross-corpus settings.
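The abstract's point that averaging discards inter-annotator variability can be illustrated with a small sketch. This is not the authors' method. The ratings, the 1-5 valence scale, the utterance names, and the `summarize` helper are all hypothetical, chosen only to show two utterances that share the same averaged label while differing in annotator agreement.

```python
from statistics import mean, pstdev

# Hypothetical valence ratings (1-5 scale) from five annotators per utterance.
# These values are invented for illustration and come from no real corpus.
ratings = {
    "utt_a": [3, 3, 3, 3, 3],  # annotators agree
    "utt_b": [1, 2, 3, 4, 5],  # annotators disagree
}


def summarize(labels):
    """Return (mean, population standard deviation) of one utterance's labels."""
    return mean(labels), pstdev(labels)


summary = {utt: summarize(labels) for utt, labels in ratings.items()}

for utt, (avg, spread) in summary.items():
    # Both utterances share the same averaged label but differ in spread.
    print(f"{utt}: mean={avg:.2f} spread={spread:.2f}")
```

A model trained only on the averaged label would see identical targets for both utterances, which is the information loss that motivates modeling individual annotators or full label distributions.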
Related papers
- Modeling Speech Emotion With Label Variance And Analyzing Performance Across Speakers And Unseen Acoustic Conditions (2025)
- Dynamic Time-alignment Of Dimensional Annotations Of Emotion Using Recurrent Neural Networks (2022)
- Unifying The Discrete And Continuous Emotion Labels For Speech Emotion Recognition (2022)
- Improving Speech Emotion Recognition With Mutual Information Regularized Generative Model (2025)
- Personalized Adaptation With Pre-trained Speech Encoders For Continuous Emotion Recognition (2023)
- Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation (2022)
- Modeling Feature Representations For Affective Speech Using Generative Adversarial Networks (2019)
- Learning Representations Of Emotional Speech With Deep Convolutional Generative Adversarial Networks (2017)