Modeling Speech Emotion With Label Variance And Analyzing Performance Across Speakers And Unseen Acoustic Conditions
2025 · Vikramjit Mitra, Amrit Romana, Dung T. Tran, et al.
Abstract
Spontaneous speech emotion data usually contain perceptual grades, where graders assign emotion scores after listening to the speech files. Such perceptual grades introduce uncertainty in labels due to variation in grader opinion. Grader variation is addressed by using consensus grades as ground truth, where the emotion with the highest vote is selected. Consensus grades fail to consider ambiguous instances where a speech sample may contain multiple emotions, as captured through grader opinion uncertainty. We demonstrate that using the probability density function of the emotion grades as targets, instead of the commonly used consensus grades, provides better performance on benchmark evaluation sets compared to results reported in the literature. We show that a saliency-driven foundation model (FM) representation selection helps to train a state-of-the-art speech emotion model for both dimensional and categorical emotion recognition. Comparing representations obtained from different FMs, we ob
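As a rough illustration of the contrast the abstract draws between consensus grades and distribution-style targets, the sketch below turns one utterance's grader votes into both kinds of label. It is only a minimal sketch under stated assumptions: the label set `EMOTIONS`, the function names, and the use of a normalized vote histogram as the distribution are all hypothetical choices made here, not details taken from the paper.

```python
from collections import Counter

# Hypothetical categorical label set; the paper's actual classes are not given here.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def consensus_label(votes):
    """Return the emotion with the most votes (ties broken by EMOTIONS order)."""
    if not votes:
        raise ValueError("at least one grader vote is required")
    counts = Counter(votes)
    return max(EMOTIONS, key=lambda emotion: counts[emotion])


def distribution_target(votes):
    """Return the normalized vote histogram over EMOTIONS as a soft target.

    Assumption: a simple relative-frequency estimate stands in for the
    distribution over emotion grades described in the abstract.
    """
    if not votes:
        raise ValueError("at least one grader vote is required")
    counts = Counter(votes)
    return [counts[emotion] / len(votes) for emotion in EMOTIONS]


if __name__ == "__main__":
    votes = ["happy", "happy", "neutral", "sad", "happy"]
    print(consensus_label(votes))      # single hard label
    print(distribution_target(votes))  # soft target that keeps the minority votes
```

The consensus label discards the two minority votes, while the soft target retains them, which is the kind of ambiguity the abstract says consensus grades fail to capture.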
Related papers
- Investigating Salient Representations And Label Variance In Dimensional Speech Emotion Analysis (2023)
- Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation (2022)
- The Whole Is Bigger Than The Sum Of Its Parts: Modeling Individual Annotators To Capture Emotional Variability (2024)
- Unifying The Discrete And Continuous Emotion Labels For Speech Emotion Recognition (2022)
- Multimodal Speech Emotion Recognition And Ambiguity Resolution (2019)
- Fine-grained Emotion Strength Transfer, Control And Prediction For Emotional Speech Synthesis (2020)
- Pre-trained Model Representations And Their Robustness Against Noise For Speech Emotion Analysis (2023)
- Learning Arousal-valence Representation From Categorical Emotion Labels Of Speech (2023)