Learning Arousal-valence Representation From Categorical Emotion Labels Of Speech
2023 · Enting Zhou, You Zhang, Zhiyao Duan
Abstract
Dimensional representations of speech emotion, such as the arousal-valence (AV) representation, provide a more continuous and fine-grained description of emotion, and finer control over it, than their categorical counterparts. They have wide applications in tasks such as dynamic emotion understanding and expressive text-to-speech synthesis. Existing methods that predict the dimensional emotion representation from speech cast it as a supervised regression task. These methods face data scarcity issues, as dimensional annotations are much harder to acquire than categorical labels. In this work, we propose to learn the AV representation from categorical emotion labels of speech. We start by learning a rich, emotion-relevant, high-dimensional speech feature representation using self-supervised pre-training and emotion classification fine-tuning. This representation is then mapped to the 2D AV space through anchored dimensionality reduction guided by psychological findings. Experiments show that our method achieves a C
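The abstract does not spell out how the anchored dimensionality reduction works. As one hypothetical illustration, not the authors' algorithm, the sketch below computes a centroid per emotion category from fine-tuned speech embeddings. It then maps any embedding to the 2D AV plane as a softmax-weighted average of fixed per-category AV anchor points, with weights derived from squared distances to the centroids. The anchor coordinates, the helper names `class_centroids` and `anchored_projection`, and the `temperature` parameter are all assumptions made for this example. The placeholder coordinates only loosely follow the circumplex idea that, for instance, anger is high-arousal and negative-valence.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical (arousal, valence) anchors in [-1, 1] for four categories.
# These are illustrative placeholders, NOT the values used in the paper.
ANCHORS_AV: Dict[str, Tuple[float, float]] = {
    "happy": (0.5, 0.8),
    "angry": (0.8, -0.6),
    "sad": (-0.5, -0.7),
    "neutral": (0.0, 0.0),
}


def class_centroids(features: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the (high-dimensional) embeddings of each emotion category."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def anchored_projection(
    x: Vector,
    centroids: Dict[str, List[float]],
    anchors: Dict[str, Tuple[float, float]] = ANCHORS_AV,
    temperature: float = 1.0,
) -> Tuple[float, float]:
    """Map one embedding to (arousal, valence) via category anchors (hypothetical)."""
    classes = sorted(centroids)
    missing = [c for c in classes if c not in anchors]
    if missing:
        raise ValueError(f"no AV anchor for categories: {missing}")
    # Similarity score = negative squared Euclidean distance to each centroid.
    scores = [
        -sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])) / temperature
        for c in classes
    ]
    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    arousal = sum(w * anchors[c][0] for w, c in zip(weights, classes)) / total
    valence = sum(w * anchors[c][1] for w, c in zip(weights, classes)) / total
    return arousal, valence


if __name__ == "__main__":
    # Toy 3-D "embeddings"; real fine-tuned features would be much larger.
    feats = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [-2.0, 0.0, 0.0], [0.0, -2.0, 0.0]]
    labs = ["happy", "angry", "sad", "neutral"]
    cents = class_centroids(feats, labs)
    print(anchored_projection([1.0, 1.0, 0.0], cents))  # lands between happy and angry
```

A low `temperature` snaps embeddings near a centroid onto that category's anchor, while a high one pulls every point toward the mean of the anchors. Because the weights vary smoothly, utterances that fall between categories in feature space also land between the corresponding anchors in AV space.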
Related papers
- Investigating Salient Representations And Label Variance In Dimensional Speech Emotion Analysis (2023)
- Learning Representations Of Emotional Speech With Deep Convolutional Generative Adversarial Networks (2017)
- Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation (2022)
- Modeling Speech Emotion With Label Variance And Analyzing Performance Across Speakers And Unseen Acoustic Conditions (2025)
- Advancing Multiple Instance Learning With Attention Modeling For Categorical Speech Emotion Recognition (2020)
- Emotional Dimension Control In Language Model-based Text-to-speech: Spanning A Broad Spectrum Of Human Emotions (2024)
- Pre-trained Model Representations And Their Robustness Against Noise For Speech Emotion Analysis (2023)
- Representation Learning Through Cross-modal Conditional Teacher-student Training For Speech Emotion Recognition (2021)