Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation
2022 · Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, et al.
Abstract
Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical information can be obtained from pre-trained acoustic models, where the learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations to improve valence estimation from acoustic speech signals. We also explore fusion of representations to improve emotion estimation across all three emotion dimensions: activation, valence and dominance. Additionally, we investigate whether representations from pre-trained models can be distilled into models trained with low-level features, resulting in models with fewer parameters. We show t
Related papers
- Pre-trained Model Representations And Their Robustness Against Noise For Speech Emotion Analysis (2023) 0.00
- Investigating Salient Representations And Label Variance In Dimensional Speech Emotion Analysis (2023) 3.58
- Speech Emotion Recognition With Distilled Prosodic And Linguistic Affect Representations (2023) 5.24
- Modeling Speech Emotion With Label Variance And Analyzing Performance Across Speakers And Unseen Acoustic Conditions (2025) 0.00
- An Analysis Of Large Speech Models-based Representations For Speech Emotion Recognition (2023) 4.52
- Multimodal Speech Emotion Recognition And Ambiguity Resolution (2019) 0.00
- Multistage Linguistic Conditioning Of Convolutional Layers For Speech Emotion Recognition (2021) 9.23
- Semantic Matters: Multimodal Features For Affective Analysis (2025) 0.00