MSAC: Multiple Speech Attribute Control Method For Reliable Speech Emotion Recognition
2023 · Yu Pan, Yuguang Yang, Yuheng Huang, et al.
Abstract
Despite notable progress, speech emotion recognition (SER) remains challenging due to the intricate and ambiguous nature of speech emotion, particularly in the wild. While current studies primarily focus on recognition and generalization abilities, our research pioneers an investigation into the reliability of SER methods in the presence of semantic data shifts and explores how to exert fine-grained control over the various attributes inherent in speech signals to enhance speech emotion modeling. In this paper, we first introduce MSAC-SERNet, a novel unified SER framework capable of handling both single-corpus and cross-corpus SER. Specifically, concentrating exclusively on the speech emotion attribute, we present a novel CNN-based SER model that extracts discriminative emotional representations, guided by an additive margin softmax loss. Considering the information overlap between various speech attributes, we propose a novel learning paradigm based on correlations of different s
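The abstract names additive margin softmax loss as the training objective for the CNN-based model but gives no details here. The following is a minimal sketch of the generic additive margin softmax formulation for a single example, not the authors' implementation. The function names, the pure-Python list representation, and the default `scale=30.0` and `margin=0.35` (values commonly used in the original additive margin softmax work) are assumptions; the settings used in MSAC-SERNet are not stated in this excerpt.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def am_softmax_loss(embedding, class_weights, target, scale=30.0, margin=0.35):
    """Additive margin softmax loss for one example (illustrative sketch).

    embedding:     feature vector produced by the encoder
    class_weights: one weight vector per emotion class
    target:        index of the true class
    scale, margin: hypothetical defaults, not taken from the paper
    """
    unit_embedding = l2_normalize(embedding)
    # Cosine similarity between the embedding and each class weight vector.
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, l2_normalize(w)))
        for w in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin if j == target else c)
        for j, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp, then cross-entropy on the target logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to a softmax cross-entropy over scaled cosine similarities. A positive margin raises the loss for the same input, which pushes the model to separate the target class by at least that margin in cosine space.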
Related papers
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025) 0.00
- Mouth Articulation-based Anchoring For Improved Cross-corpus Speech Emotion Recognition (2024) 2.26
- CTL-MTNet: A Novel CapsNet And Transfer Learning-based Mixed Task Net For The Single-corpus And Cross-corpus Speech Emotion Recognition (2022) 10.21
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024) 0.00
- MSP-Podcast SER Challenge 2024: L'antenne Du Ventoux Multimodal Self-supervised Learning For Speech Emotion Recognition (2024) 5.84
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) 2.26
- Leveraging Content And Acoustic Representations For Speech Emotion Recognition (2024) 2.26
- DSNet: Disentangled Siamese Network With Neutral Calibration For Speech Emotion Recognition (2023) 0.00