Mouth Articulation-based Anchoring For Improved Cross-corpus Speech Emotion Recognition
2024 Β· Shreya G. Upadhyay, Ali N. Salman, Carlos Busso, et al.
Abstract
Cross-corpus speech emotion recognition (SER) plays a vital role in numerous practical applications. Traditional approaches to cross-corpus emotion transfer often concentrate on adapting acoustic features to align with different corpora, domains, or labels. However, acoustic features are inherently variable and error-prone due to factors like speaker differences, domain shifts, and recording conditions. To address these challenges, this study adopts a novel contrastive approach by focusing on emotion-specific articulatory gestures as the core elements for analysis. By shifting the emphasis on the more stable and consistent articulatory gestures, we aim to enhance emotion transfer learning in SER tasks. Our research leverages the CREMA-D and MSP-IMPROV corpora as benchmarks and it reveals valuable insights into the commonality and reliability of these articulatory gestures. The findings highlight mouth articulatory gesture potential as a better constraint for improving emotion recogniti
Authors
(none)
Tags
Stats
Related papers
- A Layer-anchoring Strategy For Enhancing Cross-lingual Speech Emotion Recognition (2024)0.00
- MSAC: Multiple Speech Attribute Control Method For Reliable Speech Emotion Recognition (2023)0.00
- ASR And Emotional Speech: A Word-level Investigation Of The Mutual Impact Of Speech And Emotion Recognition (2023)8.82
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025)4.52
- Emonet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition (2021)2.95
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022)5.84
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, Asr Error Detection, And Asr Error Correction (2024)0.00
- Leveraging Content And Acoustic Representations For Speech Emotion Recognition (2024)2.26