GEmo-CLAP: Gender-attribute-enhanced Contrastive Language-audio Pretraining For Accurate Speech Emotion Recognition
2023 · Yu Pan, Yanni Hu, Yuguang Yang, et al.
Abstract
Contrastive cross-modality pretraining has recently shown impressive success in diverse fields, yet there is limited research on its merits in speech emotion recognition (SER). In this paper, we propose GEmo-CLAP, a gender-attribute-enhanced contrastive language-audio pretraining (CLAP) method for SER. Specifically, we first construct an effective emotion CLAP (Emo-CLAP) for SER, using pre-trained text and audio encoders. Second, given the significance of gender information in SER, we further propose two novel models, a multi-task learning based GEmo-CLAP (ML-GEmo-CLAP) and a soft label based GEmo-CLAP (SL-GEmo-CLAP), which incorporate gender information of speech signals to form more reasonable objectives. Experiments on IEMOCAP indicate that both proposed GEmo-CLAPs consistently outperform Emo-CLAP across different pre-trained models. Remarkably, the proposed WavLM-based SL-GEmo-CLAP obtains the best weighted average recall (WAR) of 83.16%, outperforming state-of-the-art SER methods.
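To make the soft-label idea concrete, the following is a minimal sketch of how gender agreement could be blended into the targets of a CLAP-style contrastive loss. It is not the paper's implementation: the function names, the mixing weight `alpha`, the `temperature` value, the row normalization of the targets, and the use of a symmetric soft-target cross-entropy are all assumptions made for illustration, and the embeddings are plain Python lists standing in for the outputs of the pre-trained audio and text encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def agreement_matrix(labels):
    """Row-normalized matrix: entry (i, j) is nonzero when labels i and j match."""
    rows = []
    for a in labels:
        row = [1.0 if a == b else 0.0 for b in labels]
        s = sum(row)  # at least 1, since every label matches itself
        rows.append([x / s for x in row])
    return rows


def soft_targets(emotions, genders, alpha):
    """Blend emotion and gender agreement; alpha is a hypothetical mixing weight."""
    e = agreement_matrix(emotions)
    g = agreement_matrix(genders)
    return [
        [alpha * ev + (1.0 - alpha) * gv for ev, gv in zip(er, gr)]
        for er, gr in zip(e, g)
    ]


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft target rows."""
    total = 0.0
    for lrow, trow in zip(logits, targets):
        probs = softmax(lrow)
        total -= sum(t * math.log(p) for t, p in zip(trow, probs) if t > 0)
    return total / len(logits)


def soft_label_contrastive_loss(audio_emb, text_emb, emotions, genders,
                                alpha=0.9, temperature=0.07):
    """Symmetric audio-to-text and text-to-audio loss against blended targets."""
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    # Cosine similarity of every audio/text pair, scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(av, tv)) / temperature for tv in t]
        for av in a
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # The row-normalized agreement matrices are symmetric, so the same
    # targets serve both directions.
    targets = soft_targets(emotions, genders, alpha)
    return 0.5 * (soft_cross_entropy(logits, targets)
                  + soft_cross_entropy(logits_t, targets))
```

With `alpha = 1.0` the targets depend on emotion labels alone, which roughly corresponds to an emotion-only objective in this sketch; lowering `alpha` lets same-gender pairs receive a small share of the target mass. A multi-task variant would instead compute separate emotion and gender losses and combine them with a weight, which is not shown here.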
Related papers
- GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning For Speech Emotion Recognition (2024) 0.00
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025) 4.52
- A Cross-corpus Speech Emotion Recognition Method Based On Supervised Contrastive Learning (2024) 0.00
- Supervised Contrastive Learning With Nearest Neighbor Search For Speech Emotion Recognition (2023) 7.16
- Human-CLAP: Human-perception-based Contrastive Language-audio Pretraining (2025) 4.52
- Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning (2017) 12.87
- CLAPSpeech: Learning Prosody From Text Context With Contrastive Language-audio Pre-training (2023) 0.00
- Mouth Articulation-based Anchoring For Improved Cross-corpus Speech Emotion Recognition (2024) 2.26