GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning For Speech Emotion Recognition
2024 Β· Yu Pan, Yuguang Yang, Heng Lu, et al.
Abstract
The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, current research typically relies on utterance-level emotion labels, inadequately capturing the complexity of emotions within a single utterance. In this paper, we introduce GMP-TL, a novel SER framework that employs gender-augmented multi-scale pseudo-label (GMP) based transfer learning to mitigate this gap. Specifically, GMP-TL initially uses the pre-trained HuBERT, implementing multi-task learning and multi-scale k-means clustering to acquire frame-level GMPs. Subsequently, to fully leverage frame-level GMPs and utterance-level emotion labels, a two-stage model fine-tuning approach is presented to further optimize GMP-TL. Experiments on IEMOCAP show that our GMP-TL attains a WAR of 80.0% and an UAR of 82.0%, achieving superior performance compared to state-of-the-art unimodal SER methods while also yielding comparable results to multimodal SER approaches.
Authors
(none)
Tags
Stats
Related papers
- Gemo-clap: Gender-attribute-enhanced Contrastive Language-audio Pretraining For Accurate Speech Emotion Recognition (2023)0.00
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025)4.52
- Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning (2017)12.87
- Continuous Metric Learning For Transferable Speech Emotion Recognition And Embedding Across Low-resource Languages (2022)0.00
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022)2.26
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023)10.97
- Generative Data Augmentation Guided By Triplet Loss For Speech Emotion Recognition (2022)3.58
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021)12.74