Supervised Contrastive Learning With Nearest Neighbor Search For Speech Emotion Recognition
2023 · Xuechen Wang, Shiwan Zhao, Yong Qin
Abstract
Speech Emotion Recognition (SER) is a challenging task because of limited data and the blurred boundaries between certain emotions. In this paper, we present a comprehensive approach to improving SER performance throughout the model lifecycle, covering the pre-training, fine-tuning, and inference stages. To address data scarcity, we use a pre-trained model, wav2vec2.0. During fine-tuning, we propose a novel loss function that combines cross-entropy loss with supervised contrastive learning loss to improve the model's discriminative ability. This loss increases inter-class distances and decreases intra-class distances, which mitigates the problem of blurred boundaries. Finally, to exploit the improved distances, we propose an interpolation method at the inference stage that combines the model's prediction with the output of a k-nearest neighbors model. Our experiments on IEMOCAP demonstrate that the proposed methods outperform current state-of-the-art results.
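The two ideas in the abstract, a combined fine-tuning loss and k-nearest-neighbor interpolation at inference, can be illustrated with a small sketch. It is not the authors' implementation. The weighted-sum form of the loss, the linear interpolation rule, cosine similarity as the neighbor metric, and the names alpha, lam, temperature, and k are all assumptions made here for illustration; the abstract only says that the losses are combined and that the predictions are interpolated. The supervised contrastive term follows the commonly used formulation from the general literature, and plain Python lists stand in for wav2vec2.0 embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(logits):
    lse = log_sum_exp(logits)
    return [math.exp(x - lse) for x in logits]


def cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw logits."""
    return log_sum_exp(logits) - logits[label]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive."""
    z = [normalize(e) for e in embeddings]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {j: dot(z[i], z[j]) / temperature for j in range(len(z)) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        log_denom = log_sum_exp(list(sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(logits_batch, embeddings, labels, alpha=0.5):
    """Assumed form: (1 - alpha) * cross-entropy + alpha * contrastive term."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return (1 - alpha) * ce + alpha * supcon_loss(embeddings, labels)


def knn_probs(query, bank, bank_labels, num_classes, k=3):
    """Class distribution among the k most cosine-similar stored embeddings."""
    q = normalize(query)
    scored = sorted(
        ((dot(q, normalize(e)), y) for e, y in zip(bank, bank_labels)),
        reverse=True,
    )[:k]
    counts = [0.0] * num_classes
    for _, y in scored:
        counts[y] += 1.0
    return [c / len(scored) for c in counts]


def interpolate(model_probs, neighbor_probs, lam=0.5):
    """Assumed form: (1 - lam) * model prediction + lam * kNN prediction."""
    return [(1 - lam) * p + lam * q for p, q in zip(model_probs, neighbor_probs)]


# Toy data: two tight clusters standing in for two emotion classes.
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bank_labels = [0, 0, 1, 1]
logits = [[2.0, 0.5], [1.5, 0.2], [0.1, 1.8], [0.3, 2.2]]

loss = combined_loss(logits, bank, bank_labels, alpha=0.5)
final_probs = interpolate(softmax([0.2, 0.1]), knn_probs([1.0, 0.0], bank, bank_labels, 2))
```

In this toy setting the contrastive term is small when same-class embeddings already sit close together, so the stored training embeddings form clean neighborhoods and the kNN vote can sharpen an uncertain model prediction.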
Related papers
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023) · 0.00
- A Cross-corpus Speech Emotion Recognition Method Based On Supervised Contrastive Learning (2024) · 0.00
- Speech Emotion Recognition Via Contrastive Loss Under Siamese Networks (2019) · 12.17
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) · 13.65
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024) · 3.58
- Towards Adversarial Learning Of Speaker-invariant Representation For Speech Emotion Recognition (2019) · 0.00
- Exploring Wav2vec 2.0 Fine-tuning For Improved Speech Emotion Recognition (2021) · 15.67
- Sigwavnet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025) · 8.48