Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition
2023 · Amirali Soltani Tehrani, Niloufar Faridani, Ramin Toosi
Abstract
Speech Emotion Recognition (SER) plays a pivotal role in enhancing human-computer interaction by enabling a deeper understanding of emotional states across a wide range of applications, contributing to more empathetic and effective communication. This study proposes an approach that integrates self-supervised feature extraction with supervised classification for emotion recognition from short audio segments. In the preprocessing step, to eliminate the need for hand-crafted audio features, we employed a self-supervised feature extractor based on the Wav2Vec model to capture acoustic features from the audio data. The output feature maps of the preprocessing step are then fed to a custom-designed Convolutional Neural Network (CNN)-based model that performs emotion classification. Using the ShEMO dataset as our testing ground, the proposed method surpasses two baseline methods, i.e., a support vector machine classifier and transfer learning of a pretrained CNN. Comparing the proposed method…
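The two-stage pipeline in the abstract (a frozen self-supervised encoder producing a frames-by-features map, followed by a CNN classifier) can be illustrated with a minimal NumPy sketch of the classification stage only. The abstract does not describe the CNN's layers, so everything here is an assumption for illustration: a single 1-D convolution over time, ReLU, global average pooling, and a linear softmax layer. The names `conv1d_relu`, `classify`, and `init_params` and the constants `FEATURE_DIM`, `N_FILTERS`, `KERNEL`, and `N_CLASSES` are hypothetical. The Wav2Vec features are replaced by random numbers of a plausible shape, and the weights are untrained.

```python
import numpy as np

# Hypothetical sizes (assumptions, not taken from the paper):
FEATURE_DIM = 768  # per-frame embedding size, e.g. a wav2vec 2.0 "base" encoder
N_FILTERS = 64     # number of 1-D convolution filters
KERNEL = 3         # temporal kernel width, in frames
N_CLASSES = 6      # number of emotion labels; set this to match the dataset


def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time followed by ReLU.

    x: (T, D) feature map, w: (K, D, F) filters, b: (F,) bias -> (T-K+1, F).
    """
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    # Stack the sliding windows over time: shape (t_out, K, D).
    windows = np.stack([x[t:t + k] for t in range(t_out)])
    # Sum over kernel offset and feature dimension for each filter.
    out = np.einsum("tkd,kdf->tf", windows, w) + b
    return np.maximum(out, 0.0)


def classify(features, params):
    """Map a (T, D) self-supervised feature map to class probabilities."""
    h = conv1d_relu(features, params["w_conv"], params["b_conv"])
    pooled = h.mean(axis=0)  # global average pooling over time -> (F,)
    logits = pooled @ params["w_fc"] + params["b_fc"]  # (N_CLASSES,)
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()  # softmax


def init_params(rng):
    # Random, untrained weights; a real model would learn these with a
    # cross-entropy loss on labelled clips.
    return {
        "w_conv": rng.normal(0.0, 0.02, (KERNEL, FEATURE_DIM, N_FILTERS)),
        "b_conv": np.zeros(N_FILTERS),
        "w_fc": rng.normal(0.0, 0.02, (N_FILTERS, N_CLASSES)),
        "b_fc": np.zeros(N_CLASSES),
    }


rng = np.random.default_rng(0)
params = init_params(rng)
# Stand-in for the encoder output of roughly one second of 16 kHz audio
# (wav2vec 2.0 emits about 49 frames for 16,000 samples).
fake_features = rng.normal(size=(49, FEATURE_DIM))
probs = classify(fake_features, params)
print(probs.shape, float(probs.sum()))
```

Global average pooling is used here because it yields a fixed-size vector regardless of clip length. That keeps the sketch valid for short segments of varying duration, but the paper's actual pooling and layer choices may differ.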
Related papers
- Speaker Emotion Recognition: Leveraging Self-supervised Models For Feature Extraction Using Wav2vec2 And Hubert (2024)
- Supervised Contrastive Learning With Nearest Neighbor Search For Speech Emotion Recognition (2023)
- On The Use Of Self-supervised Pre-trained Acoustic And Linguistic Features For Continuous Speech Emotion Recognition (2020)
- End-to-end Integration Of Speech Emotion Recognition With Voice Activity Detection Using Self-supervised Learning Features (2024)
- Sigwavnet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025)
- Leveraging Content And Acoustic Representations For Speech Emotion Recognition (2024)
- A Cross-corpus Speech Emotion Recognition Method Based On Supervised Contrastive Learning (2024)
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024)