Speaker Emotion Recognition: Leveraging Self-supervised Models For Feature Extraction Using Wav2Vec2 And HuBERT
2024 · Pourya Jafarzadeh, Amir Mohammad Rostami, Padideh Choobdar
Abstract
Speech is the most natural way of expressing ourselves as humans. Identifying emotion from speech is a nontrivial task due to the ambiguous definition of emotion itself. Speaker Emotion Recognition (SER) is essential for understanding human emotional behavior. The SER task is challenging due to the variety of speakers, background noise, complexity of emotions, and speaking styles. It has many applications in education, healthcare, customer service, and Human-Computer Interaction (HCI). Previously, conventional machine learning methods such as SVM, HMM, and KNN have been used for the SER task. In recent years, deep learning methods have become popular, with convolutional neural networks and recurrent neural networks being used for SER tasks. The input of these methods is mostly spectrograms and hand-crafted features. In this work, we study the use of self-supervised transformer-based models, Wav2Vec2 and HuBERT, to determine the emotion of speakers from their voice. The models automatic
Related papers
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023)
- On The Use Of Self-supervised Pre-trained Acoustic And Linguistic Features For Continuous Speech Emotion Recognition (2020)
- Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022)
- SigWavNet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025)
- Multimodal Emotion Recognition Using Transfer Learning From Speaker Recognition And BERT-based Models (2022)
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021)
- Probing Speech Emotion Recognition Transformers For Linguistic Knowledge (2022)
- Decoding Emotions: A Comprehensive Multilingual Study Of Speech Models For Speech Emotion Recognition (2023)