Multimodal Emotion Recognition Using Transfer Learning From Speaker Recognition And BERT-based Models
2022 · Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, et al.
Abstract
Automatic emotion recognition plays a key role in computer-human interaction as it has the potential to enrich the next-generation artificial intelligence with emotional intelligence. It finds applications in customer and/or representative behavior analysis in call centers, gaming, personal assistants, and social robots, to mention a few. Therefore, there has been an increasing demand to develop robust automatic methods to analyze and recognize the various emotions. In this paper, we propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities. More specifically, we i) adapt a residual network (ResNet) based model trained on a large-scale speaker recognition task using transfer learning along with a spectrogram augmentation approach to recognize emotions from speech, and ii) use a fine-tuned bidirectional encoder representations from transformers (BERT) based model to represent and recogni
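To make the late-fusion idea in the abstract concrete, here is a minimal sketch in which each modality's model produces per-class scores and the decision is taken on a weighted average of the two posterior distributions. This is one common form of late fusion, not necessarily the rule used in the paper, which the abstract does not specify. The label set `EMOTIONS`, the function names, and the `speech_weight` parameter are all assumptions made for illustration.

```python
import math

# Hypothetical label set, chosen only for this illustration.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(speech_logits, text_logits, speech_weight=0.5):
    """Fuse two models' outputs by weighted averaging of their posteriors.

    speech_logits: per-class scores from a speech model (assumed input)
    text_logits:   per-class scores from a text model (assumed input)
    speech_weight: assumed tunable weight in [0, 1] for the speech model
    """
    p_speech = softmax(speech_logits)
    p_text = softmax(text_logits)
    fused = [
        speech_weight * ps + (1.0 - speech_weight) * pt
        for ps, pt in zip(p_speech, p_text)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


if __name__ == "__main__":
    label, fused = late_fusion([2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 3.0])
    print(label, [round(p, 3) for p in fused])
```

Because the two models are combined only at the score level, each can be trained or fine-tuned separately, and the fusion weight can be tuned on held-out data.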
Related papers
- Multi-modal Emotion Detection With Transfer Learning (2020)
- Multi-modal Emotion Recognition By Text, Speech And Video Using Pretrained Transformers (2024)
- Multimodal Speech Emotion Recognition And Ambiguity Resolution (2019)
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019)
- Jointly Fine-tuning "BERT-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020)
- A Transfer Learning Method For Speech Emotion Recognition From Automatic Speech Recognition (2020)
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025)
- Multimodal Emotion Recognition And Sentiment Analysis In Multi-party Conversation Contexts (2025)