Multimodal Speech Emotion Recognition And Ambiguity Resolution
2019 · Gaurav Sahu
Abstract
Identifying emotion from speech is a non-trivial task, owing to the ambiguous definition of emotion itself. In this work, we adopt a feature-engineering-based approach to speech emotion recognition. Formalizing the problem as multi-class classification, we compare the performance of two categories of models. For both, we extract eight hand-crafted features from the audio signal. In the first approach, the extracted features are used to train six traditional machine learning classifiers; the second approach is based on deep learning, wherein a baseline feed-forward neural network and an LSTM-based classifier are trained over the same features. To resolve ambiguity in communication, we also include features from the text domain. We report accuracy, F-score, precision, and recall for the different experimental settings in which we evaluated our models. Overall, we show that lighter machine learning based models trained over a few hand-crafted featu
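As a rough illustration of the feature-engineering route described in the abstract, the sketch below computes two hand-crafted features from a raw signal and trains a very light classifier on them. Everything specific here is an assumption made for illustration: the abstract does not name its eight features or six classifiers, so RMS energy and zero-crossing rate, the nearest-centroid classifier, the synthetic sine-wave "utterances", and the labels `calm` and `angry` are all hypothetical stand-ins rather than the paper's setup.

```python
import math
import random


def frame_features(signal):
    """Two illustrative hand-crafted features: [RMS energy, zero-crossing rate]."""
    n = len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (n - 1)]


def fit_centroids(features, labels):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, f):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(f, centroids[y])),
    )


def synth(freq, amp, n=800, sr=8000):
    """Hypothetical stand-in for an utterance: a pure sine wave."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]


# Synthetic training set (assumption): quiet low-pitched vs. loud high-pitched signals.
rng = random.Random(0)
train_x, train_y = [], []
for _ in range(20):
    train_x.append(frame_features(synth(rng.uniform(100, 150), rng.uniform(0.1, 0.3))))
    train_y.append("calm")
    train_x.append(frame_features(synth(rng.uniform(350, 450), rng.uniform(0.7, 0.9))))
    train_y.append("angry")

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, frame_features(synth(120, 0.2))))
print(predict(centroids, frame_features(synth(400, 0.8))))
```

A real pipeline would replace the synthetic signals with labelled recordings and would normally standardize the features before fitting, since the two features here live on different scales.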