Two-stage Dimensional Emotion Recognition By Fusing Predictions Of Acoustic And Text Networks Using SVM
2022 · Bagus Tris Atmaja, Masato Akagi
Abstract
Automatic speech emotion recognition (SER) by a computer is a critical component of more natural human-machine interaction. As in human-human interaction, the ability to perceive emotion correctly is essential for deciding what to do next in a given situation. One open issue in SER is whether acoustic features need to be combined with other data, such as facial expressions, text, or motion capture. This research proposes combining acoustic and text information through a two-stage late-fusion approach. First, acoustic and text features are used to train separate deep learning systems. Second, the predictions of these deep learning systems are fed into a support vector machine (SVM) that predicts the final regression score. The task addressed is dimensional emotion modeling, because it enables a deeper analysis of affective states. Experimental results show that this two-stage late-fusion approach obtains higher performance than that of a
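The second stage described above, where per-modality predictions become the input features of an SVM regressor, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one regressor per emotion dimension, swaps in a linear epsilon-insensitive support vector regressor trained by subgradient descent (pure Python, so it runs without extra packages) in place of whatever SVM configuration the paper used, and feeds it synthetic numbers standing in for the acoustic and text network outputs. The names `LinearSVR`, `fuse_predictions`, and `mae`, and all hyperparameter values, are hypothetical.

```python
import math
from typing import List, Sequence


class LinearSVR:
    """Linear epsilon-insensitive support vector regression (primal form),
    trained by full-batch subgradient descent on
        0.5 * lam * ||w||^2 + mean_i max(0, |y_i - (w . x_i + b)| - epsilon).
    All hyperparameter defaults are illustrative assumptions."""

    def __init__(self, epsilon: float = 0.05, lam: float = 1e-3,
                 lr: float = 0.02, epochs: int = 3000) -> None:
        self.epsilon = epsilon  # half-width of the insensitive tube
        self.lam = lam          # L2 regularization strength
        self.lr = lr            # constant step size
        self.epochs = epochs
        self.w: List[float] = []
        self.b = 0.0

    def predict_one(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.predict_one(x) for x in X]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "LinearSVR":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            # Gradient of the L2 penalty term.
            grad_w = [self.lam * wj for wj in self.w]
            grad_b = 0.0
            for xi, yi in zip(X, y):
                r = yi - self.predict_one(xi)
                # Only samples outside the epsilon-tube contribute a subgradient.
                if abs(r) > self.epsilon:
                    s = -1.0 if r > 0 else 1.0  # d(loss)/d(prediction)
                    for j in range(d):
                        grad_w[j] += s * xi[j] / n
                    grad_b += s / n
            self.w = [wj - self.lr * gj for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self


def fuse_predictions(acoustic: Sequence[float], text: Sequence[float]) -> List[List[float]]:
    """Stage-2 input: one row per utterance, [acoustic prediction, text prediction]."""
    return [[a, t] for a, t in zip(acoustic, text)]


def mae(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean absolute error, used here only to compare the synthetic outputs."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


# Synthetic stand-ins for stage-1 outputs on one emotion dimension (hypothetical data).
gold = [math.sin(0.3 * i) for i in range(40)]
noise = [0.3 * math.cos(1.7 * i) for i in range(40)]
acoustic_pred = [g + e for g, e in zip(gold, noise)]  # stand-in acoustic network output
text_pred = [g - e for g, e in zip(gold, noise)]      # stand-in text network output

X = fuse_predictions(acoustic_pred, text_pred)
svr = LinearSVR().fit(X, gold)
fused_pred = svr.predict(X)
print(f"acoustic MAE {mae(acoustic_pred, gold):.3f}, "
      f"text MAE {mae(text_pred, gold):.3f}, fused MAE {mae(fused_pred, gold):.3f}")
```

Because stage 2 sees only one prediction per modality, its input is tiny (two features per dimension here), which is what makes a classical SVM practical on top of deep networks. In the synthetic data the two stand-in modalities err in opposite directions, so the fused regressor can cancel errors that neither modality removes alone; real network outputs will not be this clean. A library regressor such as scikit-learn's `sklearn.svm.SVR`, possibly with a nonlinear kernel, would be the more usual choice in practice.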
Related papers
- Multistage Linguistic Conditioning Of Convolutional Layers For Speech Emotion Recognition (2021) · 9.23
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019) · 15.78
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025) · 0.00
- Sigwavnet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025) · 8.48
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021) · 12.74
- Speech Emotion Recognition Using Deep Sparse Auto-encoder Extreme Learning Machine With A New Weighting Scheme And Spectro-temporal Features Along With Classical Feature Selection And A New Quantum-inspired Dimension Reduction Method (2021) · 0.00
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024) · 0.00
- Fusing ASR Outputs In Joint Training For Speech Emotion Recognition (2021) · 12.61