Speech Emotion Recognition Using Multi-hop Attention Mechanism
2019 · Seunghyun Yoon, Seokhyun Byun, Subhadeep Dey, et al.
Abstract
In this paper, we exploit both the textual and the acoustic data of an utterance for the speech emotion classification task. The baseline approach models the audio and text information independently using two deep neural networks (DNNs), whose outputs are then fused for classification. Rather than using knowledge from the two modalities separately, we propose a framework that exploits acoustic information in tandem with lexical data. The proposed framework uses two bidirectional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance. Furthermore, we propose an attention mechanism, referred to as multi-hop attention, which is trained to automatically infer the correlation between the modalities. Multi-hop attention first identifies the segments of the textual data that are relevant to the audio signal. The relevant textual data are then used to attend to parts of the audio signal. To evaluate the performance of the proposed sy…
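The two-hop idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dot-product scoring, the use of the last audio hidden state as the initial query, the shared hidden size `d` for both BLSTMs, concatenation as the fusion step, and the function names `attend` and `multi_hop_attention` are all assumptions for illustration. Random arrays stand in for the BLSTM outputs.

```python
import numpy as np


def softmax(scores):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()


def attend(query, keys):
    """Dot-product attention (assumed scoring function, for illustration).

    query: (d,) vector; keys: (T, d) matrix of hidden states.
    Returns the attention weights (T,) and their weighted sum (d,).
    """
    weights = softmax(keys @ query)
    return weights, weights @ keys


def multi_hop_attention(audio_states, text_states):
    """Hypothetical two-hop sketch of the mechanism in the abstract.

    audio_states: (T_a, d) BLSTM outputs over the audio frames.
    text_states:  (T_t, d) BLSTM outputs over the words.
    Both encoders are assumed to share the hidden size d.
    """
    # Assumption: summarize the audio by its last hidden state.
    audio_summary = audio_states[-1]
    # Hop 1: find the text segments relevant to the audio signal.
    text_weights, text_context = attend(audio_summary, text_states)
    # Hop 2: use the attended text to attend to parts of the audio signal.
    audio_weights, audio_context = attend(text_context, audio_states)
    # Assumed fusion: concatenate both contexts for a classifier (not shown).
    fused = np.concatenate([text_context, audio_context])
    return text_weights, audio_weights, fused


# Usage with random stand-ins: 50 audio frames, 12 words, hidden size 8.
rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 8))
text = rng.standard_normal((12, 8))
tw, aw, fused = multi_hop_attention(audio, text)
print(tw.shape, aw.shape, fused.shape)  # (12,) (50,) (16,)
```

Chaining the hops this way makes the text weights depend on the audio and the audio weights depend on the attended text, which is the "in tandem" interaction the abstract contrasts with the baseline's independently trained encoders.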
Related papers
- Attentive Modality Hopping Mechanism For Speech Emotion Recognition (2019)
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019)
- Multimodal Speech Emotion Recognition Using Audio And Text (2018)
- Conversational Emotion Analysis Via Attention Mechanisms (2019)
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025)
- Multimodal Speech Emotion Recognition Using Cross Attention With Aligned Audio And Text (2022)
- Speech Emotion Recognition With Co-attention Based Multi-level Acoustic Information (2022)
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019)