Group Gated Fusion On Attention-based Bidirectional Alignment For Multimodal Emotion Recognition
2022 · Pengfei Liu, Kun Li, Helen Meng
Abstract
Emotion recognition is a challenging and actively studied research area that plays a critical role in emotion-aware human-computer interaction systems. In a multimodal setting, temporal alignment between different modalities has not yet been well investigated. This paper presents a new model, the Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states that explicitly captures the alignment relationship between speech and text, and a novel group gated fusion (GGF) layer that integrates the representations of the different modalities. We empirically show that the attention-aligned representations significantly outperform the last hidden states of the LSTM, and that the proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
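The abstract names two components without giving their equations, so the following is only a rough NumPy sketch of the general idea, not the authors' implementation. It assumes scaled dot-product attention for the alignment, mean pooling over time, and a softmax gate over modality-specific projections; the function names, the weight shapes (`W_h`, `W_z`), and the random inputs standing in for LSTM hidden states are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_align(speech_h, text_h):
    """Align two sequences of hidden states in both directions.

    speech_h: (Ts, d) speech hidden states; text_h: (Tt, d) text hidden states.
    Assumption: scaled dot-product scores; the paper may use another scoring function.
    """
    scores = speech_h @ text_h.T / np.sqrt(speech_h.shape[1])  # (Ts, Tt)
    # Each speech frame attends over all text tokens.
    text_for_speech = softmax(scores, axis=1) @ text_h          # (Ts, d)
    # Each text token attends over all speech frames.
    speech_for_text = softmax(scores.T, axis=1) @ speech_h      # (Tt, d)
    return text_for_speech, speech_for_text

def gated_fusion(reps, W_h, W_z):
    """Fuse n representations with learned gates (hypothetical formulation).

    reps: list of n vectors of size d; W_h: (n, k, d); W_z: (n, n * d).
    """
    # Project each representation into a shared k-dimensional space.
    H = np.stack([np.tanh(W_h[i] @ r) for i, r in enumerate(reps)])  # (n, k)
    # Gates are computed from all representations jointly and sum to 1.
    z = softmax(W_z @ np.concatenate(reps))                           # (n,)
    return z @ H                                                      # (k,)

# Random stand-ins for LSTM hidden states: 50 speech frames, 12 text tokens.
rng = np.random.default_rng(0)
speech_h = rng.normal(size=(50, 8))
text_h = rng.normal(size=(12, 8))

a_s, a_t = bidirectional_align(speech_h, text_h)
reps = [a_s.mean(axis=0), a_t.mean(axis=0)]  # mean-pool over time (assumption)
W_h = rng.normal(size=(2, 4, 8))
W_z = rng.normal(size=(2, 16))
fused = gated_fusion(reps, W_h, W_z)
```

In this sketch the fused vector is a convex combination of tanh-bounded projections, so the gate weights can be read directly as each aligned representation's relative contribution.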
Related papers
- TAGF: Time-aware Gated Fusion For Multimodal Valence-arousal Estimation (2025)
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019)
- Enhancing Modal Fusion By Alignment And Label Matching For Multimodal Emotion Recognition (2024)
- Conversational Emotion Analysis Via Attention Mechanisms (2019)
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Information Fusion In Attention Networks Using Adaptive And Multi-level Factorized Bilinear Pooling For Audio-visual Emotion Recognition (2021)
- Emotion Recognition By Fusing Time Synchronous And Time Asynchronous Representations (2020)
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)