United We Stand, Divided We Fall: Handling Weak Complementary Relationships For Audio-visual Emotion Recognition In Valence-arousal Space
2025 · R. Gnana Praveen, Jahangir Alam, Eric Charton
Abstract
Audio and visual modalities are two predominant contact-free channels in videos, which are often expected to carry a complementary relationship with each other. However, they may not always complement each other, resulting in poor audio-visual feature representations. In this paper, we introduce Gated Recursive Joint Cross Attention (GRJCA) using a gating mechanism that can adaptively choose the most relevant features to effectively capture the synergic relationships across audio and visual modalities. Specifically, we improve the performance of Recursive Joint Cross-Attention (RJCA) by introducing a gating mechanism to control the flow of information between the input features and the attended features of multiple iterations depending on the strength of their complementary relationship. For instance, if the modalities exhibit strong complementary relationships, the gating mechanism emphasizes cross-attended features, otherwise non-attended features. To further improve the performance
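The gating idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `gated_fusion`, the projection matrix `w_gate`, the `temperature` parameter, and the choice of a two-way softmax gate computed from the attended features are all assumptions made for illustration. The sketch only shows how a per-time-step gate could blend non-attended (input) features with cross-attended features.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(x, x_att, w_gate, temperature=1.0):
    """Blend non-attended and cross-attended features with a soft gate.

    x       : (T, d) non-attended (input) features of one modality
    x_att   : (T, d) cross-attended features of the same modality
    w_gate  : (d, 2) hypothetical gate projection (assumed, not from the paper)
    Returns the fused features (T, d) and the gate weights (T, 2).
    """
    # One pair of scores per time step: [score for x, score for x_att].
    logits = x_att @ w_gate
    # Softmax turns the scores into weights that sum to 1 per time step.
    gates = softmax(logits / temperature, axis=-1)
    # A large second weight emphasizes attended features (strong
    # complementarity); a large first weight falls back to the input features.
    fused = gates[:, :1] * x + gates[:, 1:] * x_att
    return fused, gates


# Toy data: 4 time steps, 8-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
x_att = rng.standard_normal((T, d))
w_gate = rng.standard_normal((d, 2))

fused, gates = gated_fusion(x, x_att, w_gate)
```

In a recursive setting, the same gate could be applied after each attention iteration, so that weakly complementary segments keep more of the original features; how the paper combines the iterations is not specified in the text above.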
Related papers
- Recursive Joint Attention For Audio-visual Fusion In Regression Based Emotion Recognition (2023) 9.59
- Recursive Joint Cross-modal Attention For Multimodal Fusion In Dimensional Emotion Recognition (2024) 11.39
- A Joint Cross-attention Model For Audio-visual Fusion In Dimensional Emotion Recognition (2022) 18.00
- TAGF: Time-aware Gated Fusion For Multimodal Valence-arousal Estimation (2025) 0.00
- Multimodal Fusion Method With Spatiotemporal Sequences And Relationship Learning For Valence-arousal Estimation (2024) 0.00
- Audio-visual Speech Separation Based On Joint Feature Representation With Cross-modal Attention (2022) 0.00
- Cross-modal Global Interaction And Local Alignment For Audio-visual Speech Recognition (2023) 7.50
- Audio Visual Emotion Recognition With Temporal Alignment And Perception Attention (2016) 0.00