Semantic Matters: Multimodal Features For Affective Analysis
2025 · Tobias Hallmen, Robin-Nico Kampa, Fabian Deuser, et al.
Abstract
In this study, we present our methodology for two tasks, the Emotional Mimicry Intensity (EMI) Estimation Challenge and the Behavioural Ambivalence/Hesitancy (BAH) Recognition Challenge, both conducted as part of the 8th Workshop and Competition on Affective & Behavior Analysis in-the-wild. We use a Wav2Vec 2.0 model pre-trained on a large podcast dataset to extract various audio features, capturing both linguistic and paralinguistic information. Our approach incorporates a valence-arousal-dominance (VAD) module derived from Wav2Vec 2.0, a BERT text encoder, and a vision transformer (ViT), with their predictions subsequently processed by either a long short-term memory (LSTM) network or a convolution-like method for temporal modeling. We integrate the textual and visual modalities into our analysis, recognizing that semantic content provides valuable contextual cues and underscoring that the meaning of speech often conveys more critical insights than its acoustic form alone. Fusin
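The abstract describes per-modality models (audio/VAD, text, vision) whose frame-level predictions are combined and then modeled over time. The sketch below illustrates one generic way to do this and is not the authors' implementation. It assumes that every modality has already been resampled to the same frame rate, that fusion is simple per-frame concatenation, and that a fixed three-tap kernel stands in for the unspecified "convolution-like" temporal model (in a trained system the kernel weights would be learned). Mean pooling gives a clip-level output, as an EMI-style regression target would need. The LSTM variant is not shown, and the function names `fuse_frames`, `temporal_conv`, and `mean_pool` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_frames(*modalities: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate time-aligned per-frame features from each modality.

    Assumption: all modalities are already resampled to the same number of frames.
    """
    n_frames = len(modalities[0])
    if any(len(m) != n_frames for m in modalities):
        raise ValueError("all modalities must have the same number of frames")
    return [[x for m in modalities for x in m[t]] for t in range(n_frames)]


def temporal_conv(frames: Sequence[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Slide a 1-D kernel along time, independently for each feature dimension.

    Like deep-learning conv layers, this computes a cross-correlation with
    'same' output length; out-of-range frames are clamped (replicate padding).
    """
    half = len(kernel) // 2
    n = len(frames)
    out: List[Vector] = []
    for t in range(n):
        acc = [0.0] * len(frames[0])
        for k, w in enumerate(kernel):
            src = min(max(t + k - half, 0), n - 1)  # clamp index at the clip edges
            for d, x in enumerate(frames[src]):
                acc[d] += w * x
        out.append(acc)
    return out


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average over time to obtain a single clip-level vector."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


if __name__ == "__main__":
    # Hypothetical 3-frame clip with one prediction per frame from two modalities.
    audio = [[1.0], [2.0], [3.0]]
    text = [[0.0], [0.0], [3.0]]
    fused = fuse_frames(audio, text)
    print(fused)  # [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0]]
    smoothed = temporal_conv(fused, [0.25, 0.5, 0.25])
    print(smoothed)  # [[1.25, 0.0], [2.0, 0.75], [2.75, 2.25]]
    print(mean_pool(smoothed))  # [2.0, 1.0]
```

Concatenating before the temporal step lets a single temporal model see all modalities at once. A late-fusion alternative would smooth each modality separately and combine only the final outputs.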
Related papers
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025)
- Continuous Multimodal Emotion Recognition Approach For AVEC 2017 (2017)
- Advancing Audio Emotion And Intent Recognition With Large Pre-trained Models And Bayesian Inference (2023)
- Multimodal Speech Emotion Recognition And Ambiguity Resolution (2019)
- Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation (2022)
- Getting The Subtext Without The Text: Scalable Multimodal Sentiment Classification From Visual And Acoustic Modalities (2018)
- Multistage Linguistic Conditioning Of Convolutional Layers For Speech Emotion Recognition (2021)