Team LEYA In 10th ABAW Competition: Multimodal Ambivalence/hesitancy Recognition Approach
2026 · Elena Ryumina, Alexandr Axyonov, Dmitry Sysoev, et al.
Abstract
Ambivalence/hesitancy recognition in unconstrained videos is a challenging problem due to the subtle, multimodal, and context-dependent nature of this behavioral state. In this paper, a multimodal approach for video-level ambivalence/hesitancy recognition is presented for the 10th ABAW Competition. The proposed approach integrates four complementary modalities: scene, face, audio, and text. Scene dynamics are captured with a VideoMAE-based model, facial information is encoded through emotional frame-level embeddings aggregated by statistical pooling, acoustic representations are extracted with EmotionWav2Vec2.0 and processed by a Mamba-based temporal encoder, and linguistic cues are modeled using fine-tuned transformer-based text models. The resulting unimodal embeddings are further combined using multimodal fusion models, including prototype-augmented variants. Experiments on the BAH corpus demonstrate clear gains of multimodal fusion over all unimodal baselines. The best unimodal con
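The statistical pooling step mentioned for the face modality can be illustrated with a minimal sketch. The abstract does not say which statistics are used, so the choice of per-dimension mean and standard deviation below is an assumption, as are the function name `statistical_pooling` and the toy input; this is not the authors' implementation.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def statistical_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse (num_frames, dim) frame embeddings into one 2*dim video vector.

    Assumption: the pooled statistics are the per-dimension mean and
    population standard deviation; the source does not specify them.
    """
    if not frames:
        raise ValueError("need at least one frame embedding")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame embeddings must share one dimension")

    # Transpose so each tuple holds one embedding dimension across all frames.
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    # Concatenate the statistics into a fixed-size video-level embedding.
    return means + stds


# Hypothetical toy input: three frames with two-dimensional embeddings.
frames = [[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]]
video_embedding = statistical_pooling(frames)
```

The pooled vector has a fixed length regardless of how many frames a video contains, which is what lets a variable-length face track feed a fixed-size fusion model.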