SUN Team's Contribution To ABAW 2024 Competition: Audio-visual Valence-arousal Estimation And Expression Recognition
2024 · Denis Dresvyanskiy, Maxim Markitantov, Jiawei Yu, et al.
Abstract
As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention over the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely 'in-the-wild', data. This work investigates audiovisual deep learning approaches to the in-the-wild emotion recognition problem. In particular, we explore the effectiveness of architectures based on fine-tuned Convolutional Neural Networks (CNN) and the Public Dimensional Emotion Model (PDEM) for the video and audio modalities, respectively. We compare alternative temporal modeling and fusion strategies using the embeddings from these multi-stage trained, modality-specific Deep Neural Networks (DNN). We report results on the AffWild2 dataset under the Affective Behavior Analysis in-the-Wild 2024 (ABAW'24) challenge protocol.
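For context on the challenge protocol: the ABAW valence-arousal track scores predictions with the concordance correlation coefficient (CCC), averaged over valence and arousal. Below is a minimal NumPy sketch of that metric. The function names `ccc` and `va_score` are hypothetical, and this is not the authors' or the organizers' evaluation code.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    using population (ddof=0) statistics. Undefined if both inputs are
    constant (denominator is zero).
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()  # population variance
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

def va_score(valence_true, valence_pred, arousal_true, arousal_pred):
    """Mean of valence CCC and arousal CCC (hypothetical helper)."""
    return 0.5 * (ccc(valence_true, valence_pred) + ccc(arousal_true, arousal_pred))

if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0]
    print(ccc(truth, truth))            # perfect agreement
    print(ccc(truth, [2.0, 3.0, 4.0]))  # same shape, shifted by +1
```

Unlike Pearson correlation, CCC penalizes a constant offset. A prediction shifted by +1 from the ground truth is perfectly correlated, but its CCC drops to 4/7.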
Related papers
- Mutilmodal Feature Extraction And Attention-based Fusion For Emotion Estimation In Videos (2023)
- Multi-modal Continuous Valence And Arousal Prediction In The Wild Using Deep 3D Features And Sequence Modeling (2020)
- Multimodal Fusion Method With Spatiotemporal Sequences And Relationship Learning For Valence-arousal Estimation (2024)
- Team LEYA In 10th ABAW Competition: Multimodal Ambivalence/hesitancy Recognition Approach (2026)
- Continuous Multimodal Emotion Recognition Approach For AVEC 2017 (2017)
- Self-relation Attention And Temporal Awareness For Emotion Recognition Via Vocal Burst (2022)
- Audio-visual Compound Expression Recognition Method Based On Late Modality Fusion And Rule-based Decision (2024)
- MAVEN: Multi-modal Attention For Valence-arousal Emotion Network (2025)