End-to-end Multimodal Emotion And Gender Recognition With Dynamic Joint Loss Weights
2018 · Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, et al.
Abstract
Multi-task learning is a method for improving generalization across multiple tasks. To perform multiple classification tasks with one neural network model, the losses of the individual tasks must be combined. Previous studies of multiple prediction tasks have mostly trained models with a joint loss that uses static weights, setting the task weights uniformly or empirically without sufficient justification. In this study, we propose a method to calculate joint loss using dynamic weights to improve the total performance, instead of the individual performance, of tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, which is computed as the negative log-likelihood, than using s
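To make the idea of a weighted joint loss concrete, here is a minimal sketch in Python. It combines the negative log-likelihood of two classification tasks (emotion and gender) first with static uniform weights and then with weights recomputed from the current losses. The abstract does not state the authors' actual weighting rule, so `loss_proportional_weights` is a hypothetical stand-in chosen only for illustration, and all function names and probability values are assumptions.

```python
import math


def nll(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(probs[label])


def joint_loss(losses, weights):
    """Weighted sum of per-task losses."""
    return sum(w * l for w, l in zip(weights, losses))


def loss_proportional_weights(losses):
    """Hypothetical dynamic rule (NOT taken from the paper):
    weight each task by its share of the total current loss."""
    total = sum(losses)
    if total == 0:
        # All tasks are solved perfectly; fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [l / total for l in losses]


# Made-up predicted class probabilities for a single sample.
emotion_loss = nll([0.2, 0.5, 0.3], label=1)  # three emotion classes
gender_loss = nll([0.9, 0.1], label=0)        # two gender classes
losses = [emotion_loss, gender_loss]

# Static weighting: weights are fixed in advance (here uniformly).
static = joint_loss(losses, [0.5, 0.5])

# Dynamic weighting: weights are recomputed from the current losses.
weights = loss_proportional_weights(losses)
dynamic = joint_loss(losses, weights)

print("static joint loss:", static)
print("dynamic weights:", weights)
print("dynamic joint loss:", dynamic)
```

In a training loop the weights would be recomputed at every step or epoch, so the harder task (here, emotion) receives more emphasis while its loss remains high. Other dynamic rules are equally possible; this one is only the simplest to write down.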
Related papers
- Dynamic Restrained Uncertainty Weighting Loss For Multitask Learning Of Vocal Expression (2022)
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019)
- A Joint Cross-attention Model For Audio-visual Fusion In Dimensional Emotion Recognition (2022)
- Recursive Joint Cross-modal Attention For Multimodal Fusion In Dimensional Emotion Recognition (2024)
- Contrastive Regularization For Multimodal Emotion Recognition Using Audio And Text (2022)
- E2e-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022)
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019)