Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning
2017 · Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, et al.
Abstract
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. in speakers and tasks). To improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL), with gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". Compared with state-of-the-art methods based on Single-Task Learning (STL), our proposed MTL method significantly improved performance. In particular, models using both gender and naturalness achieved larger gains than those using either gender or naturalness alone. This benefit was also visible in the high-level feature representations obtained from our proposed method, in which discriminative emotional clusters could be observed.
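The abstract describes the MTL idea only at a high level. The sketch below illustrates the general technique, not the authors' network: a single shared ReLU layer feeds three softmax heads (emotion, gender, naturalness), and the auxiliary cross-entropy losses are added to the emotion loss with weights. The layer sizes, the weights `w_gender` and `w_nat`, the binary acted/natural labelling of naturalness, the random stand-in features, and all function names are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def init_params(n_features, n_hidden, n_emotions, rng):
    # Hypothetical architecture: one shared hidden layer, three task-specific heads.
    scale = 0.1
    return {
        "W_shared": rng.normal(0.0, scale, (n_features, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_emo": rng.normal(0.0, scale, (n_hidden, n_emotions)),
        "W_gender": rng.normal(0.0, scale, (n_hidden, 2)),  # assumed: female / male
        "W_nat": rng.normal(0.0, scale, (n_hidden, 2)),     # assumed: acted / natural
    }

def mtl_loss(params, x, y_emo, y_gender, y_nat, w_gender=0.3, w_nat=0.3):
    # Shared representation used by all three tasks (ReLU layer).
    h = np.maximum(0.0, x @ params["W_shared"] + params["b_shared"])
    loss_emo = cross_entropy(softmax(h @ params["W_emo"]), y_emo)
    loss_gender = cross_entropy(softmax(h @ params["W_gender"]), y_gender)
    loss_nat = cross_entropy(softmax(h @ params["W_nat"]), y_nat)
    # Main (emotion) loss plus weighted auxiliary losses; weights are hypothetical.
    total = loss_emo + w_gender * loss_gender + w_nat * loss_nat
    return total, (loss_emo, loss_gender, loss_nat)

# Usage with random stand-in data: 8 utterances, 40 hypothetical acoustic features.
x = rng.normal(size=(8, 40))
y_emo = rng.integers(0, 4, size=8)     # 4 hypothetical emotion classes
y_gender = rng.integers(0, 2, size=8)
y_nat = rng.integers(0, 2, size=8)
params = init_params(40, 16, 4, rng)
total, parts = mtl_loss(params, x, y_emo, y_gender, y_nat)
print(round(total, 4), [round(p, 4) for p in parts])
```

Setting both auxiliary weights to zero reduces the objective to the single-task (STL) emotion loss, the kind of baseline the abstract compares against. The auxiliary heads affect emotion predictions only through the shared layer they help train, and switching one weight off gives the "gender only" or "naturalness only" variants.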
Related papers
- GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning For Speech Emotion Recognition (2024) · 0.00
- Emonet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition (2021) · 2.95
- Ctl-mtnet: A Novel Capsnet And Transfer Learning-based Mixed Task Net For The Single-corpus And Cross-corpus Speech Emotion Recognition (2022) · 10.21
- Towards Interpretable And Transferable Speech Emotion Recognition: Latent Representation Based Analysis Of Features, Methods And Corpora (2021) · 0.00
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019) · 14.58
- Speecheq: Speech Emotion Recognition Based On Multi-scale Unified Datasets And Multitask Learning (2022) · 5.84
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) · 2.26
- Joint Learning Using Mixture-of-expert-based Representation For Speech Enhancement And Robust Emotion Recognition (2026) · 0.00