Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition
2019 · Siddique Latif, Rajib Rana, Sara Khalifa, et al.
Abstract
Despite the growing importance of Speech Emotion Recognition (SER), state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model. In this paper, we propose a solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that utilisation of this additional data can improve the primary task of SER, for which only limited labelled data is available. In particular, we use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our framework, which has a strong capability to learn powerful and discriminative features.
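The idea of training on auxiliary tasks when emotion labels are scarce can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the training objective, so the weighted sum of cross-entropies, the task weights, and the names `multitask_loss` and `WEIGHTS` are all assumptions made for illustration, and the adversarial autoencoder itself is left out. The sketch only shows how a sample without an emotion label can still contribute to the gender and speaker losses.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under a probability vector."""
    return -math.log(probs[label])

def multitask_loss(batch, weights):
    """Weighted sum of per-task mean cross-entropies.

    Each item in `batch` maps a task name to (probs, label). A label of None
    means the sample has no annotation for that task, e.g. an utterance from
    a speaker corpus that carries no emotion label.
    """
    total = 0.0
    for task, weight in weights.items():
        losses = [
            cross_entropy(*item[task])
            for item in batch
            if task in item and item[task][1] is not None
        ]
        if losses:  # skip tasks with no labelled samples in this batch
            total += weight * sum(losses) / len(losses)
    return total

# Hypothetical task weights, chosen for illustration (not from the paper).
WEIGHTS = {"emotion": 1.0, "gender": 0.5, "speaker": 0.5}

batch = [
    # Sample from an emotion corpus: all three labels are available.
    {"emotion": ([0.7, 0.1, 0.1, 0.1], 0),
     "gender": ([0.9, 0.1], 0),
     "speaker": ([0.6, 0.3, 0.1], 0)},
    # Sample from a speaker corpus: no emotion label, so only the
    # auxiliary tasks contribute to the loss.
    {"emotion": ([0.25, 0.25, 0.25, 0.25], None),
     "gender": ([0.2, 0.8], 1),
     "speaker": ([0.1, 0.8, 0.1], 1)},
]

loss = multitask_loss(batch, WEIGHTS)
print(f"combined loss: {loss:.4f}")
```

In a real system the probability vectors would come from task-specific heads on top of a shared representation, and the weights would be tuned on held-out data.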
Related papers
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019)
- Adversarial Auto-encoders For Speech Based Emotion Recognition (2018)
- Self Supervised Adversarial Domain Adaptation For Cross-corpus And Cross-language Speech Emotion Recognition (2022)
- Continuous Metric Learning For Transferable Speech Emotion Recognition And Embedding Across Low-resource Languages (2022)
- Multi-channel Auto-encoder For Speech Emotion Recognition (2018)
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024)
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023)
- Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning (2017)