Towards Adversarial Learning Of Speaker-invariant Representation For Speech Emotion Recognition
2019 · Ming Tu, Yun Tang, Jing Huang, et al.
Abstract
Speech emotion recognition (SER) has attracted great attention in recent years due to the high demand for emotionally intelligent speech interfaces. Deriving speaker-invariant representations for speech emotion recognition is crucial. In this paper, we propose to apply adversarial training to SER to learn speaker-invariant representations. Our model consists of three parts: a representation learning sub-network with a time-delay neural network (TDNN) and LSTM with statistical pooling, an emotion classification network, and a speaker classification network. Both the emotion and speaker classification networks take the output of the representation learning network as input. Two training strategies are employed: one based on domain adversarial training (DAT) and the other based on cross-gradient training (CGT). Besides the conventional data set, we also evaluate our proposed models on a much larger publicly available emotion data set with 250 speakers. Evaluation results show that on IEMO
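Two ideas named in the abstract can be illustrated with a minimal sketch: statistical pooling over time and the gradient-reversal update that DAT is commonly implemented with. This is not the authors' code. The function names, the choice of mean and standard deviation as the pooled statistics, and the weight `lam` are assumptions made for illustration only; the abstract does not specify them, and CGT is not shown.

```python
import math


def statistics_pooling(frames):
    """Collapse T frame vectors into one fixed-size utterance vector.

    frames: list of T equal-length lists of floats (T >= 1).
    Returns per-dimension means followed by per-dimension standard deviations.
    Assumption: mean + population std are the pooled statistics.
    """
    t = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / t for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / t)
        for d in range(dim)
    ]
    return means + stds


def encoder_gradient(grad_emotion, grad_speaker, lam):
    """Gradient reaching the shared representation network under gradient reversal.

    The emotion gradient passes through unchanged; the speaker gradient is
    multiplied by -lam, so a descent step lowers the emotion loss while
    raising the speaker loss. lam is a hypothetical trade-off weight.
    """
    return [ge - lam * gs for ge, gs in zip(grad_emotion, grad_speaker)]


# Two frames with two features each become one 4-dimensional vector.
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0]])

# Combined gradient for two encoder parameters with lam = 0.5.
combined = encoder_gradient([0.5, -0.2], [0.1, 0.4], lam=0.5)
```

In this sketch the speaker classifier itself would still be trained with the unreversed speaker gradient; only the path into the shared representation network has its sign flipped.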
Related papers
- Domain Adversarial Learning For Emotion Recognition (2019)
- Self Supervised Adversarial Domain Adaptation For Cross-corpus And Cross-language Speech Emotion Recognition (2022)
- Unsupervised Adversarial Domain Adaptation For Cross-lingual Speech Emotion Recognition (2019)
- Supervised Contrastive Learning With Nearest Neighbor Search For Speech Emotion Recognition (2023)
- Adversarial Machine Learning And Speech Emotion Recognition: Utilizing Generative Adversarial Networks For Robustness (2018)
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019)
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023)
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021)