Learning Representations Of Emotional Speech With Deep Convolutional Generative Adversarial Networks
2017 · Jonathan Chang, Stefan Scherer
Abstract
Automatically assessing emotional valence in human speech has historically been a difficult task for machine learning algorithms. The subtle changes in the voice of the speaker that are indicative of positive or negative emotional states are often "overshadowed" by voice characteristics relating to emotional intensity or emotional activation. In this work we explore a representation learning approach that automatically derives discriminative representations of emotional speech. In particular, we investigate two machine learning strategies to improve classifier performance: (1) utilization of unlabeled data using a deep convolutional generative adversarial network (DCGAN), and (2) multitask learning. Within our extensive experiments we leverage a multitask annotated emotional corpus as well as a large unlabeled meeting corpus (around 100 hours). Our speaker-independent classification experiments show that in particular the use of unlabeled data in our investigations improves performance
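As a rough illustration of the second strategy named in the abstract, multitask learning, the sketch below combines per-task classification losses into one training objective. It is not the authors' implementation: the task names ("valence", "activation"), the class counts, the loss weights, and the helper names are all hypothetical, and the abstract does not say how the paper weights its tasks.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index under softmax."""
    return -math.log(softmax(logits)[label])

def multitask_loss(task_logits, task_labels, task_weights):
    """Weighted sum of per-task cross-entropy losses.

    All three arguments are dicts keyed by task name. In a multitask
    model the per-task logits would come from separate output heads
    on top of one shared representation.
    """
    return sum(
        task_weights[t] * cross_entropy(task_logits[t], task_labels[t])
        for t in task_logits
    )

# Hypothetical outputs for a single utterance (values are made up).
logits = {"valence": [0.2, 1.5, -0.3], "activation": [0.0, 0.0]}
labels = {"valence": 1, "activation": 0}
weights = {"valence": 1.0, "activation": 0.5}  # assumed weighting

loss = multitask_loss(logits, labels, weights)
```

Minimizing a combined objective of this kind pushes the shared representation to serve both tasks at once; the relative weights control how strongly the auxiliary task influences it.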
Related papers
- Modeling Feature Representations For Affective Speech Using Generative Adversarial Networks (2019)
- On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks (2018)
- Improving Speech Emotion Recognition With Mutual Information Regularized Generative Model (2025)
- Adversarial Auto-encoders For Speech Based Emotion Recognition (2018)
- Non-parallel Emotion Conversion Using A Deep-generative Hybrid Network And An Adversarial Pair Discriminator (2020)
- Hybrid Data Augmentation And Deep Attention-based Dilated Convolutional-recurrent Neural Networks For Speech Emotion Recognition (2021)
- Multi-modal Emotion Detection With Transfer Learning (2020)
- Learning Arousal-valence Representation From Categorical Emotion Labels Of Speech (2023)