Variational Autoencoders For Learning Latent Representations Of Speech Emotion: A Preliminary Study
2017 · Siddique Latif, Rajib Rana, Junaid Qadir, et al.
Abstract
Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success for generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion class
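For readers unfamiliar with VAEs, the following is a minimal NumPy sketch of the two generic ingredients behind the latent representation the abstract refers to: the reparameterized sample drawn from the encoder's output, and the KL term that regularizes it toward a standard normal prior. The abstract does not describe the network architecture, input features, or loss weighting, so none of this should be read as the authors' implementation; the function names (`reparameterize`, `kl_to_standard_normal`, `negative_elbo`) and the squared-error reconstruction term are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

def negative_elbo(x, x_recon, mu, log_var):
    """Per-example loss: reconstruction error plus KL regularizer.

    Squared error is an assumed stand-in for the reconstruction term;
    it corresponds to a Gaussian likelihood up to constants.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)

# Hypothetical encoder outputs for a batch of 4 frames with an 8-dim latent space.
rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 8))
log_var = np.zeros((4, 8))
z = reparameterize(mu, log_var, rng)  # latent features that a classifier could consume
```

In a setup like the one the abstract outlines, the latent vectors (here `z`, or the means `mu`) would serve as the input features to a separate emotion classifier.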
Related papers
- Learning And Controlling The Source-filter Representation Of Speech With A Variational Autoencoder (2022) 7.50
- Learning Latent Representations For Speech Generation And Transformation (2017) 13.50
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019) 9.23
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019) 13.65
- A Benchmark Of Dynamical Variational Autoencoders Applied To Speech Spectrogram Modeling (2021) 6.77
- Unsupervised Speech Enhancement Using Dynamical Variational Auto-encoders (2021) 13.28
- Adversarial Auto-encoders For Speech Based Emotion Recognition (2018) 12.68
- Deep Encoder-decoder Models For Unsupervised Learning Of Controllable Speech Synthesis (2018) 0.00