Learning Latent Representations For Speech Generation And Transformation
2017 · Wei-Ning Hsu, Yu Zhang, James Glass
Abstract
The ability to model a generative process and learn latent representations of speech in an unsupervised fashion will be crucial for processing vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous success in modeling natural images. In this paper, we apply a convolutional VAE to model the generative process of natural speech. We derive latent space arithmetic operations to disentangle learned latent representations. We demonstrate the capability of our model to modify the phonetic content or the speaker identity of speech segments using the derived operations, without the need for parallel supervisory data.
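The latent space arithmetic mentioned in the abstract can be illustrated with a small sketch. It assumes, as a simplification, that each attribute value (for example a speaker or a phone) is represented by the mean latent vector of all segments labelled with it, and that an attribute is changed by subtracting the source attribute's mean and adding the target attribute's mean. The helper names (`attribute_vectors`, `transform`) and the toy 2-D vectors are hypothetical; in a real system the latent codes would come from the trained VAE encoder, and the shifted code would be decoded back into a speech segment.

```python
from statistics import fmean
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized latent vectors."""
    if not vectors:
        raise ValueError("need at least one latent vector")
    return [fmean(column) for column in zip(*vectors)]


def attribute_vectors(
    latents: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, Vector]:
    """Hypothetical helper: average the latent codes of all segments sharing a label."""
    if len(latents) != len(labels):
        raise ValueError("latents and labels must have the same length")
    groups: Dict[str, List[Sequence[float]]] = {}
    for z, label in zip(latents, labels):
        groups.setdefault(label, []).append(z)
    return {label: mean_vector(zs) for label, zs in groups.items()}


def transform(
    z: Sequence[float], source: str, target: str, attrs: Dict[str, Vector]
) -> Vector:
    """Hypothetical helper: shift z from the source attribute to the target one.

    Computes z - mu_source + mu_target element-wise.
    """
    return [zi - s + t for zi, s, t in zip(z, attrs[source], attrs[target])]


if __name__ == "__main__":
    # Toy latent codes for two hypothetical speakers.
    latents = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
    attrs = attribute_vectors(latents, labels)
    print(transform([1.0, 0.0], "spk_a", "spk_b", attrs))  # [-1.0, 3.0]
```

Because the shift is a plain vector offset, applying the reverse transformation (target back to source) recovers the original code exactly, which makes the operation easy to sanity-check on toy data.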
Related papers
- Learning And Controlling The Source-filter Representation Of Speech With A Variational Autoencoder (2022) · 7.50
- Deep Encoder-decoder Models For Unsupervised Learning Of Controllable Speech Synthesis (2018) · 0.00
- Variational Autoencoders For Learning Latent Representations Of Speech Emotion: A Preliminary Study (2017) · 13.11
- Learning Latent Representations For Style Control And Transfer In End-to-end Speech Synthesis (2018) · 0.00
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018) · 10.61
- Text-to-speech Synthesis Based On Latent Variable Conversion Using Diffusion Probabilistic Model And Variational Autoencoder (2022) · 0.00
- A Statistically Principled And Computationally Efficient Approach To Speech Enhancement Using Variational Autoencoders (2019) · 9.23
- Unsupervised Speech Representation Learning Using Wavenet Autoencoders (2019) · 17.21