Parallel WaveNet Conditioned On VAE Latent Vectors
2020 · Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, et al.
Abstract
Recently the state-of-the-art text-to-speech synthesis systems have shifted to a two-model approach: a sequence-to-sequence model to predict a representation of speech (typically mel-spectrograms), followed by a 'neural vocoder' model which produces the time-domain speech waveform from this intermediate speech representation. This approach is capable of synthesizing speech that is confusable with natural speech recordings. However, the inference speed of neural vocoder approaches represents a major obstacle for deploying this technology for commercial applications. Parallel WaveNet is one approach which has been developed to address this issue, trading off some synthesis quality for significantly faster inference speed. In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder. We condition the neural vocoder with the latent vector from a pre-trained VAE component of a Tacotron 2-style sequence-to-seq
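As a rough illustration of what sentence-level conditioning can look like, the sketch below tiles one latent vector across every frame of a mel-spectrogram and appends it to the per-frame features that a vocoder would consume. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say how the latent vector is combined with the other conditioning, and the function name `build_conditioning`, the concatenation strategy, and the dimensions (80 mel bins, 16 latent dimensions, 200 frames) are all hypothetical.

```python
import numpy as np


def build_conditioning(mel, z):
    """Append a sentence-level vector to every frame of a mel-spectrogram.

    Assumption (not from the paper): the vector is tiled over time and
    concatenated along the feature axis.
    mel: array of shape (frames, n_mels)
    z:   array of shape (latent_dim,)
    """
    mel = np.asarray(mel, dtype=np.float32)
    z = np.asarray(z, dtype=np.float32)
    if mel.ndim != 2 or z.ndim != 1:
        raise ValueError("expected mel of shape (frames, n_mels) and z of shape (latent_dim,)")
    # Repeat the same latent vector for each frame: (frames, latent_dim).
    tiled = np.broadcast_to(z, (mel.shape[0], z.shape[0]))
    # Frame-level and sentence-level features side by side.
    return np.concatenate([mel, tiled], axis=1)


# Hypothetical sizes, with random data standing in for real features.
rng = np.random.default_rng(0)
mel = rng.standard_normal((200, 80)).astype(np.float32)
z = rng.standard_normal(16).astype(np.float32)
cond = build_conditioning(mel, z)
print(cond.shape)
```

Every frame carries the same latent values, so the vocoder sees a constant sentence-level signal alongside the time-varying mel features. Other combination schemes, such as adding a projected vector to the conditioning, would be equally plausible readings of the abstract.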
Related papers
- Natural TTS Synthesis By Conditioning WaveNet On Mel Spectrogram Predictions (2017)
- Non-autoregressive Neural Text-to-speech (2019)
- Parallel WaveNet: Fast High-fidelity Speech Synthesis (2017)
- Parallel Waveform Synthesis Based On Generative Adversarial Networks With Voicing-aware Conditional Discriminators (2020)
- Parallel Tacotron: Non-autoregressive And Controllable TTS (2020)
- LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis (2018)
- LVCNet: Efficient Condition-dependent Modeling Network For Waveform Generation (2021)
- Wasserstein GAN And Waveform Loss-based Acoustic Model Training For Multi-speaker Text-to-speech Synthesis Systems Using A WaveNet Vocoder (2018)