VAENAR-TTS: Variational Auto-encoder Based Non-autoregressive Text-to-speech Synthesis
2021 · Hui Lu, Zhiyong Wu, Xixin Wu, et al.
Abstract
This paper describes a variational auto-encoder based non-autoregressive text-to-speech (VAENAR-TTS) model. Autoregressive TTS (AR-TTS) models based on the sequence-to-sequence architecture can generate high-quality speech, but their sequential decoding process can be time-consuming. Recently, non-autoregressive TTS (NAR-TTS) models have been shown to be more efficient because they decode in parallel. However, these NAR-TTS models rely on phoneme-level durations to generate a hard alignment between the text and the spectrogram. Obtaining duration labels, either through forced alignment or knowledge distillation, is cumbersome. Furthermore, hard alignment based on phoneme expansion can degrade the naturalness of the synthesized speech. In contrast, the proposed VAENAR-TTS model is an end-to-end approach that does not require phoneme-level durations. The VAENAR-TTS model does not contain recurrent structures and is completely non-autoregressive in both the training and inference…
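The abstract contrasts VAENAR-TTS with NAR-TTS models that build a "hard alignment based on phoneme expansion" from phoneme-level durations. The following is a minimal sketch of that expansion step. It is not code from the paper or from any TTS library. It assumes the durations are already available as integer frame counts, for example from forced alignment or a teacher model. The function names `expand_by_duration` and `hard_alignment_matrix` are hypothetical.

```python
import numpy as np


def expand_by_duration(phoneme_hidden: np.ndarray, durations) -> np.ndarray:
    """Hard alignment by phoneme expansion (illustrative sketch).

    phoneme_hidden: (num_phonemes, hidden_dim) phoneme-level encoder outputs.
    durations: (num_phonemes,) non-negative integer frame counts, assumed to be
        given externally (e.g. forced alignment or knowledge distillation).
    Returns a (sum(durations), hidden_dim) frame-level sequence in which
    phoneme i's vector is repeated durations[i] times.
    """
    durations = np.asarray(durations, dtype=int)
    if phoneme_hidden.shape[0] != durations.shape[0]:
        raise ValueError("exactly one duration per phoneme is required")
    if np.any(durations < 0):
        raise ValueError("durations must be non-negative")
    return np.repeat(phoneme_hidden, durations, axis=0)


def hard_alignment_matrix(durations) -> np.ndarray:
    """Equivalent 0/1 alignment matrix A of shape (num_frames, num_phonemes).

    Each row is one-hot: every output frame is tied to exactly one phoneme,
    so expand_by_duration(h, d) equals A @ h.
    """
    durations = np.asarray(durations, dtype=int)
    # Index of the phoneme that each frame copies, e.g. [0, 0, 1, 2, 2, 2].
    phoneme_index = np.repeat(np.arange(len(durations)), durations)
    A = np.zeros((len(phoneme_index), len(durations)))
    A[np.arange(len(phoneme_index)), phoneme_index] = 1.0
    return A


if __name__ == "__main__":
    h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # 3 phonemes, dim 2
    d = [2, 1, 3]                                        # 6 frames in total
    frames = expand_by_duration(h, d)
    print(frames.shape)                                  # (6, 2)
    print(np.allclose(hard_alignment_matrix(d) @ h, frames))  # True
```

Writing the expansion as a matrix with one-hot rows makes the "hard" part explicit. Each spectrogram frame is assigned to a single phoneme, and the frame count per phoneme is fixed by the duration labels. In a soft alignment, those rows would instead be probability distributions over phonemes, such as attention weights. That description is the general idea only, not a statement of how VAENAR-TTS computes its alignment.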
Related papers
- VARA-TTS: Non-autoregressive Text-to-speech Synthesis Based On Very Deep VAE With Residual Attention (2021)
- Generating Diverse And Natural Text-to-speech Samples Using A Quantized Fine-grained VAE And Auto-regressive Prosody Prior (2020)
- Parallel Tacotron: Non-autoregressive And Controllable TTS (2020)
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021)
- Continuous Autoregressive Modeling With Stochastic Monotonic Alignment For Speech Synthesis (2025)
- Text-to-speech Synthesis Based On Latent Variable Conversion Using Diffusion Probabilistic Model And Variational Autoencoder (2022)
- Autotts: End-to-end Text-to-speech Synthesis Through Differentiable Duration Modeling (2022)
- Robust And Unbounded Length Generalization In Autoregressive Transformer-based Text-to-speech (2024)