EfficientTTS: An Efficient And High-quality Text-to-speech Architecture
2020 · Chenfeng Miao, Shuang Liang, Zhencheng Liu, et al.
Abstract
In this work, we address the Text-to-Speech (TTS) task by proposing a non-autoregressive architecture called EfficientTTS. Unlike the dominant non-autoregressive TTS models, which require external aligners for training, EfficientTTS optimizes all its parameters with a stable, end-to-end training procedure, while synthesizing high-quality speech quickly and efficiently. EfficientTTS is motivated by a new monotonic alignment modeling approach (also introduced in this work), which imposes monotonic constraints on the sequence alignment with almost no increase in computation. By combining EfficientTTS with different feed-forward network structures, we develop a family of TTS models, including both text-to-melspectrogram and text-to-waveform networks. We experimentally show that the proposed models significantly outperform counterpart models such as Tacotron 2 and Glow-TTS in terms of speech quality, training efficiency and synthesis speed, while still producin
Related papers
- FastSpeech: Fast, Robust And Controllable Text To Speech (2019) 0.00
- Glow-TTS: A Generative Flow For Text-to-speech Via Monotonic Alignment Search (2020) 0.00
- AlignTTS: Efficient Feed-forward Text-to-speech System Without Explicit Alignment (2020) 11.76
- SyncSpeech: Efficient And Low-latency Text-to-speech Based On Temporal Masked Transformer (2025) 0.00
- Reinforce-Aligner: Reinforcement Alignment Search For Robust End-to-end Text-to-speech (2021) 8.09
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021) 8.82
- Semi-supervised Training For Improving Data Efficiency In End-to-end Speech Synthesis (2018) 13.28
- FeatherTTS: Robust And Efficient Attention Based Neural TTS (2020) 5.84