FeatherTTS: Robust and Efficient Attention-Based Neural TTS
2020 · Qiao Tian, Zewang Zhang, Chao Liu, et al.
Abstract
Attention-based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements of industrial products. It also suffers from slow inference owing to its autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neural TTS system. First, we propose a novel Gaussian attention that exploits the interpretability of Gaussian attention and the strictly monotonic nature of TTS alignment. With this method, we replace the commonly used stop-token prediction architecture with attentive stop prediction. Second, we apply block sparsity to the autoregressive decoder to speed up speech synthesis. Experimental results show that FeatherTTS not only nearly eliminates word skipping and repeating on particularly hard texts while preserving the naturalness of the generated speech, but also speeds up acoustic feature ge
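The abstract describes an attention whose mean can only move forward, with the stop decision read off the alignment instead of a separate stop-token classifier. The sketch below illustrates that idea under assumptions that are not taken from the paper: the Gaussian mean advances by a softplus-transformed step (borrowed from GMM-style attention; FeatherTTS's exact parameterization may differ), and decoding halts once the mean passes the last encoder position plus a hypothetical `stop_offset`. `step_params` stands in for the per-step values a real decoder would predict, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x); strictly positive for finite x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_weights(mu: float, sigma: float, num_tokens: int) -> List[float]:
    """Attention weights over encoder positions 0..num_tokens-1 from a Gaussian at mu."""
    logits = [-((j - mu) ** 2) / (2.0 * sigma ** 2) for j in range(num_tokens)]
    peak = max(logits)  # subtract the max so exp() cannot underflow to all zeros
    raw = [math.exp(l - peak) for l in logits]
    total = sum(raw)
    return [w / total for w in raw]


def decode_with_attentive_stop(
    step_params: Sequence[Tuple[float, float]],
    encoder_outputs: Sequence[Sequence[float]],
    stop_offset: float = 0.5,  # hypothetical margin past the last token
) -> Tuple[List[List[float]], List[List[float]], List[float], bool]:
    """Monotonic Gaussian attention that stops once its mean passes the last token.

    step_params holds one hypothetical (delta_logit, sigma) pair per decoder step;
    in a real model both would be predicted from the decoder state.
    """
    num_tokens = len(encoder_outputs)
    dim = len(encoder_outputs[0])
    mu = 0.0
    contexts, alignments, means = [], [], []
    for delta_logit, sigma in step_params:
        mu += softplus(delta_logit)  # step > 0, so the mean only moves forward
        if mu > (num_tokens - 1) + stop_offset:
            return contexts, alignments, means, True  # attentive stop, no stop token
        weights = gaussian_weights(mu, sigma, num_tokens)
        # Context vector = attention-weighted sum of encoder outputs
        context = [sum(w * h[d] for w, h in zip(weights, encoder_outputs)) for d in range(dim)]
        contexts.append(context)
        alignments.append(weights)
        means.append(mu)
    return contexts, alignments, means, False  # ran out of steps before stopping


if __name__ == "__main__":
    encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy tokens, 2-dim
    ctx, align, means, stopped = decode_with_attentive_stop([(0.0, 0.5)] * 10, encoder_outputs)
    print(len(ctx), stopped)  # 3 True
    print([round(m, 3) for m in means])  # [0.693, 1.386, 2.079]
```

Because the mean is forced to increase, the alignment cannot jump back (repeating) in this sketch, and the end of synthesis is tied to the alignment reaching the end of the input rather than to a classifier that might fire too early or never.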
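The abstract states only that block sparsity is applied to the autoregressive decoder. The sketch below shows the general technique under assumptions not taken from the paper: a weight matrix is tiled into fixed-size blocks, the blocks with the smallest L1 norm are zeroed (magnitude-based block pruning, as used for sparse WaveRNN), and inference stores and multiplies only the surviving blocks. The block shape, the `sparsity` fraction, and the helpers `block_prune`, `to_block_sparse`, and `block_sparse_matvec` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def block_prune(weight: Matrix, block: Tuple[int, int], sparsity: float) -> Matrix:
    """Return a copy of `weight` with the lowest-L1 `sparsity` fraction of blocks zeroed."""
    rows, cols = len(weight), len(weight[0])
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "matrix must tile evenly into blocks"
    scores = []
    for bi in range(rows // br):
        for bj in range(cols // bc):
            norm = sum(abs(weight[bi * br + r][bj * bc + c]) for r in range(br) for c in range(bc))
            scores.append((norm, bi, bj))
    scores.sort()  # ascending L1 norm; ties broken by block position
    n_drop = int(len(scores) * sparsity)
    pruned = [row[:] for row in weight]  # leave the input matrix untouched
    for _, bi, bj in scores[:n_drop]:
        for r in range(br):
            for c in range(bc):
                pruned[bi * br + r][bj * bc + c] = 0.0
    return pruned


def to_block_sparse(weight: Matrix, block: Tuple[int, int]) -> Dict[Tuple[int, int], Matrix]:
    """Keep only blocks containing a nonzero, keyed by (block_row, block_col)."""
    br, bc = block
    blocks: Dict[Tuple[int, int], Matrix] = {}
    for bi in range(len(weight) // br):
        for bj in range(len(weight[0]) // bc):
            tile = [weight[bi * br + r][bj * bc:(bj + 1) * bc] for r in range(br)]
            if any(v != 0.0 for row in tile for v in row):
                blocks[(bi, bj)] = tile
    return blocks


def block_sparse_matvec(
    blocks: Dict[Tuple[int, int], Matrix], block: Tuple[int, int], rows: int, x: List[float]
) -> List[float]:
    """y = W @ x, touching only the stored blocks; zero blocks cost nothing."""
    br, bc = block
    y = [0.0] * rows
    for (bi, bj), tile in blocks.items():
        for r in range(br):
            acc = 0.0
            for c in range(bc):
                acc += tile[r][c] * x[bj * bc + c]
            y[bi * br + r] += acc
    return y


if __name__ == "__main__":
    W = [[0.1, 0.2, 3.0, 1.0],
         [0.0, 0.1, 2.0, 1.0],
         [5.0, 1.0, 0.1, 0.0],
         [1.0, 2.0, 0.0, 0.2]]
    blocks = to_block_sparse(block_prune(W, (2, 2), 0.5), (2, 2))
    print(sorted(blocks))  # [(0, 1), (1, 0)]
    print(block_sparse_matvec(blocks, (2, 2), 4, [1.0, 1.0, 1.0, 1.0]))  # [4.0, 3.0, 6.0, 3.0]
```

Skipping whole blocks rather than individual weights keeps memory access contiguous, which is why block sparsity tends to yield real speedups on CPUs where unstructured sparsity often does not.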
Related papers
- Neural HMMs Are All You Need (for High-Quality Attention-Free TTS) (2021)
- EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture (2020)
- Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS (2019)
- FastSpeech: Fast, Robust and Controllable Text to Speech (2019)
- GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis (2020)
- High Quality, Lightweight and Adaptable TTS Using LPCNet (2019)
- Neural Speech Synthesis with Transformer Network (2018)
- Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding (2020)