Teacher-student Training For Robust Tacotron-based TTS
2019 · Rui Liu, Berrak Sisman, Jingdong Li, et al.
Abstract
While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods in many ways, the exposure bias problem in autoregressive models remains an issue to be resolved. The exposure bias problem arises from the mismatch between the training and inference processes, which results in unpredictable performance on out-of-domain test data at run-time. To overcome this, we propose a teacher-student training scheme for Tacotron-based TTS by introducing a distillation loss function in addition to the feature loss function. We first train a Tacotron2-based TTS model by always providing natural speech frames to the decoder; this model serves as the teacher. We then train another Tacotron2-based model as the student, whose decoder takes the predicted speech frames as input, similar to how the decoder works during run-time inference. With the distillation loss, the student model learns the output probabilities from the teacher model, which is called knowledge distil
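The two decoding modes and the combined objective described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: `step` is a hypothetical stand-in for one decoder step, both loss terms are plain mean squared errors over frames, and `weight` is a hypothetical balancing factor that the abstract does not specify.

```python
from typing import Callable, List

Frame = List[float]


def decode(step: Callable[[Frame], Frame],
           targets: List[Frame],
           teacher_forcing: bool) -> List[Frame]:
    """Run a toy autoregressive decoder for len(targets) steps.

    teacher_forcing=True feeds the natural frame back in (teacher setup);
    teacher_forcing=False feeds the model's own prediction (student setup,
    which mirrors run-time inference).
    """
    prev = [0.0] * len(targets[0])  # all-zero start frame (assumption)
    outputs = []
    for target in targets:
        pred = step(prev)
        outputs.append(pred)
        prev = target if teacher_forcing else pred
    return outputs


def mse(a: List[Frame], b: List[Frame]) -> float:
    """Mean squared error over all frame values."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for x, y in zip(frame_a, frame_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def student_loss(student_out: List[Frame],
                 teacher_out: List[Frame],
                 natural: List[Frame],
                 weight: float = 1.0) -> float:
    """Feature loss plus a weighted distillation loss (assumed MSE form)."""
    feature = mse(student_out, natural)      # match natural speech frames
    distill = mse(student_out, teacher_out)  # match the teacher's outputs
    return feature + weight * distill


# Toy usage: one-dimensional "frames" and a hypothetical linear decoder step.
natural = [[1.0], [2.0], [3.0]]
toy_step = lambda prev: [0.5 * x + 1.0 for x in prev]
teacher_out = decode(toy_step, natural, teacher_forcing=True)
student_out = decode(toy_step, natural, teacher_forcing=False)
loss = student_loss(student_out, teacher_out, natural)
```

In this toy run the teacher and student share the same step function, so their outputs differ only because the student consumes its own predictions. That difference is the train/inference mismatch the abstract calls exposure bias.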
Related papers
- Semi-supervised Training For Improving Data Efficiency In End-to-end Speech Synthesis (2018)
- Tacotron: Towards End-to-end Speech Synthesis (2017)
- Parallel Tacotron: Non-autoregressive And Controllable TTS (2020)
- Non-attentive Tacotron: Robust And Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling (2020)
- Using IPA-based Tacotron For Data Efficient Cross-lingual Speaker Adaptation And Pronunciation Enhancement (2020)
- Regotron: Regularizing The Tacotron2 Architecture Via Monotonic Alignment Loss (2022)
- Towards End-to-end Prosody Transfer For Expressive Speech Synthesis With Tacotron (2018)
- Towards Transfer Learning For End-to-end Speech Synthesis From Deep Pre-trained Language Models (2019)