Forward-backward Decoding For Regularizing End-to-end TTS
2019 · Yibin Zheng, Xi Wang, Lei He, et al.
Abstract
Neural end-to-end TTS can generate very high-quality synthesized speech, even close to human recordings on text from a similar domain. However, it performs unsatisfactorily when scaled to challenging test sets. One concern is that the attention-based encoder-decoder network adopts an autoregressive generative sequence model, which suffers from "exposure bias". To address this issue, we propose two novel methods, which learn to predict the future by improving agreement between the forward and backward decoding sequences. The first introduces divergence regularization terms into the model training objective to reduce the mismatch between two directional models, namely L2R and R2L (which generate targets from left to right and right to left, respectively). The second operates at the decoder level and exploits future information during decoding. In addition, we employ a joint training strategy to allow forward and backward decoding to improve each other in an intera
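As a rough illustration of the first method, the sketch below combines two reconstruction losses with an agreement penalty between the L2R and R2L outputs. The abstract does not say which divergence is used, so the mean squared error here is an assumption, as are the names `mse`, `bidirectional_loss`, and `weight`. It shows the shape of such an objective, not the authors' implementation.

```python
from typing import List

# Time-major sequence of acoustic frames (e.g. mel-spectrogram bins per step).
Frames = List[List[float]]


def mse(a: Frames, b: Frames) -> float:
    """Mean squared error between two equally shaped frame sequences."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("sequences must have the same shape")
    count = sum(len(frame) for frame in a)
    if count == 0:
        raise ValueError("sequences must not be empty")
    total = sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    return total / count


def bidirectional_loss(target: Frames, l2r_pred: Frames, r2l_pred: Frames,
                       weight: float = 1.0) -> float:
    """Hypothetical joint objective: two reconstruction terms plus agreement.

    r2l_pred is assumed to be emitted in right-to-left order, so it is
    reversed before being compared with the target and the L2R output.
    """
    r2l_aligned = r2l_pred[::-1]
    loss_l2r = mse(l2r_pred, target)         # forward model vs. ground truth
    loss_r2l = mse(r2l_aligned, target)      # backward model vs. ground truth
    agreement = mse(l2r_pred, r2l_aligned)   # assumed L2 "divergence" term
    return loss_l2r + loss_r2l + weight * agreement
```

When both directions reproduce the target exactly, every term is zero. Any disagreement between the two decoders is penalized in proportion to `weight`, which pushes each direction toward predictions consistent with the other.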
Related papers
- Twin Regularization For Online Speech Recognition (2018) 6.34
- Initial Investigation Of An Encoder-decoder End-to-end TTS Framework Using Marginalization Of Monotonic Hard Latent Alignments (2019) 0.00
- Robust Sequence-to-sequence Acoustic Modeling With Stepwise Monotonic Attention For Neural TTS (2019) 11.49
- Regotron: Regularizing The Tacotron2 Architecture Via Monotonic Alignment Loss (2022) 5.24
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021) 0.00
- R-BI: Regularized Batched Inputs Enhance Incremental Decoding Framework For Low-latency Simultaneous Speech Translation (2024) 0.00
- Alternate Endings: Improving Prosody For Incremental Neural TTS With Predicted Future Text Input (2021) 6.34
- Effective Decoder Masking For Transformer Based End-to-end Speech Recognition (2020) 0.00