A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis
2018 · Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, et al.
Abstract
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large-scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network, and that the AR model performed best. Evaluation of vocoders using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. In particular, speech waveforms generated from the combination of the AR acoustic model and the Wavenet vocoder achieved a speech quality score similar to that of vocoded speech.
Related papers
- Wasserstein GAN And Waveform Loss-based Acoustic Model Training For Multi-speaker Text-to-speech Synthesis Systems Using A Wavenet Vocoder (2018)
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019)
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- Towards Robust Neural Vocoding For Speech Generation: A Survey (2019)
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)
- Lp-wavenet: Linear Prediction-based Wavenet Speech Synthesis (2018)
- Parametric Resynthesis With Neural Vocoders (2019)
- Pretraining Strategies, Waveform Model Choice, And Acoustic Configurations For Multi-speaker End-to-end Speech Synthesis (2020)