LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis
2018 · Min-Jae Hwang, Frank Soong, Eunwoo Song, et al.
Abstract
We propose a linear prediction (LP)-based waveform generation method built on the WaveNet vocoding framework. WaveNet-based neural vocoders have significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to train a neural vocoder effectively when the target database contains a massive amount of acoustic information such as prosody, style, or expressiveness. As a solution, approaches that use the neural vocoder to generate only the vocal source component have been proposed. However, they tend to generate synthetic noise because the vocal source component is handled independently, without considering the entire speech production process, which inevitably leads to a mismatch between the vocal source and the vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, in which the complicated interactions between the vocal source and vocal tract components are jointly trained within a mixture density network-based WaveNet model. The exper
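The source-filter split that the abstract refers to can be illustrated with plain linear prediction. The following is a minimal sketch only, not the LP-WaveNet model: it assumes an autocorrelation-method LP analysis, uses a toy AR(2) signal in place of speech, and all function names (`lp_coefficients`, `lp_predict`, `lp_synthesize`) are hypothetical. It shows that a signal splits into an LP prediction (the vocal tract part) plus a residual (the vocal source part), and that filtering the residual through the same LP filter recovers the signal.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP coefficients a[0..order-1], so that
    x[n] is approximated by sum_k a[k-1] * x[n-k] for k = 1..order."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lp_predict(x, a):
    """Prediction p[n] from the true past samples (zeros before the start)."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), x])
    return np.array([np.dot(a, padded[n:n + order][::-1]) for n in range(len(x))])

def lp_synthesize(excitation, a):
    """All-pole synthesis: y[n] = sum_k a[k-1] * y[n-k] + excitation[n]."""
    order = len(a)
    y = np.zeros(len(excitation) + order)
    for n in range(len(excitation)):
        past = y[n:n + order][::-1]  # y[n-1], y[n-2], ..., y[n-order]
        y[n + order] = np.dot(a, past) + excitation[n]
    return y[order:]

# Toy stand-in for speech (an assumption): a stable AR(2) process.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.normal(size=2000)
x = lp_synthesize(noise, true_a)

a = lp_coefficients(x, order=2)
residual = x - lp_predict(x, a)          # "vocal source" component
reconstructed = lp_synthesize(residual, a)  # source passed through the LP filter
```

In this sketch the residual has lower variance than the signal itself, which is the usual motivation for modeling the source rather than the raw waveform. The mismatch described in the abstract arises when the source is generated without regard to the filter that will shape it.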
Related papers
- Improving LPCNet-based Text-to-speech With Linear Prediction-structured Mixture Density Network (2020) · 5.24
- Wasserstein GAN And Waveform Loss-based Acoustic Model Training For Multi-speaker Text-to-speech Synthesis Systems Using A WaveNet Vocoder (2018) · 12.61
- LPCNet: Improving Neural Speech Synthesis Through Linear Prediction (2018) · 0.00
- High-fidelity And Low-latency Universal Neural Vocoder Based On Multiband WaveRNN With Data-driven Linear Prediction For Discrete Waveform Modeling (2021) · 6.77
- GELP: GAN-excited Linear Prediction For Speech Synthesis From Mel-spectrogram (2019) · 10.74
- End-to-end LPCNet: A Neural Vocoder With Fully-differentiable LPC Estimation (2022) · 7.16
- FeatherWave: An Efficient High-fidelity Neural Vocoder With Multi-band Linear Prediction (2020) · 8.35
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018) · 11.76