Improving LPCNet-based Text-to-speech With Linear Prediction-structured Mixture Density Network
2020 · Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, et al.
Abstract
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality and lightweight speech synthesis by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator. However, the quality of synthesized speech is often unstable because the vocal source component is insufficiently represented by the mu-law quantization method, and the model is trained without considering the entire speech production mechanism. To address this problem, we first introduce LP-MDN, which enables the autoregressive neural vocoder to structurally represent the interactions between the vocal tract and vocal source components. Then, we propose to incorporate the LP-MDN into the LPCNet vocoder by replacing the conventional discretized output with a continuous density distribution. The experimental results verify that the proposed system provi
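The abstract does not spell out how the LP structure enters the mixture density network, so the following is only a minimal sketch of one common reading: the network predicts mixture parameters for the excitation, and the LP prediction from past samples shifts the component means so that the likelihood is evaluated directly on the speech sample. The function names, the use of Gaussian components, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def lp_prediction(past_samples, lp_coeffs):
    """LP prediction p_n = sum_i a_i * x_{n-i}; past_samples[0] is x_{n-1}."""
    return float(np.dot(lp_coeffs, past_samples))


def lp_structured_params(log_weights, exc_means, log_scales, prediction):
    """Shift excitation-domain means by the LP prediction (assumed structure).

    Weights and scales are left unchanged; only the means move into the
    speech domain.
    """
    return log_weights, exc_means + prediction, log_scales


def gaussian_mixture_nll(x, log_weights, means, log_scales):
    """Negative log-likelihood of scalar x under a 1-D Gaussian mixture."""
    # Normalise the mixture weights in the log domain.
    m = np.max(log_weights)
    log_w = log_weights - (m + np.log(np.sum(np.exp(log_weights - m))))
    # Per-component Gaussian log-density.
    z = (x - means) / np.exp(log_scales)
    log_pdf = -0.5 * np.log(2.0 * np.pi) - log_scales - 0.5 * z ** 2
    # Log-sum-exp over components.
    joint = log_w + log_pdf
    j = np.max(joint)
    return -(j + np.log(np.sum(np.exp(joint - j))))


# Toy values (hypothetical): second-order LP filter and a 3-component mixture.
lp_coeffs = np.array([1.2, -0.5])
past = np.array([0.3, 0.1])          # x_{n-1}, x_{n-2}
log_weights = np.array([0.1, -0.3, 0.5])
exc_means = np.array([-0.05, 0.0, 0.04])
log_scales = np.array([-2.0, -3.0, -2.5])

p_n = lp_prediction(past, lp_coeffs)
x_n = 0.33                            # current speech sample
speech_params = lp_structured_params(log_weights, exc_means, log_scales, p_n)
nll_speech = gaussian_mixture_nll(x_n, *speech_params)
nll_excitation = gaussian_mixture_nll(x_n - p_n, log_weights, exc_means, log_scales)
```

Under these assumptions, `nll_speech` and `nll_excitation` coincide: scoring the speech sample against the shifted mixture is equivalent to scoring the excitation `x_n - p_n` against the unshifted one, which is what lets a continuous output density replace the quantized excitation target while still training on the speech signal.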
Related papers
- End-to-end LPCNet: A Neural Vocoder With Fully-differentiable LPC Estimation (2022)
- LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis (2018)
- LPCNet: Improving Neural Speech Synthesis Through Linear Prediction (2018)
- A Real-time Wideband Neural Vocoder At 1.6 kb/s Using LPCNet (2019)
- High Quality, Lightweight And Adaptable TTS Using LPCNet (2019)
- High-fidelity And Low-latency Universal Neural Vocoder Based On Multiband WaveRNN With Data-driven Linear Prediction For Discrete Waveform Modeling (2021)
- Neural Speech Synthesis On A Shoestring: Improving The Efficiency Of LPCNet (2022)
- Controllable Sequence-to-sequence Neural TTS With LPCNet Backend For Real-time Speech Synthesis On CPU (2020)