ExcitNet Vocoder: A Neural Excitation Model For Parametric Speech Synthesis Systems
2018 · Eunwoo Song, Kyungguen Byun, Hong-Goo Kang
Abstract
This paper proposes a WaveNet-based neural excitation model (ExcitNet) for statistical parametric speech synthesis systems. Conventional WaveNet-based neural vocoding systems significantly improve the perceptual quality of synthesized speech by statistically generating a time sequence of speech waveforms through an auto-regressive framework. However, they often suffer from noisy outputs because of the difficulties in capturing the complicated time-varying nature of speech signals. To improve modeling efficiency, the proposed ExcitNet vocoder employs an adaptive inverse filter to decouple spectral components from the speech signal. The residual component, i.e. excitation signal, is then trained and generated within the WaveNet framework. In this way, the quality of the synthesized speech signal can be further improved since the spectral component is well represented by a deep learning framework and, moreover, the residual component is efficiently generated by the WaveNet framework. Expe
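The decoupling step described in the abstract can be illustrated with ordinary linear prediction (LP). The sketch below is not the authors' implementation: it assumes a single fixed set of LP coefficients estimated by the autocorrelation method, whereas the abstract describes an adaptive (time-varying) inverse filter, and it uses a synthetic AR(2) signal in place of real speech. The function names (lpc_coefficients, inverse_filter, synthesis_filter) are hypothetical. It shows only that inverse filtering leaves a lower-variance residual (the excitation) and that passing the residual back through the matching synthesis filter recovers the waveform.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate LP coefficients a_1..a_p with the autocorrelation method."""
    # Autocorrelation at lags 0..order.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    # Toeplitz normal equations R a = r[1:], with a tiny ridge for conditioning.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1 : order + 1])

def inverse_filter(signal, a):
    """Residual e[n] = s[n] - sum_k a[k] * s[n-k], zero initial state."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), signal])
    e = np.empty(len(signal))
    for n in range(len(signal)):
        past = padded[n : n + order][::-1]  # s[n-1], ..., s[n-p]
        e[n] = signal[n] - np.dot(a, past)
    return e

def synthesis_filter(excitation, a):
    """Reconstruct s[n] = e[n] + sum_k a[k] * s[n-k], zero initial state."""
    order = len(a)
    out = np.zeros(len(excitation) + order)
    for n in range(len(excitation)):
        past = out[n : n + order][::-1]  # previously synthesized samples
        out[n + order] = excitation[n] + np.dot(a, past)
    return out[order:]

# Synthetic stand-in for speech (assumption): a stable AR(2) process
# s[n] = 1.3 s[n-1] - 0.6 s[n-2] + w[n] driven by white noise w.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
speech = synthesis_filter(rng.standard_normal(4000), true_a)

a = lpc_coefficients(speech, order=2)         # spectral (filter) part
excitation = inverse_filter(speech, a)        # residual part
reconstructed = synthesis_filter(excitation, a)

print("estimated LP coefficients:", a)
print("variance ratio (excitation / speech):", np.var(excitation) / np.var(speech))
```

In the proposed system, the residual is what the WaveNet-style model is trained on and generates; the synthesis filter here only stands in for the final step of putting the spectral envelope back.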
Related papers
- Effective Parameter Estimation Methods For An ExcitNet Model In Generative Text-to-speech Systems (2019) [0.00]
- ESTVocoder: An Excitation-spectral-transformed Neural Vocoder Conditioned On Mel Spectrogram (2024) [0.00]
- Continuous Wavelet Vocoder-based Decomposition Of Parametric Speech Waveform Synthesis (2021) [0.00]
- Speaker-independent Raw Waveform Model For Glottal Excitation (2018) [9.76]
- LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis (2018) [0.00]
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018) [11.76]
- Neural Source-filter-based Waveform Model For Statistical Parametric Speech Synthesis (2018) [13.97]
- NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation For Noise-controllable Waveform Generation (2022) [0.00]