Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis
2019 · Thomas Drugman, Alexis Moinet, Thierry Dutoit, et al.
Abstract
This paper proposes a method to improve the quality delivered by statistical parametric speech synthesizers. To this end, a codebook of pitch-synchronous residual frames is used to construct a more realistic source signal. First, a limited codebook of typical excitations is built from a training database. During synthesis, HMMs generate the filter and source coefficients. The source coefficients contain both the pitch and a compact representation of the target residual frames. The source signal is then obtained by concatenating excitation frames picked from the codebook according to a selection criterion that takes the target residual coefficients as input. Subjective results show a relevant improvement over the basic technique.
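The selection-and-concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, the form of the compact representation, or how frames are adapted to the target pitch, so everything below is an assumption made for illustration: Euclidean nearest-neighbour selection, linear resampling of each chosen frame to the target pitch period, and plain concatenation. The names `select_frames`, `resample_frame`, and `build_excitation` are hypothetical.

```python
import numpy as np


def select_frames(codebook_feats, targets):
    """Index of the closest codebook entry for each target vector.

    Assumption: Euclidean distance stands in for the paper's
    (unspecified) selection criterion.
    """
    # Pairwise distances, shape (n_targets, n_codebook).
    diff = targets[:, None, :] - codebook_feats[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return np.argmin(dist, axis=1)


def resample_frame(frame, length):
    """Linearly resample one frame to `length` samples.

    Assumption: this is a simple stand-in for adapting a stored
    residual frame to the target pitch period.
    """
    x_old = np.linspace(0.0, 1.0, num=len(frame))
    x_new = np.linspace(0.0, 1.0, num=length)
    return np.interp(x_new, x_old, frame)


def build_excitation(codebook_frames, codebook_feats, targets, periods):
    """Pick one codebook frame per target and concatenate them."""
    idx = select_frames(codebook_feats, targets)
    pieces = [resample_frame(codebook_frames[i], p)
              for i, p in zip(idx, periods)]
    return np.concatenate(pieces), idx


# Toy data (made up): 3 codebook frames of 8 samples, 2-D features.
codebook_frames = np.arange(24, dtype=float).reshape(3, 8)
codebook_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Targets and pitch periods as an HMM might generate them.
targets = np.array([[0.9, 0.1], [0.1, 0.8], [0.0, 0.1]])
periods = [6, 8, 10]

excitation, chosen = build_excitation(
    codebook_frames, codebook_feats, targets, periods)
print(chosen, excitation.shape)
```

In a full synthesizer the resulting excitation would then drive the filter generated by the HMMs; that stage is omitted here.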
Related papers
- Full-sum Decoding For Hybrid HMM Based Speech Recognition Using LSTM Language Model (2020)
- Using Heterogeneity In Semi-supervised Transcription Hypotheses To Improve Code-switched Speech Recognition (2021)
- Fast And Small Footprint Hybrid HMM-HiFiGAN Based System For Speech Synthesis In Indian Languages (2023)
- Spectral Codecs: Improving Non-autoregressive Speech Synthesis With Spectrogram-based Audio Codecs (2024)
- MSR-Codec: A Low-bitrate Multi-stream Residual Codec For High-fidelity Speech Generation With Information Disentanglement (2025)
- LSTM Deep Neural Networks Postfiltering For Improving The Quality Of Synthetic Voices (2016)
- Combining Frame-synchronous And Label-synchronous Systems For Speech Recognition (2021)
- Speech Decomposition Based On A Hybrid Speech Model And Optimal Segmentation (2021)