Fast, Compact, And High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers For Mobile Devices
2016 · Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, et al.
Abstract
Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization, multi-frame inference, and robust inference using an ε-contaminated Gaussian loss function. Experimental results in subjective listening tests show that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based SPSS in runtime speed while maintaining naturalness. Evaluations between LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also presented.
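The robustness idea behind a contaminated Gaussian loss can be illustrated with a minimal sketch. It assumes the usual robust-statistics form of an ε-contaminated Gaussian, a mixture (1 - ε)·N(μ, σ²) + ε·N(μ, c·σ²) with c > 1. Whether this matches the paper's exact formulation is an assumption, and the names `contaminated_nll`, `eps`, and `scale` as well as their default values are hypothetical, chosen only for illustration.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gaussian_nll(x, mean, var):
    """Plain Gaussian negative log-likelihood, for comparison."""
    return -gaussian_logpdf(x, mean, var)


def contaminated_nll(x, mean, var, eps=0.1, scale=10.0):
    """Negative log-likelihood under (1 - eps) * N(mean, var) + eps * N(mean, scale * var).

    eps and scale are illustrative values (assumptions), not taken from the paper.
    """
    log_main = math.log(1.0 - eps) + gaussian_logpdf(x, mean, var)
    log_wide = math.log(eps) + gaussian_logpdf(x, mean, scale * var)
    # Log-sum-exp keeps the mixture stable when one component underflows.
    m = max(log_main, log_wide)
    return -(m + math.log(math.exp(log_main - m) + math.exp(log_wide - m)))


if __name__ == "__main__":
    for x in (0.0, 3.0, 10.0):
        print(x, gaussian_nll(x, 0.0, 1.0), contaminated_nll(x, 0.0, 1.0))
```

For a target far from the predicted mean, the wide component dominates the mixture, so the penalty grows more slowly than under a single Gaussian and outlying frames pull less on the fit.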
Related papers
- Investigating Gated Recurrent Neural Networks For Speech Synthesis (2016)
- A Comparison Of Vietnamese Statistical Parametric Speech Synthesis Systems (2020)
- UFANS: U-shaped Fully-parallel Acoustic Neural Structure For Statistical Parametric Speech Synthesis With 20X Faster (2018)
- Deep Feed-forward Sequential Memory Networks For Speech Synthesis (2018)
- Multi-task Wavenet: A Multi-task Generative Model For Statistical Parametric Speech Synthesis Without Fundamental Frequency Conditions (2018)
- LSTM Deep Neural Networks Postfiltering For Improving The Quality Of Synthetic Voices (2016)
- RNN-based Speech Synthesis Using A Continuous Sinusoidal Model (2019)
- High Quality Streaming Speech Synthesis With Low, Sentence-length-independent Latency (2021)