Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
2020 · Tomoki Koriyama, Hiroshi Saruwatari
Abstract
This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling. DGP is a Bayesian deep model that can be trained effectively while taking model complexity into account, and it is a kernel regression model with high expressiveness. Previous studies showed that DGP-based speech synthesis outperformed neural-network-based speech synthesis when both models used a feed-forward architecture. To improve the naturalness of synthetic speech, this paper shows that DGP can be applied to utterance-level modeling with recurrent architectures. The proposed model adopts a simple recurrent unit (SRU) to realize the recurrent architecture; because SRU is highly parallelizable, speech parameters can be generated quickly. The objective and subjective evaluation results show that the proposed SRU-DGP-based speech synthesis outperforms not only feed-forward DGP but also automatically tuned SRU- and l
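The speed claim rests on how SRU splits its computation. The sketch below illustrates this using the original SRU formulation of Lei et al. with plain weight matrices, tanh as the cell activation, and equal input and hidden sizes. These are assumptions, and the function name `sru_forward` is hypothetical. It does not show how SRU-DGP replaces these matrices with Gaussian-process layers. Every matrix product depends only on the input frames, so each product is computed for the whole utterance in one batched operation. Only a cheap element-wise recurrence over the cell state runs sequentially.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sru_forward(x, W, Wf, bf, Wr, br, c0=None):
    """Hypothetical SRU forward pass over a whole utterance.

    x: (T, d) input frames. Input and hidden sizes are both d so that the
    highway term (1 - r_t) * x_t is well-defined (an assumption of this sketch).
    Returns hidden outputs h and cell states c, both of shape (T, d).
    """
    T, d = x.shape

    # Parallel part: none of these products depends on the previous state,
    # so each is a single matrix multiplication over all T frames at once.
    x_tilde = x @ W            # candidate values, (T, d)
    f = sigmoid(x @ Wf + bf)   # forget gates, (T, d)
    r = sigmoid(x @ Wr + br)   # reset gates, (T, d)

    # Sequential part: only an element-wise recurrence on the cell state.
    c = np.zeros(d) if c0 is None else c0
    cs = np.empty((T, d))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        cs[t] = c

    # Parallel again: gated output with a highway connection to the input.
    h = r * np.tanh(cs) + (1.0 - r) * x
    return h, cs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 4
    x = rng.standard_normal((T, d))
    W, Wf, Wr = (0.5 * rng.standard_normal((d, d)) for _ in range(3))
    h, c = sru_forward(x, W, Wf, np.zeros(d), Wr, np.zeros(d))
    print(h.shape)  # (5, 4)
```

In an LSTM, by contrast, the gates depend on the previous hidden state through a matrix product. That product must be evaluated frame by frame, which is why SRU-style architectures can generate long utterances faster.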
Related papers
- Light Gated Recurrent Units For Speech Recognition (2018)
- Investigating Gated Recurrent Neural Networks For Speech Synthesis (2016)
- Dynamic Gated Recurrent Neural Network For Compute-efficient Speech Enhancement (2024)
- Generative Pre-trained Speech Language Model With Efficient Hierarchical Transformer (2024)
- Hierarchical Multi-grained Generative Model For Expressive Speech Synthesis (2020)
- A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder (2023)
- DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network For Speech Enhancement (2020)
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017)