Scaling And Bias Codes For Modeling Speaker-adaptive DNN-based Speech Synthesis Systems
2018 · Hieu-Thi Luong, Junichi Yamagishi
Abstract
Most neural-network-based speaker-adaptive acoustic models for speech synthesis fall into one of two categories: layer-based or input-code approaches. Although each approach has its own pros and cons, most existing work on speaker adaptation focuses on improving one or the other. In this paper, we first give a systematic overview of the common principles of neural-network-based speaker-adaptive models, then show that these approaches can be represented in a unified framework and generalized further. More specifically, we introduce scaling and bias codes as a generalized means of speaker-adaptive transformation. Using these codes, we can create a more efficient factorized speaker-adaptive model that captures the advantages of both approaches while reducing their disadvantages. Experiments show that the proposed method improves speaker-adaptation performance compared with adaptation based on the conventional input code.
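The abstract gives no equations, so the following is only a minimal sketch of what a layer modulated by scaling and bias codes might look like, not the authors' formulation. It assumes a hidden layer of the form h = tanh(a * (W x) + c + b), where the per-unit scale a and the bias b are linear projections of two speaker codes. All names (`adapted_layer`, `scale_proj`, `bias_proj`, and the toy values) are hypothetical. The example also checks one well-known fact: appending a speaker code to the layer input is equivalent to adding a speaker-dependent bias, which is the special case where the scale is fixed to one.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def adapted_layer(x, W, c, scale_proj, bias_proj, scale_code, bias_code):
    """Hypothetical speaker-adapted layer: tanh(a * (W x) + c + b).

    a = scale_proj @ scale_code  (per-unit scaling, assumed form)
    b = bias_proj @ bias_code    (per-unit bias, assumed form)
    """
    pre = matvec(W, x)
    scale = matvec(scale_proj, scale_code)
    bias = matvec(bias_proj, bias_code)
    return [math.tanh(a * z + ci + b) for a, z, ci, b in zip(scale, pre, c, bias)]


# Toy speaker-independent parameters (illustrative values only).
W = [[0.5, -0.2], [0.1, 0.3]]
c = [0.0, 0.1]
x = [1.0, 2.0]

# Toy speaker-dependent projections and one-dimensional codes.
scale_proj = [[1.0], [1.0]]
bias_proj = [[0.4], [-0.6]]
bias_code = [1.0]

# Scale fixed to one: only the bias code adapts the layer.
h_bias_only = adapted_layer(x, W, c, scale_proj, bias_proj, [1.0], bias_code)

# Conventional input code: append the code to the input and widen W.
W_cat = [row + proj for row, proj in zip(W, bias_proj)]
h_input_code = [math.tanh(z + ci) for z, ci in zip(matvec(W_cat, x + bias_code), c)]

# A scaling code other than one also rescales each unit's pre-activation.
h_scaled = adapted_layer(x, W, c, scale_proj, bias_proj, [2.0], bias_code)
```

In this sketch `h_bias_only` and `h_input_code` agree up to floating-point rounding, while `h_scaled` differs because the scaling code multiplies each unit's pre-activation, which is the kind of transformation layer-based methods apply.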
Related papers
- Linear Networks Based Speaker Adaptation For Speech Synthesis (2018) 6.34
- A Unified Speaker Adaptation Method For Speech Synthesis Using Transcribed And Untranscribed Speech With Backpropagation (2019) 0.00
- Bayesian Learning For Deep Neural Network Adaptation (2020) 9.76
- Multimodal Speech Synthesis Architecture For Unsupervised Speaker Adaptation (2018) 6.34
- Empirical Evaluation Of Speaker Adaptation On DNN Based Acoustic Model (2018) 5.24
- Efficient Black-box Speaker Verification Model Adaptation With Reprogramming And Backend Learning (2023) 0.00
- Speaker-adaptive Neural Vocoders For Parametric Speech Synthesis Systems (2018) 2.26
- Adversarial Speaker Adaptation (2019) 10.21