Linear Networks Based Speaker Adaptation For Speech Synthesis
2018 · Zhiying Huang, Heng Lu, Ming Lei, et al.
Abstract
Speaker adaptation methods aim to create a synthetic voice font of fair quality for a target speaker when only limited resources are available. As deep neural network based statistical parametric speech synthesis (SPSS) methods have become dominant in TTS back-end modeling, speaker adaptation under the neural network based SPSS framework has also become an important task. In this paper, linear networks (LN) are inserted at multiple neural network layers and fine-tuned together with the output layer for the best speaker adaptation performance. When the adaptation data is extremely limited, a low-rank plus diagonal (LRPD) decomposition of the LN is employed to make the adapted voice more stable. Speaker adaptation experiments are conducted over a range of adaptation utterance counts. Moreover, speaker adaptation from 1) female to female, 2) male to female, and 3) female to male is investigated. Objective measurements and subjective tests show that LN with LRPD decomposition performs most stabl
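As a rough illustration of the low-rank plus diagonal idea mentioned in the abstract, the sketch below builds a square linear layer whose weight is W = diag(d) + P Q. It is not the authors' implementation: the class name `LRPDLinear`, the bias term, the identity-style initialization, and the NumPy forward pass are all assumptions made for illustration, since the abstract does not give the exact parameterization.

```python
import numpy as np


class LRPDLinear:
    """Hypothetical LRPD linear layer with W = diag(d) + P @ Q (illustrative only)."""

    def __init__(self, dim, rank, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        # Assumed initialization: d = 1 and P = 0, so the layer starts as the
        # identity and leaves the speaker-independent network unchanged.
        self.d = np.ones(dim)                              # diagonal part
        self.P = np.zeros((dim, rank))                     # low-rank factor (dim x rank)
        self.Q = rng.normal(scale=0.01, size=(rank, dim))  # low-rank factor (rank x dim)
        self.b = np.zeros(dim)                             # bias (assumed)

    def weight(self):
        # Dense equivalent of the decomposed weight, for inspection only.
        return np.diag(self.d) + self.P @ self.Q

    def __call__(self, h):
        # h has shape (batch, dim). Computes h @ W.T + b without forming W.
        return h * self.d + (h @ self.Q.T) @ self.P.T + self.b

    def num_params(self):
        return self.d.size + self.P.size + self.Q.size + self.b.size


if __name__ == "__main__":
    layer = LRPDLinear(dim=512, rank=8)
    full = 512 * 512 + 512  # parameters of an unconstrained linear layer with bias
    print(layer.num_params(), "LRPD parameters vs", full, "for a full matrix")
```

With a small rank, the layer has far fewer free parameters than a full square matrix, which is one plausible reading of why the decomposition could give a more stable adapted voice when adaptation data is scarce.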
Related papers
- Empirical Evaluation Of Speaker Adaptation On DNN Based Acoustic Model (2018) 5.24
- A Unified Speaker Adaptation Method For Speech Synthesis Using Transcribed And Untranscribed Speech With Backpropagation (2019) 0.00
- High Quality, Lightweight And Adaptable TTS Using LPCNet (2019) 10.97
- Scaling And Bias Codes For Modeling Speaker-adaptive DNN-based Speech Synthesis Systems (2018) 6.34
- Multimodal Speech Synthesis Architecture For Unsupervised Speaker Adaptation (2018) 6.34
- Bayesian Learning For Deep Neural Network Adaptation (2020) 9.76
- HyperTTS: Parameter Efficient Adaptation In Text To Speech Using Hypernetworks (2024) 3.23
- Listen, Attend, Spell And Adapt: Speaker Adapted Sequence-to-sequence ASR (2019) 8.82