Multi-reference Tacotron By Intercross Training For Style Disentangling, Transfer And Control In Speech Synthesis
2019 · Yanyao Bian, Changbin Chen, Yongguo Kang, et al.
Abstract
Speech style control and transfer techniques aim to enrich the diversity and expressiveness of synthesized speech. Existing approaches model all speech styles in a single representation and therefore cannot control a specific speech feature independently. To address this issue, we introduce a novel multi-reference structure to Tacotron and propose an intercross training approach, which together ensure that each sub-encoder of the multi-reference encoder independently disentangles and controls a specific style. Experimental results show that our model is able to control and transfer desired speech styles individually.
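The multi-reference idea in the abstract can be illustrated with a toy sketch: one sub-encoder per style, each fed its own reference, with the outputs concatenated into a single conditioning vector. This is only an illustration under stated assumptions. The abstract does not specify the sub-encoder network, so `SubEncoder` below is a hypothetical stand-in (mean pooling plus a linear projection), the style names `emotion` and `speaker` are made up for the example, and the intercross training procedure itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

Frames = List[List[float]]  # reference audio features: frames x feature_dim


@dataclass
class SubEncoder:
    """Hypothetical stand-in for one reference sub-encoder (not the paper's network)."""
    weights: List[List[float]]  # embed_dim x feature_dim projection

    def encode(self, reference: Frames) -> List[float]:
        # Mean-pool the reference over time, then apply a linear projection.
        n_frames = len(reference)
        pooled = [sum(column) / n_frames for column in zip(*reference)]
        return [sum(w * x for w, x in zip(row, pooled)) for row in self.weights]


def multi_reference_embedding(
    encoders: Dict[str, SubEncoder], references: Dict[str, Frames]
) -> List[float]:
    """Encode one reference per style and concatenate the results in a fixed order."""
    embedding: List[float] = []
    for style in sorted(encoders):  # fixed order keeps each style in its own slot
        embedding.extend(encoders[style].encode(references[style]))
    return embedding


# Assumed style names and toy weights, for illustration only.
encoders = {
    "emotion": SubEncoder(weights=[[1.0, 0.0]]),
    "speaker": SubEncoder(weights=[[0.0, 1.0]]),
}
reference_a = [[1.0, 2.0], [3.0, 4.0]]
reference_b = [[5.0, 6.0], [7.0, 8.0]]

# Same reference for both styles, then swap only the emotion reference.
base = multi_reference_embedding(
    encoders, {"emotion": reference_a, "speaker": reference_a}
)
swapped = multi_reference_embedding(
    encoders, {"emotion": reference_b, "speaker": reference_a}
)
```

Because each style has its own slot in the concatenated vector, swapping the reference for one style changes only that slot. Making each sub-encoder actually capture only its own style is what the paper attributes to intercross training, which this sketch does not attempt to reproduce.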
Related papers
- Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios (2021) 6.77
- Style Tokens: Unsupervised Style Modeling, Control And Transfer In End-to-end Speech Synthesis (2018) 0.00
- Fine-grained Style Control In Transformer-based Text-to-speech Synthesis (2021) 11.19
- Text-driven Emotional Style Control And Cross-speaker Style Transfer In Neural TTS (2022) 7.81
- Towards End-to-end Prosody Transfer For Expressive Speech Synthesis With Tacotron (2018) 0.00
- Expressive TTS Training With Frame And Style Reconstruction Loss (2020) 12.74
- Multi-reference Neural TTS Stylization With Adversarial Cycle Consistency (2019) 9.03
- Controllable Emotion Transfer For End-to-end Speech Synthesis (2020) 13.05