Style-Label-Free: Cross-speaker Style Transfer By Quantized VAE And Speaker-wise Normalization In Speech Synthesis
2022 · Chunyu Qiang, Peng Yang, Hao Che, et al.
Abstract
Cross-speaker style transfer in speech synthesis aims to transfer a style from a source speaker to synthesized speech in a target speaker's timbre. Most previous approaches rely on data with style labels, but manually annotated labels are expensive and not always reliable. To address this problem, we propose Style-Label-Free, a cross-speaker style transfer method that transfers style from a source speaker to a target speaker without style labels. First, a reference encoder structure based on a quantized variational autoencoder (Q-VAE) and a style bottleneck is designed to extract discrete style representations. Second, a speaker-wise batch normalization layer is proposed to reduce source speaker leakage. To improve the style extraction ability of the reference encoder, a style-invariant and contrastive data augmentation method is proposed. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.
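The abstract names two building blocks without giving their details: discrete style representations obtained by quantization, and normalization applied speaker by speaker. The sketch below illustrates the general idea of each in plain Python. It is not the authors' implementation: the function names `quantize` and `speaker_wise_normalize`, the nearest-neighbour codebook lookup, and the use of per-speaker mean and variance over a batch are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to `vector`.

    Hypothetical helper: a plain nearest-neighbour lookup, used here only to
    illustrate how a continuous vector can be mapped to a discrete code.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))
    return index, codebook[index]


def speaker_wise_normalize(features, speaker_ids, eps=1e-5):
    """Standardize each feature dimension using per-speaker batch statistics.

    Hypothetical helper: rows belonging to the same speaker share one mean
    and variance, so speaker-dependent offsets and scales are removed.
    """
    # Group the rows of the batch by speaker.
    groups = defaultdict(list)
    for row, spk in zip(features, speaker_ids):
        groups[spk].append(row)

    # Compute a mean and (population) variance per speaker and dimension.
    stats = {}
    for spk, rows in groups.items():
        n = len(rows)
        dims = len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dims)]
        var = [sum((r[d] - mean[d]) ** 2 for r in rows) / n for d in range(dims)]
        stats[spk] = (mean, var)

    # Normalize every row with the statistics of its own speaker.
    normalized = []
    for row, spk in zip(features, speaker_ids):
        mean, var = stats[spk]
        normalized.append(
            [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, mean, var)]
        )
    return normalized
```

In this sketch, two speakers whose features differ only by a speaker-specific offset and scale end up with the same normalized values, which is the intuition behind using such a layer to limit source speaker leakage. A trained model would typically also learn the codebook and any affine parameters, which are omitted here.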
Related papers
- Learning Latent Representations For Style Control And Transfer In End-to-end Speech Synthesis (2018)
- Exploring Synthetic Data For Cross-speaker Style Transfer In Style Representation Based TTS (2024)
- StyleSpeech: Self-supervised Style Enhancing With VQ-VAE-based Pre-training For Expressive Audiobook Speech Synthesis (2023)
- One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation (2021)
- Interpretable Style Transfer For Text-to-speech With ControlVAE And Diffusion Bridge (2023)
- Enriching Source Style Transfer In Recognition-synthesis Based Non-parallel Voice Conversion (2021)
- Improving Data Augmentation-based Cross-speaker Style Transfer For TTS With Singing Voice, Style Filtering, And F0 Matching (2024)
- Speech-to-speech Translation With Discrete-unit-based Style Transfer (2023)