FragmentVC: Any-to-any Voice Conversion By End-to-end Extracting And Fusing Fine-grained Voice Fragments With Attention
2020 · Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, et al.
Abstract
Any-to-any voice conversion aims to convert voices from and to any speakers, even those unseen during training. It is much more challenging than one-to-one or many-to-many conversion, but also much more attractive in real-world scenarios. In this paper we propose FragmentVC, in which the latent phonetic structure of the source speaker's utterance is obtained from Wav2Vec 2.0, while the spectral features of the target speaker's utterance(s) are obtained from log mel-spectrograms. By aligning the hidden structures of these two different feature spaces with a two-stage training process, FragmentVC is able to extract fine-grained voice fragments from the target speaker's utterance(s) and fuse them into the desired utterance. All of this relies on the attention mechanism of the Transformer, as verified by an analysis of the attention maps, and is accomplished end-to-end. The approach is trained with reconstruction loss only, without any disentanglement considerations between content and speaker information.
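As a rough illustration of the fusion idea described in the abstract, the sketch below uses plain scaled dot-product cross-attention in NumPy: queries come from source-speaker features (standing in for Wav2Vec 2.0 outputs) and keys/values come from target-speaker log mel-spectrogram frames. This is not FragmentVC's actual architecture. The helper `fragment_attention`, the random projection matrices, the frame counts, and the dimensions (768 for Wav2Vec 2.0 base features, 80 mel bins, 256 model dimensions) are assumptions for illustration, and random arrays replace real features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fragment_attention(src_feats, tgt_mels, w_q, w_k, w_v):
    """Hypothetical helper: scaled dot-product cross-attention.

    src_feats: (T_src, D_w2v) source-speaker features (assumed Wav2Vec 2.0 outputs).
    tgt_mels:  (T_tgt, n_mels) target-speaker log mel-spectrogram frames.
    Returns the fused sequence (T_src, d_model) and the attention map (T_src, T_tgt).
    """
    q = src_feats @ w_q  # queries carry the source phonetic content
    k = tgt_mels @ w_k   # keys index the target-speaker frames
    v = tgt_mels @ w_v   # values are the target-speaker "fragments" to be fused
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each source frame spreads weight over target frames
    return attn @ v, attn

rng = np.random.default_rng(0)
T_src, T_tgt = 50, 120                 # hypothetical frame counts
D_w2v, n_mels, d_model = 768, 80, 256  # assumed feature dimensions

src = rng.standard_normal((T_src, D_w2v))   # placeholder for source features
tgt = rng.standard_normal((T_tgt, n_mels))  # placeholder for target mel frames
w_q = rng.standard_normal((D_w2v, d_model)) / np.sqrt(D_w2v)
w_k = rng.standard_normal((n_mels, d_model)) / np.sqrt(n_mels)
w_v = rng.standard_normal((n_mels, d_model)) / np.sqrt(n_mels)

fused, attn = fragment_attention(src, tgt, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (50, 256) (50, 120)
```

Each row of `attn` is a probability distribution over target frames, which is the kind of map an attention analysis inspects: a sharply peaked row suggests that the corresponding source frame borrows mostly from a few target-speaker fragments.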
Related papers
- S2VC: A Framework For Any-to-any Voice Conversion With Self-supervised Pretrained Representations (2021) · 12.25
- Expressive-VC: Highly Expressive Voice Conversion With Attention Fusion Of Bottleneck And Perturbation Features (2022) · 9.03
- FastVC: Fast Voice Conversion With Non-parallel Data (2020) · 5.24
- DRVC: A Framework Of Any-to-any Voice Conversion With Self-supervised Learning (2022) · 9.59
- MediumVC: Any-to-any Voice Conversion Using Synthetic Specific-speaker Speeches As Intermedium Features (2021) · 0.00
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021) · 11.64
- Many-to-many Voice Conversion Based Feature Disentanglement Using Variational Autoencoder (2021) · 7.81
- Pureformer-VC: Non-parallel One-shot Voice Conversion With Pure Transformer Blocks And Triplet Discriminative Training (2024) · 0.00