GlowVC: Mel-spectrogram Space Disentangling Model For Language-independent Text-free Voice Conversion
2022 · Magdalena Proszewska, Grzegorz Beringer, Daniel Sáez-Trigueros, et al.
Abstract
In this paper, we propose GlowVC: a multilingual multi-speaker flow-based model for language-independent text-free voice conversion. We build on Glow-TTS, which provides an architecture that enables the use of linguistic features during training without requiring them for VC inference. We consider two versions of our model: GlowVC-conditional and GlowVC-explicit. GlowVC-conditional models the distribution of mel-spectrograms with a speaker-conditioned flow and disentangles the mel-spectrogram space into content- and pitch-relevant dimensions, while GlowVC-explicit models the explicit distribution with an unconditioned flow and disentangles that space into content-, pitch- and speaker-relevant dimensions. We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages. GlowVC models greatly outperform the AutoVC baseline in terms of intelligibility, while achieving just as high speaker similarity
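To make the idea of a speaker-conditioned flow concrete, the following is a minimal toy sketch, not the paper's architecture. It assumes the conversion scheme commonly used with Glow-TTS-style models: encode the source mel-spectrogram into a latent with the source speaker as the condition, then invert the flow with the target speaker as the condition. The single elementwise affine layer, the parameter tables `log_scale` and `shift`, and the functions `encode`, `decode` and `convert` are all hypothetical and stand in for a trained multi-layer flow.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MELS = 80      # assumed number of mel bins
N_SPEAKERS = 3   # hypothetical toy speaker set

# Hypothetical per-speaker affine parameters; a real model would learn these
# (and would stack many coupling layers instead of one affine map).
log_scale = rng.normal(0.0, 0.1, size=(N_SPEAKERS, N_MELS))
shift = rng.normal(0.0, 1.0, size=(N_SPEAKERS, N_MELS))


def encode(mel, speaker):
    """Forward flow: map mel frames (T, N_MELS) to a latent, given a speaker id."""
    return (mel - shift[speaker]) * np.exp(-log_scale[speaker])


def decode(z, speaker):
    """Inverse flow: map a latent back to mel frames, given a speaker id."""
    return z * np.exp(log_scale[speaker]) + shift[speaker]


def convert(mel, source, target):
    """Toy voice conversion: encode with the source speaker, decode with the target."""
    return decode(encode(mel, source), target)


mel = rng.normal(size=(5, N_MELS))            # stand-in for 5 mel frames
converted = convert(mel, source=0, target=1)  # speaker 0 -> speaker 1
round_trip = convert(mel, source=0, target=0) # same speaker: should recover the input
```

Because every step is invertible, converting with the same source and target speaker returns the input unchanged, and a conversion can be undone by swapping the two speaker ids. No text or linguistic features are needed at this stage, which is the sense in which inference is text-free.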
Related papers
- Enhancing Expressive Voice Conversion With Discrete Pitch-conditioned Flow Matching Model (2025)
- Cross-lingual Text-to-speech With Flow-based Voice Conversion For Improved Pronunciation (2022)
- Cross-lingual Knowledge Distillation Via Flow-based Voice Conversion For Robust Polyglot Text-to-speech (2023)
- FastVC: Fast Voice Conversion With Non-parallel Data (2020)
- Text-free Non-parallel Many-to-many Voice Conversion Using Normalising Flows (2022)
- Zero-shot Voice Conversion Via Content-aware Timbre Ensemble And Conditional Flow Matching (2024)
- Glow-WaveGAN: Learning Speech Representations From GAN-based Variational Auto-encoder For High Fidelity Flow-based Speech Synthesis (2021)
- Ultrasound-based Articulatory-to-acoustic Mapping With WaveGlow Speech Synthesis (2020)