Text-free Non-parallel Many-to-many Voice Conversion Using Normalising Flows
2022 · Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, et al.
Abstract
Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring that only speaker identity information is dropped whilst all other information from the source speech is retained is a major challenge. This is particularly difficult when we have no knowledge at inference time of the text being read, i.e., text-free VC. To mitigate this, we investigate information-preserving VC approaches. Normalising flows have gained attention for text-to-speech synthesis, but have been under-explored for VC. Flows utilize invertible functions to learn the likelihood of the data, and thus provide a lossless encoding of speech. We investigate normalising flows for VC in both text-conditioned and text-free scenarios. Furthermore, for text-free VC we compare pre-trained and jointly-learnt priors. Flow-based VC evaluations show no degradation between text-free and text-conditioned VC, resulting in improvements over the state-of-the
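The abstract's claim that invertible functions give both an exact likelihood and a lossless encoding can be illustrated with a single speaker-conditioned affine coupling layer. This is a minimal sketch and not the authors' architecture: the dimensions, the random conditioner weights `W` and `B`, the standard-normal prior, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, SPK_DIM = 4, 3            # toy sizes, assumed for illustration
HALF = DIM // 2

# Hypothetical conditioner weights; a trained model would learn these.
W = rng.normal(scale=0.5, size=(2 * HALF, HALF + SPK_DIM))
B = rng.normal(scale=0.1, size=2 * HALF)


def _scale_and_shift(xa, spk):
    """Predict log-scale and shift from the untouched half and the speaker vector."""
    h = np.tanh(W @ np.concatenate([xa, spk]) + B)
    return h[:HALF], h[HALF:]


def forward(x, spk):
    """Map a feature vector x to a latent z; also return log|det Jacobian|."""
    xa, xb = x[:HALF], x[HALF:]
    log_s, t = _scale_and_shift(xa, spk)
    z = np.concatenate([xa, xb * np.exp(log_s) + t])
    return z, log_s.sum()


def inverse(z, spk):
    """Exactly undo forward() for the given speaker vector."""
    za, zb = z[:HALF], z[HALF:]
    log_s, t = _scale_and_shift(za, spk)
    return np.concatenate([za, (zb - t) * np.exp(-log_s)])


def log_likelihood(x, spk):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det J|."""
    z, log_det = forward(x, spk)
    log_prior = -0.5 * (z @ z + DIM * np.log(2 * np.pi))
    return log_prior + log_det


x = rng.normal(size=DIM)                   # stand-in for one speech frame
src = rng.normal(size=SPK_DIM)             # stand-in source speaker embedding
tgt = rng.normal(size=SPK_DIM)             # stand-in target speaker embedding

z, _ = forward(x, src)
reconstructed = inverse(z, src)            # same speaker: recovers x
converted = inverse(z, tgt)                # different speaker: converted frame
```

Encoding with the source speaker vector and decoding with the same vector recovers the input up to floating-point error, which is the sense in which the encoding is lossless. Decoding with a different speaker vector is the basic pattern by which a flow can be used for conversion. A practical model would stack many such layers and alternate which half is transformed, since a single layer leaves the first half of the input unchanged.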
Related papers
- Fastvc: Fast Voice Conversion With Non-parallel Data (2020)
- Enhancing Expressive Voice Conversion With Discrete Pitch-conditioned Flow Matching Model (2025)
- Cross-lingual Text-to-speech With Flow-based Voice Conversion For Improved Pronunciation (2022)
- Voicy: Zero-shot Non-parallel Voice Conversion In Noisy Reverberant Environments (2021)
- Cycleflow: Leveraging Cycle Consistency In Flow Matching For Speaker Style Adaptation (2025)
- Glowvc: Mel-spectrogram Space Disentangling Model For Language-independent Text-free Voice Conversion (2022)
- Cross-lingual Knowledge Distillation Via Flow-based Voice Conversion For Robust Polyglot Text-to-speech (2023)
- Flowvocoder: A Small Footprint Neural Vocoder Based Normalizing Flow For Speech Synthesis (2021)