Enhancing Expressive Voice Conversion With Discrete Pitch-conditioned Flow Matching Model
2025 · Jialong Zuo, Shengpeng Ji, Minghui Fang, et al.
Abstract
This paper introduces PFlow-VC, a conditional flow matching voice conversion model that leverages fine-grained discrete pitch tokens and target speaker prompt information for expressive voice conversion (VC). Previous VC works primarily focus on speaker conversion, with further exploration needed in enhancing expressiveness (such as prosody and emotion) for timbre conversion. Unlike previous methods, we adopt a simple and efficient approach to enhance the style expressiveness of voice conversion models. Specifically, we pretrain a self-supervised pitch VQVAE model to discretize speaker-irrelevant pitch information and leverage a masked pitch-conditioned flow matching model for Mel-spectrogram synthesis, which provides in-context pitch modeling capabilities for the speaker conversion model, effectively improving the voice style transfer capacity. Additionally, we improve timbre similarity by combining global timbre embeddings with time-varying timbre tokens. Experiments on unseen LibriT
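To make the idea of discrete pitch tokens concrete, the following is a minimal sketch of the lookup step only, not the paper's pitch VQVAE. It assumes a fixed, hand-made codebook in place of a learned one, and it assumes that subtracting the utterance-level mean of log-F0 is an acceptable stand-in for removing speaker-dependent pitch range. The function name `quantize_pitch`, the codebook size, and the use of -1 for unvoiced frames are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_pitch(f0_hz, codebook):
    """Map each frame of an F0 contour to the index of its nearest codebook entry.

    Assumptions (not from the paper): unvoiced frames are marked by F0 <= 0,
    and speaker-dependent pitch range is removed by mean-centering log-F0.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    voiced = f0_hz > 0

    # Work in the log domain, where pitch intervals are additive.
    log_f0 = np.zeros_like(f0_hz)
    log_f0[voiced] = np.log(f0_hz[voiced])

    # Center on the utterance mean so tokens describe contour shape, not register.
    if voiced.any():
        log_f0[voiced] -= log_f0[voiced].mean()

    # Nearest-neighbour lookup: distance from every frame to every codebook entry.
    dist = np.abs(log_f0[:, None] - codebook[None, :])
    tokens = dist.argmin(axis=1)

    # Reserve -1 for unvoiced frames (hypothetical convention).
    tokens[~voiced] = -1
    return tokens

# Hypothetical 8-entry codebook over centered log-F0 values.
codebook = np.linspace(-0.5, 0.5, 8)
f0 = [0.0, 100.0, 110.0, 120.0, 0.0, 200.0]
tokens = quantize_pitch(f0, codebook)
```

Because the contour is mean-centered in the log domain, scaling every F0 value by the same factor leaves the tokens unchanged, which is the sense in which this toy representation is speaker-range independent. A learned VQVAE would replace both the hand-made codebook and the scalar distance with trained encoder outputs and embedding vectors.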
Related papers
- CycleFlow: Leveraging Cycle Consistency In Flow Matching For Speaker Style Adaptation (2025)
- PMVC: Data Augmentation-based Prosody Modeling For Expressive Voice Conversion (2023)
- GlowVC: Mel-spectrogram Space Disentangling Model For Language-independent Text-free Voice Conversion (2022)
- Zero-shot Voice Conversion Via Content-aware Timbre Ensemble And Conditional Flow Matching (2024)
- Expressive-VC: Highly Expressive Voice Conversion With Attention Fusion Of Bottleneck And Perturbation Features (2022)
- Converting Anyone's Voice: End-to-end Expressive Voice Conversion With A Conditional Diffusion Model (2024)
- Real-time And Accurate: Zero-shot High-fidelity Singing Voice Conversion With Multi-condition Flow Synthesis (2024)
- VoicePrompter: Robust Zero-shot Voice Conversion With Voice Prompt And Conditional Flow Matching (2025)