Expressive-VC: Highly Expressive Voice Conversion With Attention Fusion Of Bottleneck And Perturbation Features
2022 · Ziqian Ning, Qicong Xie, Pengcheng Zhu, et al.
Abstract
Voice conversion for highly expressive speech is challenging. Current approaches struggle to balance speaker similarity, intelligibility, and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that combines the advantages of the neural bottleneck feature (BNF) approach and the information perturbation approach. Specifically, a BNF encoder and a Perturbed-Wav encoder form a content extractor that learns linguistic and para-linguistic features respectively, where the BNFs come from a robust pre-trained ASR model and the perturbed wave becomes speaker-irrelevant after signal perturbation. We further fuse the linguistic and para-linguistic features through an attention mechanism whose query is a set of speaker-dependent prosody features. These are produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input. Finally, the decoder consumes the inte
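The attention fusion described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each frame has one BNF feature vector and one perturbation feature vector of the same size, and that a prosody vector acts as the query that weights the two by scaled dot-product attention. The function names (`fuse_frame`, `fuse_sequence`) and the per-frame, two-candidate formulation are hypothetical and are not taken from the paper; the actual model may fuse the features differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_frame(query, bnf_feat, pert_feat):
    """Fuse two content features for one frame (illustrative assumption).

    query     -- prosody feature used as the attention query
    bnf_feat  -- linguistic feature from the BNF encoder
    pert_feat -- para-linguistic feature from the Perturbed-Wav encoder
    Returns the fused vector and the two attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, bnf_feat) / scale, dot(query, pert_feat) / scale]
    w_bnf, w_pert = softmax(scores)
    fused = [w_bnf * b + w_pert * p for b, p in zip(bnf_feat, pert_feat)]
    return fused, (w_bnf, w_pert)


def fuse_sequence(queries, bnf_seq, pert_seq):
    """Apply fuse_frame to every frame of an utterance."""
    return [fuse_frame(q, b, p)[0] for q, b, p in zip(queries, bnf_seq, pert_seq)]
```

In this sketch, a query that aligns more closely with the BNF feature pushes the fused output toward the linguistic content, while a query that aligns with the perturbation feature keeps more of the para-linguistic detail.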
Related papers
- Disentangling Content And Fine-grained Prosody Information Via Hybrid ASR Bottleneck Features For Voice Conversion (2022) · 10.48
- PMVC: Data Augmentation-based Prosody Modeling For Expressive Voice Conversion (2023) · 9.23
- Converting Anyone's Voice: End-to-end Expressive Voice Conversion With A Conditional Diffusion Model (2024) · 5.24
- Expressive Voice Conversion: A Joint Framework For Speaker Identity And Emotional Style Transfer (2021) · 9.03
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021) · 11.64
- Enhancing Expressive Voice Conversion With Discrete Pitch-conditioned Flow Matching Model (2025) · 5.84
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021) · 0.00
- Conditional Deep Hierarchical Variational Autoencoder For Voice Conversion (2021) · 0.00