Voice Conversion With Conditional SampleRNN
2018 · Cong Zhou, Michael Horgan, Vivek Kumar, et al.
Abstract
Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our approach focuses on preserving voice content and depends on the generative network to learn voice style. We first train a multi-speaker SampleRNN model conditioned on linguistic features, pitch contour, and speaker identity using a multi-speaker speech corpus. Voice-converted speech is generated using linguistic features and pitch contour extracted from the source speaker, and the target speaker identity. We demonstrate that our system is capable of many-to-many voice conversion without requiring parallel data, enabling broad applications. Subjective evaluation demonstrates that our approach outperforms conventional VC methods.
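The conditioning described in the abstract can be illustrated with a minimal sketch. It assumes frame-level conditioning vectors formed by concatenating the source speaker's linguistic features and pitch value with a one-hot encoding of the target speaker identity. The function names, the one-hot encoding, the feature dimensions, and the toy values are all assumptions made for illustration; the abstract does not specify how the three inputs are combined, and the SampleRNN model itself is not shown.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Encode a speaker index as a one-hot vector (an assumed encoding)."""
    if not 0 <= index < size:
        raise ValueError("speaker index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_conditioning(
    linguistic: Sequence[Sequence[float]],
    pitch: Sequence[float],
    target_speaker: int,
    num_speakers: int,
) -> List[List[float]]:
    """Hypothetical helper: one conditioning vector per frame.

    Linguistic features and pitch come from the SOURCE utterance;
    the speaker identity is that of the TARGET speaker.
    """
    if len(linguistic) != len(pitch):
        raise ValueError("linguistic features and pitch must have equal frame counts")
    speaker = one_hot(target_speaker, num_speakers)
    return [list(frame) + [f0] + speaker for frame, f0 in zip(linguistic, pitch)]


# Toy source utterance: 3 frames, 2 linguistic dimensions, pitch in Hz
# (0.0 marks an unvoiced frame in this toy example).
source_linguistic = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
source_pitch = [120.0, 125.0, 0.0]

frames = build_conditioning(
    source_linguistic, source_pitch, target_speaker=2, num_speakers=4
)
```

In this sketch, switching the target voice only changes the speaker portion of each vector, while the content and pitch portions stay tied to the source utterance. That mirrors why the approach needs no parallel data: nothing in the conditioning requires the source and target speakers to have recorded the same sentences.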
Related papers
- Many-to-many Voice Conversion Using Conditional Cycle-consistent Adversarial Networks (2020) 10.85
- Voice Conversion With Diverse Intonation Using Conditional Variational Auto-encoder (2025) 0.00
- Expressive Voice Conversion: A Joint Framework For Speaker Identity And Emotional Style Transfer (2021) 9.03
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021) 11.64
- StarGAN-VC2: Rethinking Conditional Methods For StarGAN-based Voice Conversion (2019) 0.00
- Converting Anyone's Voice: End-to-end Expressive Voice Conversion With A Conditional Diffusion Model (2024) 5.24
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021) 13.70
- One-shot Voice Conversion By Separating Speaker And Content Representations With Instance Normalization (2019) 0.00