Voicy: Zero-shot Non-parallel Voice Conversion In Noisy Reverberant Environments
2021 · Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, et al.
Abstract
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and evaluated on clean speech recordings. However, many acoustic environments are noisy and reverberant, severely restricting the applicability of popular VC methods to such scenarios. To address this limitation, we propose Voicy, a new VC framework particularly tailored for noisy speech. Our method, which is inspired by the de-noising auto-encoders framework, comprises four encoders (speaker, content, phonetic and acoustic-ASR) and one decoder. Importantly, Voicy is capable of performing non-parallel zero-shot VC, an important requirement for any VC system that needs to work on speakers not seen during training. We have validated our approach using a noisy reverberant version of the LibriSpeech dataset. Experimental results show that Voicy outperf
Related papers
- Zero-shot Voice Conversion Via Self-supervised Prosody Representation Learning (2021)
- An Evaluation Of Three-stage Voice Conversion Framework For Noisy And Reverberant Conditions (2022)
- Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion With Progressive Constraints In A Dual-mode Training Strategy (2024)
- SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System For Both Human Beings And Machines (2021)
- Improvement Speaker Similarity For Zero-shot Any-to-any Voice Conversion Of Whispered And Regular Speech (2024)
- ConVoice: Real-time Zero-shot Voice Style Transfer With Convolutional Network (2020)
- FastVC: Fast Voice Conversion With Non-parallel Data (2020)
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion (2022)