Toward Degradation-robust Voice Conversion
2021 · Chien-Yu Huang, Kai-Wei Chang, Hung-Yi Lee
Abstract
Any-to-any voice conversion converts the vocal timbre of an utterance to that of any speaker, including speakers unseen during training. Although several state-of-the-art any-to-any voice conversion models exist, they all rely on clean utterances to convert successfully. In real-world scenarios, however, clean utterances of a speaker are difficult to collect; recordings are usually degraded by noise or reverberation. It is therefore highly desirable to understand how these degradations affect voice conversion and to build a degradation-robust model. In this paper we report the first comprehensive study of the degradation robustness of any-to-any voice conversion. We show that the performance of current state-of-the-art models is severely hampered when they are given degraded utterances. To address this, we propose speech enhancement concatenation and denoising training to improve robustness. In addition to common degradations, we also consider adversarial noises, which alter the model outpu
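To make "degraded by noise" concrete, the following is a minimal sketch of one common way to simulate additive-noise degradation: scaling a noise signal so that the mixture reaches a chosen signal-to-noise ratio. It is not the paper's degradation pipeline; the function names `mix_at_snr` and `measured_snr_db`, the 5 dB target, and the synthetic tone and Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise); solve for the noise gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, degraded):
    """Measure the SNR of a degraded signal against its clean reference."""
    clean_power = sum(x * x for x in clean)
    residual_power = sum((d - x) ** 2 for x, d in zip(clean, degraded))
    return 10 * math.log10(clean_power / residual_power)


# Synthetic stand-ins for real audio (assumed): a 220 Hz tone and
# Gaussian noise, one second each at a 16 kHz sampling rate.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
degraded = mix_at_snr(clean, noise, snr_db=5.0)
```

Calling `measured_snr_db(clean, degraded)` on the result recovers the requested SNR up to floating-point error, which is a quick sanity check before feeding such mixtures to a conversion model.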
Related papers
- How Far Are We From Robust Voice Conversion: A Survey (2020) · 9.41
- Noise-robust Voice Conversion By Conditional Denoising Training Using Latent Variables Of Recording Quality And Environment (2024) · 0.00
- DRSpeech: Degradation-robust Text-to-speech Synthesis With Frame-level And Utterance-level Acoustic Representation Learning (2022) · 7.50
- Residual Speaker Representation For One-shot Voice Conversion (2023) · 0.00
- Robustness Of Voice Conversion Techniques Under Mismatched Conditions (2016) · 0.00
- DRVC: A Framework Of Any-to-any Voice Conversion With Self-supervised Learning (2022) · 9.59
- An Overview Of Voice Conversion And Its Challenges: From Statistical Modeling To Deep Learning (2020) · 18.53
- Accent And Speaker Disentanglement In Many-to-many Voice Conversion (2020) · 10.35