Voice Conversion Challenge 2020: Intra-lingual Semi-parallel And Cross-lingual Voice Conversion
2020 · Yi Zhao, Wen-Chin Huang, Xiaohai Tian, et al.
Abstract
The voice conversion challenge is a biennial scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks: intra-lingual semi-parallel VC and cross-lingual VC. After a two-month challenge period, we received 33 submissions, including 3 baselines built on the database. From the results of crowd-sourced listening tests, we observed that VC methods have progressed rapidly thanks to advanced deep learning methods. In particular, the speaker similarity scores of several systems turned out to be as high as those of the target speakers in the intra-lingual semi-parallel VC task. However, we confirmed that none of them has yet achieved human-level naturalness for the same task. The cross-lingual conversion task is, as expected, more difficult, and the overall naturalness and similarity scores were lower than those for the intra-lingual
Related papers
- The Voice Conversion Challenge 2018: Promoting Development Of Parallel And Nonparallel Methods (2018)
- Predictions Of Subjective Ratings And Spoofing Assessments Of Voice Conversion Challenge 2020 Submissions (2020)
- The Academia Sinica Systems Of Voice Conversion For VCC2020 (2020)
- Baseline System Of Voice Conversion Challenge 2020 With Cyclic Variational Autoencoder And Parallel WaveGAN (2020)
- Transfer Learning From Monolingual ASR To Transcription-free Cross-lingual Voice Conversion (2020)
- The NeteaseGames System For Voice Conversion Challenge 2020 With Vector-quantization Variational Autoencoder And WaveNet (2020)
- A Comparative Study Of Voice Conversion Models With Large-scale Speech And Singing Data: The T13 Systems For The Singing Voice Conversion Challenge 2023 (2023)
- Building Bilingual And Code-switched Voice Conversion With Limited Training Data Using Embedding Consistency Loss (2021)