The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction For Multiple Domains
2023 · Erica Cooper, Wen-Chin Huang, Yu Tsao, et al.
Abstract
We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seven different countries participated. Surprisingly, we found that the two sub-tracks of French text-to-speech synthesis had large differences in their predictability, and that singing voice-converted samples were not as difficult to predict as we had expected. Use of diverse datasets and listener information during training appeared to be successful approaches.
Related papers
- A Comparison Of Deep Learning MOS Predictors For Speech Synthesis Quality (2022)
- Uncertainty As A Predictor: Leveraging Self-supervised Learning For Zero-shot MOS Prediction (2023)
- DDOS: A MOS Prediction Framework Utilizing Domain Adaptive Pre-training And Distribution Of Opinion Scores (2022)
- SingMOS: An Extensive Open-source Singing Voice Dataset For MOS Prediction (2024)
- Predictions Of Subjective Ratings And Spoofing Assessments Of Voice Conversion Challenge 2020 Submissions (2020)
- The T05 System For The VoiceMOS Challenge 2024: Transfer Learning From Deep Image Classifier To Naturalness MOS Prediction Of High-quality Synthetic Speech (2024)
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021)
- Neural MOS Prediction For Synthesized Speech Using Multi-task Learning With Spoofing Detection And Spoofing Type Classification (2020)