DDOS: A MOS Prediction Framework Utilizing Domain Adaptive Pre-training And Distribution Of Opinion Scores
2022 · Wei-Cheng Tseng, Wei-Tsung Kao, Hung-Yi Lee
Abstract
Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Because collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation are desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS uses domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech, and adds a module that models the opinion score distribution of each utterance. With these components, DDOS outperforms previous work on the BVCC dataset and significantly improves the zero-shot transfer result on the BC2019 dataset. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score.
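The abstract refers to an opinion score distribution per utterance and to a system-level score. The sketch below only illustrates how those quantities relate to MOS; it is not the DDOS model. It assumes a 5-point rating scale, and the helper names (`score_distribution`, `expected_mos`, `system_level_mos`) are hypothetical, chosen for this illustration.

```python
from collections import defaultdict

# Assumption: listeners rate each utterance on a 5-point scale.
SCORES = (1, 2, 3, 4, 5)


def score_distribution(ratings):
    """Empirical distribution of listener ratings for one utterance."""
    total = len(ratings)
    return [sum(1 for r in ratings if r == s) / total for s in SCORES]


def expected_mos(dist):
    """MOS as the expected value of a distribution over SCORES."""
    return sum(s * p for s, p in zip(SCORES, dist))


def system_level_mos(utterance_mos):
    """Average utterance-level MOS per system.

    utterance_mos: iterable of (system_id, mos) pairs.
    """
    by_system = defaultdict(list)
    for system_id, mos in utterance_mos:
        by_system[system_id].append(mos)
    return {sid: sum(v) / len(v) for sid, v in by_system.items()}


if __name__ == "__main__":
    dist = score_distribution([4, 4, 5, 3])
    print(dist, expected_mos(dist))
    print(system_level_mos([("A", 4.0), ("A", 3.0), ("B", 2.0)]))
```

A model that predicts the distribution rather than a single number can still report a scalar MOS through the expectation, while retaining information about how much listeners disagree.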
Related papers
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021)
- Neural MOS Prediction For Synthesized Speech Using Multi-task Learning With Spoofing Detection And Spoofing Type Classification (2020)
- SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations And Acoustic Features (2024)
- The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction For Multiple Domains (2023)
- LE-SSL-MOS: Self-supervised Learning MOS Prediction With Listener Enhancement (2023)
- A Comparison Of Deep Learning MOS Predictors For Speech Synthesis Quality (2022)
- RAMP: Retrieval-augmented MOS Prediction Via Confidence-based Dynamic Weighting (2023)
- MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction (2024)