Partial Rank Similarity Minimization Method For Quality MOS Prediction Of Unseen Speech Synthesis Systems In Zero-shot And Semi-supervised Setting
2023 · Hemant Yadav, Erica Cooper, Junichi Yamagishi, et al.
Abstract
This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of the relative positions of predicted MOS values within a mini-batch, rather than the actual MOS values. That is, it measures the partial rank similarity (PRS) rather than comparing individual MOS values, as the L1 loss does. Our experiments on out-of-domain speech synthesis systems demonstrate that PRS outperforms the L1 loss in zero-shot and semi-supervised settings, exhibiting stronger correlation with the ground truth. These findings highlight the importance of considering rank order, as PRS does, when training MOS prediction models. We also argue that mean squared error and linear correlation coefficient metrics may be unreliable for evaluating MOS prediction models. In conclusion, PRS-trained models provide a robust framework for evaluating speech quality and offer insights for developing high-quality speech synthesis systems.
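The abstract does not give the exact PRS formula, so the following is only a minimal sketch of the general idea under stated assumptions: a loss that compares the ordering of prediction pairs inside a mini-batch instead of their absolute values. The pairwise logistic penalty used here is a RankNet-style surrogate swapped in for illustration, not the authors' objective. The names `l1_loss`, `pairwise_rank_loss` and `scale` are hypothetical.

```python
import math
from itertools import combinations


def l1_loss(pred, target):
    """Mean absolute error between predicted and ground-truth MOS values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(pred, target, scale=1.0):
    """Hypothetical rank-based loss (an assumption, NOT the paper's PRS formula).

    For every pair (i, j) in the mini-batch whose ground-truth MOS differ,
    apply a smooth logistic penalty softplus(-s * scale * (p_i - p_j)),
    where s = +1 if target_i > target_j and -1 otherwise. The penalty is
    small when the predicted order agrees with the ground truth and grows
    when it disagrees. Only differences between predictions enter the loss,
    so adding a constant to every prediction leaves it unchanged.
    """
    terms = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # tied ground truth carries no ordering information
        s = 1.0 if target[i] > target[j] else -1.0
        margin = s * scale * (pred[i] - pred[j])
        terms.append(_softplus(-margin))
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    target = [2.0, 3.0, 4.0, 4.5]          # ground-truth MOS in one mini-batch
    shifted = [t + 1.0 for t in target]    # correct order, every value off by 1
    reversed_pred = list(reversed(target)) # exact values, completely wrong order

    print("L1, shifted:       ", l1_loss(shifted, target))
    print("rank loss, shifted:", pairwise_rank_loss(shifted, target))
    print("rank loss, exact:  ", pairwise_rank_loss(target, target))
    print("rank loss, reversed:", pairwise_rank_loss(reversed_pred, target))
```

In this sketch the shifted predictions receive a nonzero L1 (and squared) error even though they rank every item correctly, while the rank-based loss treats them exactly like the ground truth and only penalizes the reversed ordering. This gives one intuition for why an error-based view and a ranking view of MOS prediction can disagree; it is an illustration, not a reproduction of the paper's experiments.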
Related papers
- A Comparison Of Deep Learning MOS Predictors For Speech Synthesis Quality (2022) · 6.34
- Learning To Maximize Speech Quality Directly Using MOS Prediction For Neural Text-to-speech (2020) · 7.81
- Ldnet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021) · 12.74
- Attention-based Speech Enhancement Using Human Quality Perception Modelling (2023) · 0.00
- Uncertainty As A Predictor: Leveraging Self-supervised Learning For Zero-shot MOS Prediction (2023) · 6.34
- LE-SSL-MOS: Self-supervised Learning MOS Prediction With Listener Enhancement (2023) · 9.23
- DDOS: A MOS Prediction Framework Utilizing Domain Adaptive Pre-training And Distribution Of Opinion Scores (2022) · 9.03
- Neural MOS Prediction For Synthesized Speech Using Multi-task Learning With Spoofing Detection And Spoofing Type Classification (2020) · 9.59