Pitch-and-spectrum-aware Singing Quality Assessment With Bias Correction And Model Fusion
2024 · Yu-Fei Shi, Yang Ai, Ye-Xin Lu, et al.
Abstract
We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured first place among all participating teams, excluding the official baseline. In this paper, we further improve our submission and propose a novel Pitch-and-Spectrum-aware Singing Quality Assessment (PS-SQA) method. PS-SQA builds on a self-supervised-learning (SSL) MOS predictor and incorporates singing pitch and spectral information, which are extracted using a pitch histogram and a non-quantized neural codec, respectively. Additionally, PS-SQA introduces a bias correction strategy to address prediction biases caused by low-resource training samples, and employs model fusion to further enhance prediction accuracy. Experimental results show that PS-SQA significantly outperforms all competing systems across all system-level metrics, confirming its strong singing quality assessment capabilities.
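As a rough illustration of two ingredients named in the abstract, the sketch below builds a pitch histogram from an F0 contour and fuses several predictors' MOS outputs by averaging. The abstract does not specify the histogram's scale, bin count, or range, nor the fusion rule, so the semitone (MIDI) scale, the 120 bins over MIDI 24 to 84, the convention that unvoiced frames have F0 = 0, and the (optionally weighted) mean are all assumptions made here. The function names `pitch_histogram` and `fuse_predictions` are hypothetical and not taken from the paper.

```python
import numpy as np


def pitch_histogram(f0_hz, n_bins=120, midi_min=24.0, midi_max=84.0):
    """Normalized histogram of voiced-frame pitch on a semitone (MIDI) scale.

    Assumption: unvoiced frames are marked with F0 <= 0 and are ignored.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        return np.zeros(n_bins)
    # Convert Hz to MIDI note numbers (A4 = 440 Hz = MIDI 69).
    midi = 69.0 + 12.0 * np.log2(voiced / 440.0)
    counts, _ = np.histogram(midi, bins=n_bins, range=(midi_min, midi_max))
    total = counts.sum()
    # Frames outside [midi_min, midi_max] are dropped by np.histogram.
    return counts / total if total > 0 else counts.astype(float)


def fuse_predictions(predictions, weights=None):
    """Fuse MOS predictions of shape (n_models, n_samples) by a weighted mean."""
    preds = np.asarray(predictions, dtype=float)
    return np.average(preds, axis=0, weights=weights)


# Toy F0 contour in Hz: one unvoiced frame, two frames at A4, one at A3.
hist = pitch_histogram([0.0, 440.0, 440.0, 220.0])
fused = fuse_predictions([[3.0, 4.0], [4.0, 2.0]])
```

A fixed-length histogram like this gives the predictor an utterance-level summary of the pitch distribution regardless of clip duration; how PS-SQA actually combines it with the SSL and codec features is not stated in the abstract.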
Related papers
- SingMOS-Pro: A Comprehensive Benchmark For Singing Quality Assessment (2025) · 0.00
- SingMOS: An Extensive Open-source Singing Voice Dataset For MOS Prediction (2024) · 0.00
- The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction For Multiple Domains (2023) · 11.19
- Uncertainty As A Predictor: Leveraging Self-supervised Learning For Zero-shot MOS Prediction (2023) · 6.34
- A Comparison Of Deep Learning MOS Predictors For Speech Synthesis Quality (2022) · 6.34
- VITS-based Singing Voice Conversion Leveraging Whisper And Multi-scale F0 Modeling (2023) · 0.00
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020) · 7.81
- LE-SSL-MOS: Self-supervised Learning MOS Prediction With Listener Enhancement (2023) · 9.23