Multi-task Pseudo-label Learning For Non-intrusive Speech Quality Assessment Model
2023 · Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, et al.
Abstract
This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss function. Experimental results first demonstrate the advantages of MPL compared to training a model from scra
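The combined objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `huber` and `mpl_loss`, the dictionary layout, the weight `alpha`, and the threshold `delta` are all assumptions introduced here, and the abstract does not say how the two terms are weighted.

```python
def huber(pred, target, delta=1.0):
    """Huber loss for a single prediction: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


def mpl_loss(pred, labels, pseudo, alpha=1.0, delta=1.0):
    """Sum a supervised and a semi-supervised Huber term.

    pred   : model outputs keyed by target name (hypothetical layout)
    labels : ground-truth 3QUEST scores (S-MOS, N-MOS, G-MOS)
    pseudo : pseudo labels from a pretrained model (PESQ, STOI, SDI)
    alpha  : assumed weight on the semi-supervised term (not given in the abstract)
    """
    # Supervised term: estimated score vs. ground-truth label.
    supervised = sum(huber(pred[k], labels[k], delta) for k in labels)
    # Semi-supervised term: estimated score vs. pseudo label.
    semi_supervised = sum(huber(pred[k], pseudo[k], delta) for k in pseudo)
    return supervised + alpha * semi_supervised


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    pred = {"S-MOS": 3.0, "N-MOS": 2.5, "G-MOS": 2.0,
            "PESQ": 2.0, "STOI": 0.8, "SDI": 0.1}
    labels = {"S-MOS": 3.5, "N-MOS": 2.5, "G-MOS": 4.0}
    pseudo = {"PESQ": 2.0, "STOI": 0.9, "SDI": 0.1}
    print(mpl_loss(pred, labels, pseudo))
```

In a real training loop the same structure would typically be applied to batched tensors in a deep learning framework; plain Python floats are used here only to keep the sketch self-contained.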
Related papers
- MetricNet: Towards Improved Modeling For Non-intrusive Speech Quality Assessment (2021)
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019)
- InQSS: A Speech Intelligibility And Quality Assessment Model Using A Multi-task Learning Network (2021)
- More For Less: Non-intrusive Speech Quality Assessment With Limited Annotations (2021)
- Quality-Net: An End-to-end Non-intrusive Speech Quality Assessment Model Based On BLSTM (2018)
- InterMPL: Momentum Pseudo-labeling With Intermediate CTC Loss (2022)
- Attention-based Speech Enhancement Using Human Quality Perception Modelling (2023)
- Neural MOS Prediction For Synthesized Speech Using Multi-task Learning With Spoofing Detection And Spoofing Type Classification (2020)