Using RLHF To Align Speech Enhancement Approaches To Mean-opinion Quality Scores
2024 · Anurag Kumar, Andrew Perrault, Donald S. Williamson
Abstract
Objective speech quality measures are typically used to assess speech enhancement algorithms, but they have been shown to be sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignment often results in noticeable distortions and artifacts that make speech enhancement ineffective. To address these issues, we propose a reinforcement learning from human feedback (RLHF) framework that fine-tunes an existing speech enhancement approach by optimizing its performance against a mean-opinion score (MOS)-based reward model. Our results show that the RLHF-finetuned model achieves the best performance across different benchmarks on both objective and MOS-based speech quality metrics on the Voicebank+DEMAND dataset. Through ablation studies, we show that both the policy gradient loss and the supervised MSE loss are important for balanced optimization across the different metrics.
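The abstract describes fine-tuning with two terms: a policy-gradient loss driven by an MOS-based reward, and a supervised MSE loss. The toy sketch below shows how such a combined objective can be wired together. It is not the authors' implementation. It assumes a scalar-gain "enhancer" (`enhanced = gain * noisy`), a Gaussian policy around that output, a REINFORCE estimator with a running-mean baseline, and a hypothetical `placeholder_mos_reward` standing in for the learned MOS predictor. The names `finetune_gain`, `combined_loss`, and `placeholder_mos_reward`, along with all weights and step sizes, are illustrative assumptions.

```python
import math
import random


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def placeholder_mos_reward(enhanced, clean):
    """HYPOTHETICAL stand-in for a learned MOS predictor.

    Maps lower error to a higher score in (1, 5]. A real setup would
    query a trained MOS-prediction model instead.
    """
    return 1.0 + 4.0 * math.exp(-mse(enhanced, clean))


def gaussian_log_prob(actions, means, sigma):
    """Summed log-density of independent Gaussians N(mean, sigma^2)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (a - m) ** 2 / (2.0 * sigma ** 2)
               for a, m in zip(actions, means))


def combined_loss(actions, means, clean, advantage, sigma, pg_weight, mse_weight):
    """REINFORCE surrogate (per-sample average) plus weighted supervised MSE."""
    pg_loss = -advantage * gaussian_log_prob(actions, means, sigma) / len(actions)
    return pg_weight * pg_loss + mse_weight * mse(means, clean)


def finetune_gain(noisy, clean, gain=0.0, steps=200, lr=0.1,
                  sigma=0.1, pg_weight=0.5, mse_weight=1.0, seed=0):
    """Gradient descent on combined_loss w.r.t. a toy scalar gain."""
    rng = random.Random(seed)
    n = len(noisy)
    baseline = None
    for _ in range(steps):
        means = [gain * x for x in noisy]
        # Sample an output from the Gaussian policy around the current estimate.
        actions = [m + rng.gauss(0.0, sigma) for m in means]
        reward = placeholder_mos_reward(actions, clean)
        # Running-mean baseline to reduce policy-gradient variance.
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        # Analytic d/d(gain) of the policy-gradient term in combined_loss.
        grad_pg = -advantage * sum(
            (a - m) * x for a, m, x in zip(actions, means, noisy)) / (sigma ** 2 * n)
        # Analytic d/d(gain) of the supervised MSE term in combined_loss.
        grad_mse = 2.0 * sum(
            (m - s) * x for m, s, x in zip(means, clean, noisy)) / n
        gain -= lr * (pg_weight * grad_pg + mse_weight * grad_mse)
    return gain


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [math.sin(0.3 * t) for t in range(32)]
    noisy = [s + rng.gauss(0.0, 0.3) for s in clean]
    g = finetune_gain(noisy, clean)
    print("learned gain:", round(g, 3))
    print("MSE before:", round(mse([0.0] * len(clean), clean), 3))
    print("MSE after:", round(mse([g * x for x in noisy], clean), 3))
```

Setting `pg_weight=0` reduces this toy to plain supervised fine-tuning, and `mse_weight=0` to pure reward maximization. Toggling the two weights is one simple way to probe the kind of trade-off the ablation in the abstract refers to, though the paper's actual model, reward network, and loss weighting are not specified here.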
Related papers
- Attention-based Speech Enhancement Using Human Quality Perception Modelling (2023)
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018)
- Aligning Generative Speech Enhancement With Perceptual Feedback (2025)
- Multi-CMGAN+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023)
- Speech Recognition With LLMs Adapted To Disordered Speech Using Reinforcement Learning (2024)
- Perceive And Predict: Self-supervised Speech Representation Based Loss Functions For Speech Enhancement (2023)
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021)
- DLPO: Diffusion Model Loss-guided Reinforcement Learning For Fine-tuning Text-to-speech Diffusion Models (2024)