Multi-metric Preference Alignment For Generative Speech Restoration
2025 · Junan Zhang, Xueyao Zhang, Jing Yang, et al.
Abstract
Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and timbre preservation. This principled approach ensures a holistic preference signal. Applying Direct
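The unanimity rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the metric names, the candidate outputs, and the scores below are hypothetical, and every score is assumed to be oriented so that higher is better (an error rate would be negated first). A pair is kept only when one candidate beats the other on every metric.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical score container: metric name -> value, higher is better.
# The abstract names only the four metric categories, not specific metrics.
Scores = Dict[str, float]


def unanimously_better(a: Scores, b: Scores) -> bool:
    """Return True if `a` strictly beats `b` on every metric."""
    return all(a[m] > b[m] for m in a)


def build_pairs(candidates: Dict[str, Scores]) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs on which all metrics agree."""
    pairs = []
    for x, y in combinations(candidates, 2):
        if unanimously_better(candidates[x], candidates[y]):
            pairs.append((x, y))
        elif unanimously_better(candidates[y], candidates[x]):
            pairs.append((y, x))
        # Otherwise the metrics disagree and the pair is discarded.
    return pairs


# Illustrative (made-up) scores for three restored outputs of one input.
candidates = {
    "out_a": {"quality": 3.9, "fidelity": 0.82, "content": -0.05, "timbre": 0.91},
    "out_b": {"quality": 3.1, "fidelity": 0.75, "content": -0.12, "timbre": 0.88},
    "out_c": {"quality": 4.2, "fidelity": 0.70, "content": -0.08, "timbre": 0.90},
}
pairs = build_pairs(candidates)
print(pairs)
```

In this toy example only one pair survives, because `out_c` has the highest quality score but the lowest fidelity score, so the metrics disagree on every pair involving it. Discarding such pairs is what keeps a single gameable metric from defining the preference signal.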
Related papers
- Aligning Generative Speech Enhancement With Perceptual Feedback (2025)
- Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement (2021)
- Using RLHF To Align Speech Enhancement Approaches To Mean-opinion Quality Scores (2024)
- Multi-CMGAN+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023)
- Investigating Training Objectives For Generative Speech Enhancement (2024)
- Preference-based Training Framework For Automatic Speech Quality Assessment Using Deep Neural Network (2023)
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019)
- Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches For Speech Restoration (2022)