Attention-based Speech Enhancement Using Human Quality Perception Modelling
2023 · Khandokar Md. Nayem, Donald S. Williamson
Abstract
Perceptually-inspired objective functions such as the perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) have recently been used to optimize the performance of deep-learning-based speech enhancement algorithms. These objective functions, however, do not always strongly correlate with a listener's assessment of perceptual quality, so optimizing with these measures often results in poorer performance in real-world scenarios. In this work, we propose an attention-based enhancement approach that uses learned speech embedding vectors from a mean-opinion score (MOS) prediction model and a speech enhancement module to jointly enhance noisy speech. The MOS prediction model estimates the perceptual MOS of speech quality, as assessed by human listeners, directly from the audio signal. The enhancement module also employs a quantized language model that enforces spectral constraints for better speech realism and performance.
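The abstract does not specify the architecture, so what follows is only a minimal NumPy sketch, under our own assumptions, of the two generic ingredients it names: (1) scaled dot-product cross-attention in which frame features from an enhancement encoder (queries) attend to an embedding sequence from a MOS predictor (keys and values), and (2) vector quantization that snaps each fused frame to its nearest codebook entry, one common way to build a "quantized" representation. The function names (`cross_attention`, `vector_quantize`), all shapes, and the random weights are hypothetical illustrations; this is not the authors' model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(enh_feats, mos_emb, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (hypothetical fusion step).

    enh_feats: (T, D) frame features from an assumed enhancement encoder (queries).
    mos_emb:   (S, E) embedding sequence from an assumed MOS predictor (keys/values).
    """
    q = enh_feats @ w_q                        # (T, d)
    k = mos_emb @ w_k                          # (S, d)
    v = mos_emb @ w_v                          # (S, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, S) similarity per frame/embedding pair
    weights = softmax(scores, axis=-1)         # each frame's weights sum to 1
    return weights @ v, weights                # (T, d) fused features, (T, S) weights

def vector_quantize(x, codebook):
    """Replace each row of x with its nearest codebook entry (squared L2 distance)."""
    # Broadcast (T, 1, d) - (1, K, d) -> (T, K, d), then sum over d -> (T, K).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)                 # index of the closest code per frame
    return codebook[idx], idx

# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
T, S, D, E, d, K = 50, 20, 64, 128, 32, 16
enh = rng.standard_normal((T, D))
mos = rng.standard_normal((S, E))
w_q = rng.standard_normal((D, d)) / np.sqrt(D)
w_k = rng.standard_normal((E, d)) / np.sqrt(E)
w_v = rng.standard_normal((E, d)) / np.sqrt(E)

fused, attn = cross_attention(enh, mos, w_q, w_k, w_v)
codebook = rng.standard_normal((K, d))
quantized, codes = vector_quantize(fused, codebook)
print(fused.shape, attn.shape, quantized.shape)
```

Using the quality-model embeddings as keys and values lets every enhancement frame draw on whichever parts of the quality representation are most relevant to it. Quantization restricts the fused features to a finite set of learned spectral "tokens", which is one plausible reading of how a quantized model can enforce spectral constraints. In a trained system the weights and codebook would be learned rather than random.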
Related papers
- AttentiveMOS: A Lightweight Attention-only Model For Speech Quality Prediction (2024)
- Aligning Generative Speech Enhancement With Perceptual Feedback (2025)
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019)
- Learning To Maximize Speech Quality Directly Using MOS Prediction For Neural Text-to-speech (2020)
- Using RLHF To Align Speech Enhancement Approaches To Mean-opinion Quality Scores (2024)
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021)
- Perceive And Predict: Self-supervised Speech Representation Based Loss Functions For Speech Enhancement (2023)
- Multi-CMGAN+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023)