
Beyond Noise Suppression: Dynamic Distortion Control Loss for Speech Enhancement and Robust Automatic Speech Recognition

Abstract

In severe noise conditions, using a speech enhancement (SE) model as a front-end is a computationally efficient strategy for robust automatic speech recognition (ASR), offering a practical alternative to the costly fine-tuning of large-scale ASR systems. However, improvements in human perceptual quality do not guarantee better machine recognition accuracy: aggressive noise suppression often introduces distortions and artifacts that obscure fine-grained spectral details and increase word error rates (WERs). To mitigate this discrepancy, we present the Dynamic Distortion Control (DDC) loss, a unified training objective designed to bridge the gap between perceptual fidelity and recognition robustness. We integrate the loss into a time-frequency Transformer architecture to address this trade-off between distortion and robustness. Experimental results on LibriSpeech data corrupted by noise from the DNS Challenge demonstrate the effectiveness of the DDC loss for both perceptual quality and recognition accuracy across diverse noise conditions.
