How Bad Are Artifacts?: Analyzing The Impact Of Speech Enhancement Errors On ASR
2022 · Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, et al.
Abstract
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the enhanced speech) can monotonically increase the signal-to-artifact ratio under a mild condition. Accordingl
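The decomposition and the OA step described in the abstract can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it projects onto the raw speech and noise signals only (no time-shifted copies), uses synthetic Gaussian signals in place of real audio, and defines the signal-to-artifact ratio as the energy of the in-span part over the artifact energy. The function names `opd`, `sar_db`, and `observation_adding` are hypothetical.

```python
import numpy as np

def opd(enhanced, speech, noise):
    """Split `enhanced` into target, noise-error, and artifact parts.

    Assumption: projection onto the plain span of {speech, noise},
    without the delayed copies a full implementation might use.
    """
    # Projection onto the speech source alone.
    target = (enhanced @ speech) / (speech @ speech) * speech
    # Least-squares projection onto span{speech, noise}.
    basis = np.stack([speech, noise], axis=1)
    coef, *_ = np.linalg.lstsq(basis, enhanced, rcond=None)
    in_span = basis @ coef
    noise_err = in_span - target   # error explained by the noise source
    artifact = enhanced - in_span  # error outside span{speech, noise}
    return target, noise_err, artifact

def sar_db(target, noise_err, artifact):
    """Signal-to-artifact ratio in dB (assumed definition)."""
    return 10 * np.log10(np.sum((target + noise_err) ** 2) / np.sum(artifact ** 2))

def observation_adding(enhanced, observed, scale):
    """OA: add a scaled copy of the observed signal to the enhanced one."""
    return enhanced + scale * observed

# Synthetic stand-ins for real signals (assumption: Gaussian noise vectors).
rng = np.random.default_rng(0)
n = 16000
speech = rng.standard_normal(n)
noise = rng.standard_normal(n)
distortion = rng.standard_normal(n)   # stands in for processing distortion
observed = speech + noise
enhanced = speech + 0.3 * noise + 0.2 * distortion

_, _, base_artifact = opd(enhanced, speech, noise)

sars = []
for scale in (0.0, 0.5, 1.0):
    mixed = observation_adding(enhanced, observed, scale)
    target, noise_err, artifact = opd(mixed, speech, noise)
    sars.append(sar_db(target, noise_err, artifact))
    print(f"scale={scale:.1f}  SAR={sars[-1]:.2f} dB")
```

Because the observed signal lies entirely in the span of the speech and noise sources, adding it leaves the artifact component unchanged while increasing the energy of the in-span part in this toy setup, so the ratio rises with the scale. The price is that the added observation also brings back noise, which this sketch does not measure.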
Related papers
- Rethinking Processing Distortions: Disentangling The Impact Of Speech Enhancement Errors On Speech Recognition Performance (2024)
- How Does End-to-end Speech Recognition Training Impact Speech Enhancement Artifacts? (2023)
- Reducing The Gap Between Pretrained Speech Enhancement And Recognition Models Using A Real Speech-trained Bridging Module (2025)
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024)
- Acoustics-guided Evaluation (AGE): A New Measure For Estimating Performance Of Speech Enhancement Algorithms For Robust ASR (2018)
- Effect Of Noise Suppression Losses On Speech Distortion And ASR Performance (2021)
- A Study Of Incorporating Articulatory Movement Information In Speech Enhancement (2020)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)