Robust Speech Recognition With Schrödinger Bridge-based Speech Enhancement
2025 · Rauf Nasretdinov, Roman Korostik, Ante Jukić
Abstract
In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and of different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze speech recognition performance with different pre-trained ASR models. The proposed approach significantly reduces the word error rate: by approximately 40% relative to unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
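The relative figures quoted in the abstract are ratios of word error rates (WER), not absolute differences. As a minimal sketch of that arithmetic, assuming the standard word-level edit-distance definition of WER, the snippet below computes WER for two transcripts and the relative reduction between them. The function names and the example sentences are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation tooling is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction by which WER drops relative to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical transcripts, for illustration only.
reference = "turn the lights off in the kitchen"
noisy_hypothesis = "turn the light of in kitchen"          # ASR on unprocessed audio
enhanced_hypothesis = "turn the lights of in the kitchen"  # ASR on enhanced audio

wer_noisy = word_error_rate(reference, noisy_hypothesis)
wer_enhanced = word_error_rate(reference, enhanced_hypothesis)
print(f"WER (unprocessed): {wer_noisy:.3f}")
print(f"WER (enhanced):    {wer_enhanced:.3f}")
print(f"Relative reduction: {relative_wer_reduction(wer_noisy, wer_enhanced):.1%}")
```

Under this definition, a 40% relative reduction means that, for example, a baseline WER of 10% would fall to 6%, not to a negative value or by 40 percentage points.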
Related papers
- Diffusion-based Speech Enhancement With Schrödinger Bridge And Symmetric Noise Schedule (2024) · 0.00
- Investigating Training Objectives For Generative Speech Enhancement (2024) · 9.76
- FNSE-SBGAN: Far-field Speech Enhancement With Schrodinger Bridge And Generative Adversarial Networks (2025) · 3.58
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024) · 7.50
- Schrodinger Bridges Beat Diffusion Models On Text-to-speech Synthesis (2023) · 0.00
- Speech Enhancement And Dereverberation With Diffusion-based Generative Models (2022) · 23.51
- Schrödinger Bridge Mamba For One-step Speech Enhancement (2025) · 0.00
- StoRM: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) · 15.43