Towards Improved Room Impulse Response Estimation For Speech Recognition
2022 · Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, et al.
Abstract
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 17% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
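The abstract names an energy decay relief (EDR) loss but does not give its formula. As a rough illustration only, the sketch below uses the commonly cited definition of the EDR as the backward-integrated energy of an RIR's short-time spectrum, EDR(t, f) = sum over tau >= t of |H(tau, f)|^2, and compares two RIRs by the mean absolute difference of their EDRs in dB. The frame length, hop size, Hann window, dB scaling, and the function names (`stft_power`, `energy_decay_relief`, `edr_l1_distance_db`) are assumptions made for this example, not the authors' implementation.

```python
import cmath
import math


def stft_power(signal, frame_len=64, hop=32):
    """Power spectrogram |H(t, f)|^2 from a Hann-windowed naive DFT.

    frame_len and hop are illustrative values, not taken from the paper.
    Returns a list of frames, each a list of per-bin powers: power[t][f].
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    power = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        power.append(row)
    return power


def energy_decay_relief(power):
    """EDR(t, f): energy remaining in bin f from frame t to the end."""
    n_frames, n_bins = len(power), len(power[0])
    edr = [None] * n_frames
    running = [0.0] * n_bins
    # Accumulate from the last frame backwards (Schroeder-style integration
    # applied independently to each frequency bin).
    for t in range(n_frames - 1, -1, -1):
        running = [running[f] + power[t][f] for f in range(n_bins)]
        edr[t] = running
    return edr


def edr_l1_distance_db(rir_a, rir_b, eps=1e-10):
    """Mean absolute EDR difference in dB between two equal-length RIRs.

    A hypothetical loss-like measure for illustration; the paper's exact
    loss may differ.
    """
    edr_a = energy_decay_relief(stft_power(rir_a))
    edr_b = energy_decay_relief(stft_power(rir_b))
    total, count = 0.0, 0
    for row_a, row_b in zip(edr_a, edr_b):
        for a, b in zip(row_a, row_b):
            total += abs(10 * math.log10(a + eps) - 10 * math.log10(b + eps))
            count += 1
    return total / count
```

Calling `edr_l1_distance_db(estimated_rir, reference_rir)` on two sample lists of equal length returns a single non-negative number that is zero when the two decay profiles match. In a training setup the same quantity would be computed with a differentiable FFT rather than the naive DFT used here for self-containment.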
Related papers
- IR-GAN: Room Impulse Response Generator For Far-field Speech Recognition (2020)
- TS-RIR: Translated Synthetic Room Impulse Responses For Speech Augmentation (2021)
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023)
- Rec-RIR: Monaural Blind Room Impulse Response Identification Via DNN-based Reverberant Speech Reconstruction In STFT Domain (2025)
- AV-RIR: Audio-visual Room Impulse Response Estimation (2023)
- Towards Improving Speaker Distance Estimation Through Generative Impulse Response Augmentation (2026)
- Synthetic Wave-geometric Impulse Responses For Improved Speech Dereverberation (2022)
- Improving Reverberant Speech Separation With Multi-stage Training And Curriculum Learning (2021)