TS-RIR: Translated Synthetic Room Impulse Responses For Speech Augmentation
2021 · Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
Abstract
We present a method for improving the quality of synthetic room impulse responses (RIRs) for far-field speech recognition. We bridge the fidelity gap between synthetic and real RIRs using our novel TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world sub-band room equalization on the translated synthetic RIR. Our overall approach improves the quality of synthetic RIRs by compensating for low-frequency wave effects, so that they behave more like real RIRs. We evaluate the improved synthetic RIRs on a far-field speech dataset created by convolving the LibriSpeech clean speech dataset [1] with RIRs and adding background noise. We show that far-field speech augmented using our improved synthetic RIRs reduces the word error rate by up to 19.9% on the Kaldi far-field automatic speech recognition benchmark [2].
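The augmentation step mentioned in the abstract (convolving clean speech with an RIR, then adding background noise) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `augment_far_field`, the `snr_db` parameter, the noise tiling, and the truncation to the clean-speech length are all assumptions made for this example.

```python
import numpy as np


def augment_far_field(clean, rir, noise, snr_db):
    """Simulate far-field speech: reverberate with an RIR, then add noise.

    Hypothetical helper for illustration; the SNR-based noise scaling is
    an assumption, not a detail taken from the abstract.
    """
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Linear convolution with the RIR, truncated to the clean-speech length.
    reverberant = np.convolve(clean, rir)[: len(clean)]

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        # Nothing to scale against; return the reverberant signal unchanged.
        return reverberant

    # Scale the noise so that speech_power / scaled_noise_power matches snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

In this sketch the same function would be called once per utterance, with `rir` drawn from whichever RIR set is under evaluation (synthetic, translated, or real), so that only the RIR source changes between experiments.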
Related papers
- IR-GAN: Room Impulse Response Generator For Far-field Speech Recognition (2020)
- Towards Improved Room Impulse Response Estimation For Speech Recognition (2022)
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023)
- Synthetic Wave-geometric Impulse Responses For Improved Speech Dereverberation (2022)
- Rec-RIR: Monaural Blind Room Impulse Response Identification Via DNN-based Reverberant Speech Reconstruction In STFT Domain (2025)
- Towards Improving Speaker Distance Estimation Through Generative Impulse Response Augmentation (2026)
- AV-RIR: Audio-visual Room Impulse Response Estimation (2023)
- Improving Reverberant Speech Separation With Multi-stage Training And Curriculum Learning (2021)