Towards Improving Speaker Distance Estimation Through Generative Impulse Response Augmentation
2026 · Anton Ratnarajah, Mehmet Ergezer, Arun Nair, et al.
Abstract
arXiv:2605.00721v1
The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance. This challenge at GenDARA involves generating RIRs to supplement sparse datasets and fine-tuning SDE models with the augmented data. We employ the open-source fast diffuse room impulse response generator (FastRIR), conditioned only on speaker and listener locations. We design a quality filter to ensure that generated RIRs align with the challenge RIRs, and we use hyperparameter optimization for model fine-tuning. Our approach reduces the mean absolute error (MAE) of the five positions from 1.66 m to 0.6 m for GWA rooms and from 2.18 m to 0.69 m for Treble rooms. The results demonstrate that the augmentation approach significantly improves estimation accuracy, particularly at medium to long distances.
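To put the reported numbers in perspective, the following is a minimal sketch of how MAE is conventionally computed and what relative reduction the abstract's figures imply. The function names are illustrative assumptions and are not taken from the paper; only the four MAE values come from the abstract.

```python
def mean_absolute_error(predicted_m, actual_m):
    """Mean absolute error between predicted and true distances, in meters."""
    if len(predicted_m) != len(actual_m) or not actual_m:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted_m, actual_m)) / len(actual_m)


def relative_reduction(before, after):
    """Fractional reduction in error when going from `before` to `after`."""
    return (before - after) / before


# MAE values reported in the abstract (meters): (baseline, after fine-tuning).
reported = {"GWA": (1.66, 0.60), "Treble": (2.18, 0.69)}
for rooms, (baseline, fine_tuned) in reported.items():
    print(f"{rooms}: {relative_reduction(baseline, fine_tuned):.1%} lower MAE")
```

Under this reading, the reported figures correspond to roughly a 64% reduction in MAE for GWA rooms and a 68% reduction for Treble rooms.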
Related papers
- Towards Improved Room Impulse Response Estimation For Speech Recognition (2022) 10.61
- IR-GAN: Room Impulse Response Generator For Far-field Speech Recognition (2020) 11.93
- TS-RIR: Translated Synthetic Room Impulse Responses For Speech Augmentation (2021) 8.35
- Synthetic Wave-geometric Impulse Responses For Improved Speech Dereverberation (2022) 0.00
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023) 0.00
- Mmaudioreverbs: Video-guided Acoustic Modeling For Dereverberation And Room Impulse Response Estimation (2026) 0.00
- Improving Reverberant Speech Separation With Multi-stage Training And Curriculum Learning (2021) 0.00
- AV-RIR: Audio-visual Room Impulse Response Estimation (2023) 0.00