Blind Estimation Of Sub-band Acoustic Parameters From Ambisonics Recordings Using Spectro-spatial Covariance Features
2024 · Hanyu Meng, Jeroen Breebaart, Jeremy Stoddard, et al.
Abstract
Estimating frequency-varying acoustic parameters is essential for enhancing immersive perception in realistic spatial audio creation. In this paper, we propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR), and clarity (C50) across 10 frequency bands using first-order Ambisonics (FOA) speech recordings as inputs. The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), which efficiently represents the temporal, spectral, and spatial information of the FOA signal. Our models significantly outperform existing single-channel methods that use only spectral information, reducing estimation errors by more than half for all three acoustic parameters. Additionally, we introduce FOA-Conv3D, a novel back-end network that uses a 3D convolutional encoder to exploit the SSCV feature effectively. FOA-Conv3D outperforms the convolutional neural network (CNN) and recurrent convolutional neural network (CRNN) backen
Related papers
- Blind Acoustic Parameter Estimation Through Task-agnostic Embeddings Using Latent Approximations (2024)
- Direction Of Arrival Estimation Of Noisy Speech Using Convolutional Recurrent Neural Networks With Higher-order Ambisonics Signals (2021)
- 3-D Feature And Acoustic Modeling For Far-field Speech Recognition (2019)
- End-to-end Multi-channel Speaker Extraction And Binaural Speech Synthesis (2024)
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023)
- Diff-sage: End-to-end Spatial Audio Generation Using Diffusion Models (2024)
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019)
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019)