Hierarchical Modeling Of Spatial Cues Via Spherical Harmonics For Multi-channel Speech Enhancement
2023 · Jiahui Pan, Shulin He, Hui Zhang, et al.
Abstract
Multi-channel speech enhancement uses spatial information from multiple microphones to extract the target speech. However, most existing methods do not model spatial cues explicitly and instead rely on implicit learning from multi-channel spectra. To better leverage spatial information, we propose incorporating explicit spatial modeling by applying spherical harmonic transforms (SHT) to the multi-channel input. Specifically, we introduce a hierarchical framework in which lower-order harmonics, which capture broader spatial patterns, are estimated first and then combined with higher-order harmonics to recursively predict finer spatial details. Experiments on TIMIT demonstrate that the proposed method can effectively recover target spatial patterns and outperforms baseline models while using fewer parameters and less computation. Modeling spatial information explicitly and hierarchically enables more effective multi-channel speech enhancement.
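As a rough illustration of the spherical harmonic transform mentioned in the abstract, the sketch below projects a multi-channel spectrogram onto real spherical harmonics up to order 1 and splits the result by order, which is the kind of grouping a hierarchical model could consume coarse-to-fine. It is not the authors' implementation: the tetrahedral array geometry (`MIC_DIRS`), the truncation at order 1, the least-squares transform, and the function names `real_sh_basis` and `sht` are all assumptions made for the example, and the enhancement network itself is not shown.

```python
import numpy as np

def real_sh_basis(directions):
    """Real spherical harmonics of order 0 and 1 at the given directions.

    directions: (M, 3) array of microphone direction vectors.
    Returns an (M, 4) matrix with columns [Y_0^0, Y_1^-1, Y_1^0, Y_1^1].
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # project onto unit sphere
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)            # normalization of Y_0^0
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # normalization of the order-1 terms
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def sht(signals, directions):
    """Least-squares SHT: signals (M, ...) -> coefficients (4, ...)."""
    basis = real_sh_basis(directions)
    # The pseudo-inverse maps microphone signals to harmonic coefficients.
    return np.tensordot(np.linalg.pinv(basis), signals, axes=(1, 0))

# Assumed geometry: four microphones at the vertices of a tetrahedron.
MIC_DIRS = np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
)

# Placeholder complex spectrogram: (mics, frequency bins, frames).
rng = np.random.default_rng(0)
spec = rng.standard_normal((4, 257, 10)) + 1j * rng.standard_normal((4, 257, 10))

coeffs = sht(spec, MIC_DIRS)
order0 = coeffs[:1]   # broad, omnidirectional component
order1 = coeffs[1:]   # first-order directional components
```

In a coarse-to-fine scheme of the kind the abstract describes, a model would first estimate the target `order0` coefficients and then use that estimate together with `order1` to predict the finer directional components. Larger arrays would allow higher orders, at which point a general spherical harmonics routine would replace the hand-written basis.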
Related papers
- Efficient Multi-channel Speech Enhancement With Spherical Harmonics Injection For Directional Encoding (2023)
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019)
- Exploring The Potential Of Data-driven Spatial Audio Enhancement Using A Single-channel Model (2024)
- One Model To Enhance Them All: Array Geometry Agnostic Multi-channel Personalized Speech Enhancement (2021)
- Improving Dual-microphone Speech Enhancement By Learning Cross-channel Features With Multi-head Attention (2022)
- Spatial Hubert: Self-supervised Spatial Speech Representation Learning For A Single Talker From Multi-channel Audio (2023)
- Leveraging Joint Spectral And Spatial Learning With MAMBA For Multichannel Speech Enhancement (2024)
- Real-time Stereo Speech Enhancement With Spatial-cue Preservation Based On Dual-path Structure (2024)