SE Territory: Monaural Speech Enhancement Meets The Fixed Virtual Perceptual Space Mapping
2023 · Xinmeng Xu, Yuhong Yang, Weiping Tu
Abstract
Monaural speech enhancement has achieved remarkable progress recently. However, its performance has been constrained by the limited spatial cues available at a single microphone. To overcome this limitation, we introduce a strategy to map monaural speech into a fixed simulation space for better differentiation between target speech and noise. Concretely, we propose SE-TerrNet, a novel monaural speech enhancement model featuring a virtual binaural speech mapping network via a two-stage multi-task learning framework. In the first stage, monaural noisy input is projected into a virtual space using supervised speech mapping blocks, creating binaural representations. These blocks synthesize binaural noisy speech from monaural input via an ideal binaural room impulse response. The synthesized output assigns speech and noise sources to fixed directions within the perceptual space. In the second stage, the obtained binaural features from the first stage are aggregated. This aggregation aims to
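The first-stage idea, placing speech and noise at fixed directions by convolving each with a binaural room impulse response (BRIR), can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that clean speech and noise are available separately at training time, which is what supervised targets require. It also replaces real BRIRs with hypothetical single-tap "toy" responses that encode only an interaural time difference (a sample delay) and an interaural level difference (per-ear gains). The names `toy_brir`, `spatialize` and `virtual_binaural_mixture` are illustrative only.

```python
import numpy as np


def toy_brir(delay_samples: int, gain_left: float, gain_right: float,
             length: int = 64) -> np.ndarray:
    """Hypothetical 2-channel impulse response (shape (2, length)).

    Models only an interaural delay and level difference: a positive delay
    makes the right ear lag, a negative delay makes the left ear lag.
    Real BRIRs would also contain room reflections and spectral cues.
    """
    assert abs(delay_samples) < length, "delay must fit inside the response"
    h = np.zeros((2, length))
    if delay_samples >= 0:
        h[0, 0] = gain_left
        h[1, delay_samples] = gain_right
    else:
        h[0, -delay_samples] = gain_left
        h[1, 0] = gain_right
    return h


def spatialize(mono: np.ndarray, brir: np.ndarray) -> np.ndarray:
    """Convolve a mono signal with each ear's response; trim to input length."""
    return np.stack([np.convolve(mono, brir[ch])[: len(mono)] for ch in range(2)])


def virtual_binaural_mixture(speech: np.ndarray, noise: np.ndarray,
                             speech_brir: np.ndarray, noise_brir: np.ndarray):
    """Place speech and noise at fixed virtual directions and mix them.

    Returns (binaural mixture, binaural speech, binaural noise), each (2, N).
    """
    s_bin = spatialize(speech, speech_brir)
    n_bin = spatialize(noise, noise_brir)
    return s_bin + n_bin, s_bin, n_bin


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)  # stand-in for 1 s of clean speech at 16 kHz
    noise = rng.standard_normal(16000)   # stand-in for 1 s of noise
    front = toy_brir(0, 1.0, 1.0)        # assumed fixed speech direction: straight ahead
    side = toy_brir(8, 1.0, 0.5)         # assumed fixed noise direction: to the left
    mix, s_bin, n_bin = virtual_binaural_mixture(speech, noise, front, side)
    print(mix.shape)
```

Because each source always gets the same response, speech and noise keep distinct, fixed interaural cues in every training example. That consistency is what would let a network learn to separate them spatially. In practice the toy responses would be replaced by measured or simulated BRIRs. The abstract does not specify how its "ideal" BRIR is obtained.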