Dualsep: A Light-weight Dual-encoder Convolutional Recurrent Network For Real-time In-car Speech Separation
2024 · Ziqian Wang, Jiayao Sun, Zihan Zhang, et al.
Abstract
Advancements in deep learning and voice-activated technologies have driven the development of human-vehicle interaction. Distributed microphone arrays are widely used in in-car scenarios because they can accurately capture the voices of passengers from different speech zones. However, the increase in the number of audio channels, coupled with the limited computational resources and low-latency requirements of in-car systems, presents challenges for in-car multi-channel speech separation. To mitigate these problems, we propose a lightweight framework that cascades digital signal processing (DSP) and neural networks (NN). We utilize fixed beamforming (BF) to reduce computational costs and independent vector analysis (IVA) to provide a spatial prior. We employ dual encoders for dual-branch modeling, with a spatial encoder capturing spatial cues and a spectral encoder preserving spectral information, facilitating spatial-spectral fusion. Our proposed system supports both streaming and non-streaming
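The abstract names fixed beamforming as the low-cost DSP front end but does not say which beamformer design is used. As an illustration only, the sketch below builds delay-and-sum weights, one common kind of fixed (data-independent) beamformer, and applies them to a multi-channel STFT. The function names, the far-field plane-wave model, and the four-microphone geometry are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def delay_and_sum_weights(mic_positions, look_direction, freqs, speed_of_sound=343.0):
    """Return fixed delay-and-sum weights with shape (n_freqs, n_mics).

    Hypothetical helper: assumes a far-field plane wave arriving from
    `look_direction` (a vector pointing from the array toward the talker).
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    look_direction = np.asarray(look_direction, dtype=float)
    look_direction = look_direction / np.linalg.norm(look_direction)
    # Relative arrival delay at each microphone, in seconds.
    # Microphones closer to the source (larger projection) receive the wave earlier.
    delays = -(mic_positions @ look_direction) / speed_of_sound
    # Steering vector d[f, m] = exp(-j * 2 * pi * f * tau_m).
    steering = np.exp(-2j * np.pi * np.outer(freqs, delays))
    # Dividing by the microphone count gives unit gain in the look direction.
    return steering / mic_positions.shape[0]


def apply_beamformer(weights, stft):
    """Apply y[f, t] = sum_m conj(w[f, m]) * x[m, f, t].

    weights: (n_freqs, n_mics); stft: (n_mics, n_freqs, n_frames).
    """
    return np.einsum("fm,mft->ft", weights.conj(), stft)


# Assumed geometry: four microphones on a line, 5 cm apart (illustrative only).
mics = np.array([[0.00, 0.0, 0.0],
                 [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0],
                 [0.15, 0.0, 0.0]])
freqs = np.linspace(0.0, 8000.0, 257)
weights = delay_and_sum_weights(mics, look_direction=[1.0, 0.0, 0.0], freqs=freqs)

# Synthetic check: a source exactly in the look direction passes undistorted.
rng = np.random.default_rng(0)
source = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))
steering = weights * mics.shape[0]
mixture = np.einsum("fm,ft->mft", steering, source)
output = apply_beamformer(weights, mixture)
```

Because the weights depend only on the array geometry and the chosen zone direction, they can be computed once offline, which is why a fixed beamformer adds little run-time cost compared with an adaptive one.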
Related papers
- Real-time Speech Enhancement And Separation With A Unified Deep Neural Network For Single/dual Talker Scenarios (2023) 2.26
- Embedding Recurrent Layers With Dual-path Strategy In A Variant Of Convolutional Network For Speaker-independent Speech Separation (2022) 4.52
- DCF-DS: Deep Cascade Fusion Of Diarization And Separation For Speech Recognition Under Realistic Single-channel Conditions (2024) 3.58
- Dbnet: A Dual-branch Network Architecture Processing On Spectrum And Waveform For Single-channel Speech Enhancement (2021) 8.09
- Audio-visual Speech Separation And Dereverberation With A Two-stage Multimodal Network (2019) 12.47
- Deep Neural Mel-subband Beamformer For In-car Speech Separation (2022) 6.77
- Neural Directed Speech Enhancement With Dual Microphone Array In High Noise Scenario (2024) 0.00
- Short-time Deep-learning Based Source Separation For Speech Enhancement In Reverberant Environments With Beamforming (2020) 0.00