FaSNet: Low-latency Adaptive Beamforming For Multi-microphone Audio Processing
2019 · Yi Luo, Enea Ceolini, Cong Han, et al.
Abstract
Beamforming has been extensively investigated for multi-channel audio processing tasks. Recently, learning-based beamforming methods, sometimes called neural beamformers, have achieved significant improvements in both signal quality (e.g., signal-to-noise ratio (SNR)) and speech recognition (e.g., word error rate (WER)). Such systems are generally non-causal and require a large context for robust estimation of inter-channel features, which is impractical in applications requiring low-latency responses. In this paper, we propose the filter-and-sum network (FaSNet), a time-domain, filter-based beamforming approach suitable for low-latency scenarios. FaSNet has a two-stage system design that first learns frame-level time-domain adaptive beamforming filters for a selected reference channel, and then calculates the filters for all remaining channels. The filtered outputs of all channels are summed to generate the final output. Experiments show that despite its small model size, FaSNet i
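The filter-and-sum step described in the abstract can be illustrated with a minimal sketch. This is not the FaSNet implementation: in FaSNet the per-frame filters are estimated by a neural network, whereas here they are hand-picked constants, and the function names `fir_filter` and `filter_and_sum` are hypothetical. The sketch only shows the final operation, in which each channel's frame is passed through its own time-domain FIR filter and the filtered channels are summed.

```python
from typing import List, Sequence


def fir_filter(frame: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y[n] = sum_k taps[k] * frame[n - k].

    The output has the same length as the input frame.
    """
    out = []
    for n in range(len(frame)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * frame[n - k]
        out.append(acc)
    return out


def filter_and_sum(
    frames: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Filter each channel's frame with its own taps, then sum across channels.

    frames:  one frame per microphone channel, all of equal length.
    filters: one FIR tap list per channel (assumed given; FaSNet would
             estimate these per frame with a network).
    """
    if len(frames) != len(filters):
        raise ValueError("need exactly one filter per channel")
    length = len(frames[0])
    output = [0.0] * length
    for frame, taps in zip(frames, filters):
        if len(frame) != length:
            raise ValueError("all channel frames must have the same length")
        filtered = fir_filter(frame, taps)
        output = [o + f for o, f in zip(output, filtered)]
    return output


# Toy example: the second microphone receives the reference signal one sample late.
ref = [1.0, 2.0, 3.0, 0.0]
delayed = [0.0, 1.0, 2.0, 3.0]

# Hand-picked filters: delay the reference by one sample and pass the other
# channel through, each with weight 0.5, so the aligned channels are averaged.
filters = [[0.0, 0.5], [0.5]]
out = filter_and_sum([ref, delayed], filters)
```

In this toy case the two filters time-align the channels before summation, which is the classical motivation for filter-and-sum beamforming; a learned system would adapt the taps frame by frame instead of fixing them.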
Related papers
- Implicit Filter-and-sum Network For Multi-channel Speech Separation (2020)
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017)
- Speaker Adapted Beamforming For Multi-channel Automatic Speech Recognition (2018)
- Deep Ad-hoc Beamforming (2018)
- Embedding And Beamforming: All-neural Causal Beamformer For Multichannel Speech Enhancement (2021)
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024)
- DNN-free Low-latency Adaptive Speech Enhancement Based On Frame-online Beamforming Powered By Block-online FastMNMF (2022)
- Sequential Multi-frame Neural Beamforming For Speech Separation And Enhancement (2019)