Implicit Filter-and-sum Network For Multi-channel Speech Separation
2020 · Yi Luo, Nima Mesgarani
Abstract
Various neural network architectures have been proposed in recent years for the task of multi-channel speech separation. Among them, the filter-and-sum network (FaSNet) performs end-to-end time-domain filter-and-sum beamforming and has been shown to be effective in both ad-hoc and fixed microphone array geometries. In this paper, we investigate multiple ways to improve the performance of FaSNet. From the problem formulation perspective, we change the explicit time-domain filter-and-sum operation, which involves all the microphones, into an implicit filter-and-sum operation in the latent space of only the reference microphone. The filter-and-sum operation is applied to a context around the frame to be separated. This allows the problem formulation to better match the objective of end-to-end separation. From the feature extraction perspective, we modify the calculation of sample-level normalized cross correlation (NCC) features into feature-level NCC (fNCC) features. This makes the model better match
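To make the two operations named in the abstract more concrete, here is a minimal NumPy sketch of an explicit time-domain filter-and-sum step and of a sample-level NCC feature. It is an illustration under stated assumptions, not the authors' implementation: the function names `filter_and_sum` and `ncc`, the array shapes, and the use of cosine similarity over sliding segments as the NCC definition are all choices made for this example. In FaSNet the filters are estimated by a neural network; here they are simply passed in as arrays.

```python
import numpy as np


def filter_and_sum(contexts, filters):
    """Explicit time-domain filter-and-sum (illustrative sketch).

    contexts: array of shape (C, L + 2W), one context window per microphone.
    filters:  array of shape (C, 2W + 1), one FIR filter per microphone
              (assumed given; a real system would estimate them).
    Returns an array of length L: the sum over microphones of the
    'valid' convolution of each context window with its filter.
    """
    return sum(np.convolve(x, h, mode="valid") for x, h in zip(contexts, filters))


def ncc(ref_frame, context, eps=1e-8):
    """Sample-level normalized cross correlation (assumed cosine-similarity form).

    ref_frame: length-L frame from the reference microphone.
    context:   length-(L + 2W) window from another microphone.
    Returns 2W + 1 values, one per length-L segment of the context.
    """
    L = len(ref_frame)
    # All length-L segments of the context, shape (2W + 1, L).
    segments = np.lib.stride_tricks.sliding_window_view(context, L)
    num = segments @ ref_frame
    den = np.linalg.norm(segments, axis=1) * np.linalg.norm(ref_frame) + eps
    return num / den
```

With a frame length `L` and a context of `W` samples on each side, `filter_and_sum` returns one length-`L` output frame and `ncc` returns `2W + 1` similarity values per microphone pair. The implicit variant described in the abstract moves the filter-and-sum step into a learned latent space of the reference microphone, which this sketch does not attempt to reproduce.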
Related papers
- FaSNet: Low-latency Adaptive Beamforming For Multi-microphone Audio Processing (2019) · 0.00
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023) · 10.35
- End-to-end Networks For Supervised Single-channel Speech Separation (2018) · 0.00
- Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation (2022) · 0.00
- SpatialNet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation (2023) · 13.88
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020) · 0.00
- Dual-path Filter Network: Speaker-aware Modeling For Speech Separation (2021) · 3.58
- Temporal-spatial Neural Filter: Direction Informed End-to-end Multi-channel Target Speech Separation (2020) · 0.00