Deep Attractor Network For Single-microphone Speaker Separation
2016 · Zhuo Chen, Yi Luo, Nima Mesgarani
Abstract
Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals; these attractors pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model differs from prior work in that it is trained end to end, and it does not depend on the number of sources in the m
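The attractor step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `attractor_masks` and `reconstruction_error` are hypothetical, the embeddings are taken as given (in the paper they come from a trained network), the source assignments are assumed to be one-hot training-time labels, and a softmax over sources is assumed as the way to turn similarities into masks.

```python
import numpy as np

def attractor_masks(embeddings, assignments):
    """Form attractors as source centroids and derive soft masks.

    embeddings:  (N, K) array, one K-dim embedding per time-frequency bin.
    assignments: (N, C) one-hot array, source membership of each bin
                 (assumed to be available at training time).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    assignments = np.asarray(assignments, dtype=float)

    # Attractor of each source = centroid of the embeddings assigned to it.
    counts = assignments.sum(axis=0)[:, None]                 # (C, 1)
    attractors = (assignments.T @ embeddings) / np.maximum(counts, 1e-8)

    # Similarity of every bin to every attractor (inner product).
    similarity = embeddings @ attractors.T                    # (N, C)

    # Assumption: softmax over sources converts similarity to masks.
    similarity = similarity - similarity.max(axis=1, keepdims=True)
    exp = np.exp(similarity)
    masks = exp / exp.sum(axis=1, keepdims=True)
    return attractors, masks

def reconstruction_error(mixture_mag, source_mags, masks):
    """Mean squared error between masked mixture and clean sources.

    mixture_mag: (N,) mixture magnitude per bin.
    source_mags: (N, C) clean source magnitude per bin.
    """
    estimates = masks * np.asarray(mixture_mag, dtype=float)[:, None]
    return float(np.mean((np.asarray(source_mags, dtype=float) - estimates) ** 2))
```

In a trained system the gradient of this reconstruction error would flow back through the masks and attractors into the embedding network; the sketch only shows the forward computation.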
Related papers
- Speaker-independent Speech Separation With Deep Attractor Network (2017) · 16.84
- Cracking The Cocktail Party Problem By Multi-beam Deep Attractor Network (2018) · 9.92
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) · 0.00
- Deep Ad-hoc Beamforming Based On Speaker Extraction For Target-dependent Speech Separation (2020) · 7.50
- Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor (2024) · 0.00
- End-to-end Networks For Supervised Single-channel Speech Separation (2018) · 0.00
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023) · 10.35
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) · 8.60