Speaker-independent Speech Separation With Deep Attractor Network
2017 · Yi Luo, Zhuo Chen, Nima Mesgarani
Abstract
Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first is the arbitrary order of the target and masker speakers in the mixture (the permutation problem), and the second is the unknown number of speakers in the mixture (the output dimension problem). We propose a novel deep learning framework for speech separation that addresses both of these issues. We use a neural network to project the time-frequency representation of the mixture signal into a high-dimensional embedding space. A reference point (attractor) is created in the embedding space to represent each speaker, defined as the centroid of that speaker's embeddings. The time-frequency embeddings of each speaker are then forced to cluster around the corresponding attractor point, which is used to determine the time-frequency assignment of the speaker. We propose three methods for finding the a
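The attractor idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `attractors` and `soft_masks` are hypothetical, the per-bin speaker labels are assumed to be known (as they would be from clean sources during training), and the dot-product similarity followed by a softmax is an assumed choice for turning distances to attractors into a time-frequency assignment.

```python
import math


def attractors(embeddings, assignments, num_speakers):
    """Return one attractor per speaker: the centroid of that speaker's embeddings.

    embeddings:  list of K-dimensional vectors, one per time-frequency bin.
    assignments: speaker index for each bin (assumed known for this sketch).
    """
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_speakers)]
    counts = [0] * num_speakers
    for vec, spk in zip(embeddings, assignments):
        counts[spk] += 1
        for k, value in enumerate(vec):
            sums[spk][k] += value
    # Guard against a speaker with no bins to avoid division by zero.
    return [[s / max(count, 1) for s in row] for row, count in zip(sums, counts)]


def soft_masks(embeddings, attractor_points):
    """Assign each bin to speakers via a softmax over dot-product similarity.

    The similarity measure is an assumption made for illustration.
    """
    masks = []
    for vec in embeddings:
        scores = [sum(a * v for a, v in zip(att, vec)) for att in attractor_points]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        masks.append([e / total for e in exps])
    return masks


# Toy example: four time-frequency bins embedded in 2 dimensions, two speakers.
toy_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
toy_assignments = [0, 0, 1, 1]

toy_attractors = attractors(toy_embeddings, toy_assignments, num_speakers=2)
toy_masks = soft_masks(toy_embeddings, toy_attractors)
```

In this toy case each bin's mask weight is largest for the speaker whose attractor it lies closest to, and the weights for each bin sum to one, so multiplying the masks by the mixture spectrogram would split its energy between the speakers.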
Related papers
- Deep Attractor Network For Single-microphone Speaker Separation (2016)
- Cracking The Cocktail Party Problem By Multi-beam Deep Attractor Network (2018)
- Boosting Unknown-number Speaker Separation With Transformer Decoder-based Attractor (2024)
- Deep Ad-hoc Beamforming Based On Speaker Extraction For Target-dependent Speech Separation (2020)
- Exploring The Time-domain Deep Attractor Network With Two-stream Architectures In A Reverberant Environment (2020)
- Single-channel Multi-speaker Separation Using Deep Clustering (2016)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023)
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019)