Multi-channel Speech Separation Using Deep Embedding Model With Multilayer Bootstrap Networks
2019 · Ziye Yang, Xiao-Lei Zhang
Abstract
Recently, deep clustering (DPCL) based speaker-independent speech separation has drawn much attention, since it needs little prior information about the speakers. However, it still has much room for improvement, particularly in reverberant environments. If the training and test environments are mismatched, which is a common case, the embedding vectors produced by DPCL may contain much noise and many small variations. To deal with this problem, we propose a variant of DPCL, named DPCL++, which applies a recent unsupervised deep learning method, multilayer bootstrap networks (MBN), to further reduce the noise and small variations of the embedding vectors in an unsupervised way in the test stage, which helps k-means produce a good result. MBN builds a gradually narrowed network from the bottom up via a stack of k-centroids clustering ensembles, where the k-centroids clusterings are trained independently by random sampling and one-nearest-neighbor optimization. To further improve the robustness of DPCL++ i
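The MBN construction described in the abstract (a stack of k-centroids clustering ensembles, each trained by random sampling and one-nearest-neighbor assignment, with fewer centroids at each layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mbn_transform`, the layer sizes, the number of clusterings per layer, and the use of squared Euclidean distance at every layer are all assumptions chosen for illustration.

```python
import numpy as np

def mbn_transform(X, layer_sizes=(8, 4, 2), n_clusterings=10, seed=0):
    """Toy MBN-style transform (hypothetical helper, not the paper's code).

    X: array of shape (n_samples, n_features), e.g. embedding vectors.
    layer_sizes: centroids per clustering at each layer, decreasing so
        that the network narrows from the bottom up (assumed values).
    n_clusterings: size of the clustering ensemble per layer (assumed).
    """
    rng = np.random.default_rng(seed)
    H = np.asarray(X, dtype=float)
    for k in layer_sizes:
        codes = []
        for _ in range(n_clusterings):
            # Random sampling: pick k input rows to serve as centroids.
            idx = rng.choice(H.shape[0], size=k, replace=False)
            centroids = H[idx]
            # One-nearest-neighbor step: squared Euclidean distance from
            # every sample to every centroid, then take the closest one.
            d2 = ((H[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            # Encode each sample as a one-hot vector over the k centroids.
            codes.append(np.eye(k)[nearest])
        # Concatenate the ensemble's one-hot codes as the next layer's input.
        H = np.concatenate(codes, axis=1)
    return H

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(50, 5))
    Z = mbn_transform(X)
    print(Z.shape)  # (n_samples, n_clusterings * layer_sizes[-1])
```

In a DPCL++-style pipeline, the rows of the returned array would stand in for the denoised embedding vectors that are then passed to k-means; that final clustering step is omitted here to keep the sketch self-contained.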
Related papers
- Learning Deep Representations By Multilayer Bootstrap Networks For Speaker Diarization (2019) 0.00
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) 0.00
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) 8.60
- Orthonormal Embedding-based Deep Clustering For Single-channel Speech Separation (2019) 0.00
- Spatial And Spectral Deep Attention Fusion For Multi-channel Speech Separation Using Deep Embedding Features (2020) 0.00
- Speaker-independent Speech Separation With Deep Attractor Network (2017) 16.84
- Analysis Of Deep Clustering As Preprocessing For Automatic Speech Recognition Of Sparsely Overlapping Speech (2019) 9.59
- Improved Speech Separation With Time-and-frequency Cross-domain Joint Embedding And Clustering (2019) 10.74