Single-channel Multi-speaker Separation Using Deep Clustering
2016 · Yusuf Isik, Jonathan Le Roux, Zhuo Chen, et al.
Abstract
Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, yielding impressive results on speaker-independent multi-speaker separation. In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task. We first significantly improve upon the baseline system performance by incorporating better regularization, larger temporal context, and a deeper architecture, culminating in an overall improvement in signal-to-distortion ratio (SDR) of 10.3 dB compared to the baseline of 6.0 dB for two-speaker separation, as well as a 7.1 dB SDR improvement for three-speaker separation. We then extend the model to incorporate an enhancement layer to refine the signal estimates, and perform end-to-end training through both the clustering and enhancement stages to maximi
Related papers
- Orthonormal Embedding-based Deep Clustering For Single-channel Speech Separation (2019) 0.00
- Deep Clustering And Conventional Networks For Music Separation: Stronger Together (2016) 14.76
- Spatial And Spectral Deep Attention Fusion For Multi-channel Speech Separation Using Deep Embedding Features (2020) 0.00
- Multi-channel Speech Separation Using Deep Embedding Model With Multilayer Bootstrap Networks (2019) 0.00
- Deep Attractor Network For Single-microphone Speaker Separation (2016) 17.88
- Low-latency Deep Clustering For Speech Separation (2019) 8.09
- Improved Speech Separation With Time-and-frequency Cross-domain Joint Embedding And Clustering (2019) 10.74
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020) 0.00