Orthonormal Embedding-based Deep Clustering For Single-channel Speech Separation
2019 · Soyeon Choe, Soo-Whan Chung, Youna Ji, et al.
Abstract
Deep clustering is a deep neural network-based speech separation algorithm that first learns high-dimensional embeddings of the mixed signal components, and then uses a clustering algorithm to separate the sources in each mixture. In this paper, we extend the baseline criterion of deep clustering with an additional regularization term to further improve the overall performance. This term constrains the embeddings so that the embedding dimensions are less correlated with one another, leading to better decomposition of the spectral bins. The regularization term helps to mitigate the unavoidable permutation problem in the conventional deep clustering method, which enables better clustering through the formation of optimal embeddings. We evaluate the results by varying embedding dimension, signal-to-interference ratio (SIR), and gender dependency. The performance comparison with the source separation measurement metric, i.e. signal-to-distortion ratio (SDR
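The idea of adding a decorrelation term to the deep clustering criterion can be illustrated with a minimal NumPy sketch. The baseline affinity loss, the squared Frobenius norm of V Vᵀ − Y Yᵀ, is the standard deep clustering objective. The exact form of the regularizer is not given in the abstract, so `orthonormal_penalty`, its scaling by the number of bins, and the `weight` parameter are assumptions made for illustration and may differ from the paper's formulation.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Baseline affinity loss ||V V^T - Y Y^T||_F^2.

    V: (N, D) array, one D-dimensional embedding per spectral bin.
    Y: (N, C) array, one-hot source assignment per spectral bin.
    The expansion below avoids building the N x N affinity matrices.
    """
    vtv = V.T @ V  # (D, D)
    vty = V.T @ Y  # (D, C)
    yty = Y.T @ Y  # (C, C)
    return np.sum(vtv ** 2) - 2.0 * np.sum(vty ** 2) + np.sum(yty ** 2)


def orthonormal_penalty(V):
    """Hypothetical regularizer: ||V^T V / N - I||_F^2.

    It is zero when the embedding dimensions (columns of V) are
    uncorrelated and have unit average energy. This form is an
    assumption, not taken from the paper.
    """
    n, d = V.shape
    gram = V.T @ V / n
    return np.sum((gram - np.eye(d)) ** 2)


def total_loss(V, Y, weight=0.1):
    """Baseline loss plus the weighted penalty (weight is an assumed value)."""
    return deep_clustering_loss(V, Y) + weight * orthonormal_penalty(V)
```

In practice V would be the output of the embedding network and Y the ideal binary assignment of each bin to its dominant source; at test time the rows of V would be clustered (for example with k-means) to obtain the separation masks.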
Related papers
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) 0.00
- Improved Speech Separation With Time-and-frequency Cross-domain Joint Embedding And Clustering (2019) 10.74
- Deep Clustering And Conventional Networks For Music Separation: Stronger Together (2016) 14.76
- Spatial And Spectral Deep Attention Fusion For Multi-channel Speech Separation Using Deep Embedding Features (2020) 0.00
- Multi-channel Speech Separation Using Deep Embedding Model With Multilayer Bootstrap Networks (2019) 0.00
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) 8.60
- Single-channel Speech Separation With Auxiliary Speaker Embeddings (2019) 0.00
- Low-latency Deep Clustering For Speech Separation (2019) 8.09