Single-channel Speech Separation Using Soft-minimum Permutation Invariant Training
2021 · Midia Yousefi, John H. L. Hansen
Abstract
The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and availability of large datasets, speech separation has been formulated as a supervised learning problem. These approaches aim to learn discriminative patterns of speech, speakers, and background noise using a supervised learning algorithm, typically a deep neural network. A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity. Permutation ambiguity refers to the problem of determining the output-label assignment between the separated sources and the available single-speaker speech labels. Finding the best output-label assignment is required for calculation of separation error, which is later used for updating parameters of the model. Recently, Permutation Invariant Training (PIT) has been shown to be a promising solution in handl
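The output-label assignment problem described in the abstract can be illustrated with a small sketch. It enumerates every assignment between separated outputs and single-speaker labels, computes a separation error for each, and then takes either the hard minimum (as in PIT) or a smooth minimum. The abstract is cut off before the proposed method is described, so everything here is an assumption made for illustration: the function names, the use of mean squared error, the log-sum-exp form of the soft minimum, and the `gamma` smoothing parameter are not taken from the paper.

```python
import itertools
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def permutation_losses(outputs, labels):
    """Map each output-label assignment to its mean separation error."""
    n = len(outputs)
    losses = {}
    for perm in itertools.permutations(range(n)):
        # perm[i] is the index of the label assigned to output i.
        losses[perm] = sum(mse(outputs[i], labels[perm[i]]) for i in range(n)) / n
    return losses


def pit_loss(outputs, labels):
    """Hard-minimum PIT: keep only the lowest-error assignment."""
    losses = permutation_losses(outputs, labels)
    best = min(losses, key=losses.get)
    return losses[best], best


def soft_min_loss(outputs, labels, gamma=1.0):
    """Smooth minimum over all assignments (assumed log-sum-exp form).

    Computes -gamma * log(sum(exp(-loss / gamma))), shifted by the smallest
    loss for numerical stability. It approaches the hard minimum as gamma -> 0.
    """
    values = list(permutation_losses(outputs, labels).values())
    low = min(values)
    return low - gamma * math.log(sum(math.exp(-(v - low) / gamma) for v in values))


# Toy example: the two outputs match the labels, but in swapped order.
labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
outputs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

hard, assignment = pit_loss(outputs, labels)
soft = soft_min_loss(outputs, labels, gamma=1.0)
```

In this toy case the hard minimum selects the swapped assignment with zero error, while the smooth minimum blends the errors of both assignments and never exceeds the hard minimum. Enumerating all assignments costs factorial time in the number of speakers, which is manageable for the two- or three-speaker mixtures typical of this setting.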
Related papers
- Probabilistic Permutation Invariant Training For Speech Separation (2019)
- Interrupted And Cascaded Permutation Invariant Training For Speech Separation (2019)
- Permutation Invariant Training Of Deep Models For Speaker-independent Multi-talker Speech Separation (2016)
- Single-channel Multi-talker Speech Recognition With Permutation Invariant Training (2017)
- Multi-channel Narrow-band Deep Speech Separation With Full-band Permutation Invariant Training (2021)
- Separating Long-form Speech With Group-wise Permutation Invariant Training (2021)
- Stabilizing Label Assignment For Speech Separation By Self-supervised Pre-training (2020)
- Recognizing Multi-talker Speech With Permutation Invariant Training (2017)