Probabilistic Permutation Invariant Training For Speech Separation
2019 · Midia Yousefi, Soheil Khorram, John H. L. Hansen
Abstract
Single-microphone, speaker-independent speech separation is normally performed in two steps: (i) separating the individual speech sources, and (ii) determining the best output-label assignment in order to compute the separation error. The second step is the main obstacle in training neural networks for speech separation. The recently proposed Permutation Invariant Training (PIT) addresses this problem by choosing the output-label assignment that minimizes the separation error. In this study, we show that a major drawback of this technique is its overconfident choice of the output-label assignment, especially in the initial steps of training, when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT), which treats the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distributions and the separation errors of all permutations; it trains
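The contrast between the hard assignment in PIT and a probabilistic treatment of the permutation can be illustrated with a small NumPy sketch. This is not the authors' implementation: the mean-squared-error criterion, the function names, and the smoothing parameter `gamma` are assumptions made for illustration, and the exact form of the Prob-PIT likelihood is not given in the abstract above. The sketch only shows one plausible way to combine the errors of all permutations under a uniform prior, using a numerically stable log-mean-exp.

```python
import itertools
import numpy as np


def permutation_errors(estimates, targets):
    """Return all output-label permutations and the MSE of each one.

    estimates, targets: arrays of shape (num_sources, num_samples).
    MSE is an assumed separation error, chosen only for illustration.
    """
    num_sources = targets.shape[0]
    perms = list(itertools.permutations(range(num_sources)))
    errors = np.array(
        [np.mean((estimates - targets[list(p)]) ** 2) for p in perms]
    )
    return perms, errors


def pit_loss(errors):
    """Hard PIT: keep only the permutation with the smallest error."""
    return errors.min()


def soft_permutation_loss(errors, gamma=1.0):
    """Assumed probabilistic variant: uniform prior over permutations.

    Computes -gamma * log(mean_p exp(-error_p / gamma)) with the usual
    max-shift for numerical stability. gamma is a hypothetical
    smoothing parameter, not a value taken from the paper.
    """
    z = -errors / gamma
    z_max = z.max()
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max)))
    return -gamma * log_mean_exp


# Toy example: two sources, outputs arrive in swapped order plus noise.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 100))
estimates = targets[[1, 0]] + 0.01 * rng.standard_normal((2, 100))

perms, errors = permutation_errors(estimates, targets)
hard = pit_loss(errors)
soft = soft_permutation_loss(errors, gamma=1.0)
```

In this sketch the soft loss always lies between the minimum and the mean of the per-permutation errors, and it approaches the hard PIT loss as `gamma` shrinks, so every permutation contributes to the loss when the errors are close to one another.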
Related papers
- Single-channel Speech Separation Using Soft-minimum Permutation Invariant Training (2021)
- Interrupted And Cascaded Permutation Invariant Training For Speech Separation (2019)
- Permutation Invariant Training Of Deep Models For Speaker-independent Multi-talker Speech Separation (2016)
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017)
- Separating Long-form Speech With Group-wise Permutation Invariant Training (2021)
- Single-channel Multi-talker Speech Recognition With Permutation Invariant Training (2017)
- Graph-PIT: Generalized Permutation Invariant Training For Continuous Separation Of Arbitrary Numbers Of Speakers (2021)
- Stabilizing Label Assignment For Speech Separation By Self-supervised Pre-training (2020)