Permutation Invariant Training Of Deep Models For Speaker-independent Multi-talker Speech Separation
2016 · Dong Yu, Morten Kolbæk, Zheng-Hua Tan, et al.
Abstract
We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike most prior art, which treats speech separation as a multi-class regression problem, and unlike the deep clustering technique, which treats it as a segmentation (or clustering) problem, our model optimizes the separation regression error while ignoring the order of the mixing sources. This strategy cleverly solves the long-standing label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. Experiments on the equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT. We believe improvements built upon PIT can eventually solve the cocktail-party problem and enable real-world adoption of, e.g., automatic meeting transcription and multi-party human-computer interaction, where overlapping speech is common.
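The idea of optimizing the regression error while ignoring source order can be illustrated with a small sketch. This is not the authors' implementation: it assumes a mean-squared-error criterion, inputs shaped (sources, frames, frequency bins), and an exhaustive search over all output-to-target assignments. The function name `pit_mse` and its arguments are hypothetical.

```python
import itertools

import numpy as np


def pit_mse(estimates, targets):
    """Permutation-invariant mean-squared error (illustrative sketch).

    estimates, targets: arrays of shape (S, T, F), i.e. S sources,
    T frames, F frequency bins (an assumed layout).
    Returns (min_loss, best_perm), where best_perm[i] is the index of the
    target assigned to output stream i.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_sources = estimates.shape[0]

    best_loss, best_perm = np.inf, None
    # Try every assignment of targets to output streams and keep the one
    # with the lowest error, so the loss does not depend on label order.
    for perm in itertools.permutations(range(num_sources)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two toy "sources": the estimates are correct but in swapped order.
targets = np.stack([np.zeros((3, 2)), np.ones((3, 2))])
estimates = targets[::-1]
loss, perm = pit_mse(estimates, targets)
```

In this toy case a fixed-order loss would penalize the swapped outputs, whereas the permutation-invariant loss finds the matching assignment. The exhaustive search grows factorially with the number of sources, which is acceptable for the two- or three-talker case but would need a cheaper assignment step for many speakers.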
Related papers
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017) · 20.90
- Separating Long-form Speech With Group-wise Permutation Invariant Training (2021) · 4.52
- Probabilistic Permutation Invariant Training For Speech Separation (2019) · 7.81
- Single-channel Speech Separation Using Soft-minimum Permutation Invariant Training (2021) · 2.26
- Interrupted And Cascaded Permutation Invariant Training For Speech Separation (2019) · 4.52
- Single-channel Multi-talker Speech Recognition With Permutation Invariant Training (2017) · 12.10
- Recognizing Multi-talker Speech With Permutation Invariant Training (2017) · 12.81
- Multiple Choice Learning For Efficient Speech Separation With Many Speakers (2024) · 2.26