Separating Long-form Speech With Group-wise Permutation Invariant Training
2021 · Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, et al.
Abstract
Multi-talker conversational speech processing has drawn much interest for various applications such as meeting transcription. Speech separation is often required to handle the overlapped speech that is common in conversation. The original continuous speech separation approach, based on utterance-level permutation invariant training, has proven effective in various conditions. However, it cannot leverage the long-span relationships between utterances, and it is computationally inefficient due to its highly overlapped sliding windows. To overcome these drawbacks, we propose a novel training scheme named Group-PIT, which allows speech separation models to be trained directly on long-form speech with a low computational cost for label assignment. Two speech separation approaches with Group-PIT are explored: direct long-span speech separation, and short-span speech separation with long-span tracking. The experiments on the simulated meeting-style data demo…
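The abstract says Group-PIT keeps label assignment cheap on long-form input. The sketch below shows one way such a saving can arise. It is an illustration under stated assumptions, not the authors' implementation. It assumes three things. First, the long recording has already been split into non-overlapping time segments (called "groups" here). Second, each group's references are arranged into N reference tracks. Third, the training loss is a plain sum of per-sample squared errors. Under these assumptions the total loss is a sum over groups, so the best output permutation can be searched group by group (G·N! candidates) instead of jointly over all groups ((N!)^G candidates). The functions `group_pit_assign`, `joint_pit_assign`, and `candidate_counts` are hypothetical names chosen for this example.

```python
from itertools import permutations, product
from math import factorial


def sse(est, ref):
    """Sum of squared errors between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(est, ref))


def group_pit_assign(estimates, groups):
    """Hypothetical group-wise label assignment (illustrative sketch only).

    estimates: N output channels, each a list of T samples.
    groups: list of (start, end, refs); refs holds N reference tracks of
            length end - start for that non-overlapping time segment.
    Returns the chosen permutation per group and the assembled N-channel labels.
    """
    n, total = len(estimates), len(estimates[0])
    labels = [[0.0] * total for _ in range(n)]  # zeros outside all groups
    chosen = []
    for start, end, refs in groups:
        best_perm, best_loss = None, float("inf")
        for perm in permutations(range(n)):
            # perm[c] is the index of the reference track assigned to channel c
            loss = sum(sse(estimates[c][start:end], refs[perm[c]]) for c in range(n))
            if loss < best_loss:
                best_perm, best_loss = perm, loss
        chosen.append(best_perm)
        # Write this group's references into the channels picked above
        for c in range(n):
            labels[c][start:end] = list(refs[best_perm[c]])
    return chosen, labels


def joint_pit_assign(estimates, groups):
    """Brute-force baseline: search every combination of per-group permutations."""
    n = len(estimates)
    best_loss, best_combo = float("inf"), None
    for combo in product(permutations(range(n)), repeat=len(groups)):
        loss = sum(
            sse(estimates[c][start:end], refs[perm[c]])
            for (start, end, refs), perm in zip(groups, combo)
            for c in range(n)
        )
        if loss < best_loss:
            best_loss, best_combo = loss, list(combo)
    return best_combo


def candidate_counts(n_channels, n_groups):
    """Permutations evaluated: (group-wise search, joint search)."""
    return n_groups * factorial(n_channels), factorial(n_channels) ** n_groups


# Toy example: 2 output channels, 8 samples, 2 groups of 4 samples each.
estimates = [[1.0] * 4 + [2.0] * 4, [0.0] * 4 + [5.0] * 4]
groups = [(0, 4, [[0.1] * 4, [0.9] * 4]), (4, 8, [[2.1] * 4, [4.8] * 4])]
chosen, labels = group_pit_assign(estimates, groups)
print(chosen)                                          # [(1, 0), (0, 1)]
print(joint_pit_assign(estimates, groups) == chosen)   # True
print(candidate_counts(2, 10))                         # (20, 1024)
```

Because the losses of disjoint segments simply add up, the per-group search finds the same assignment as the exhaustive search in this toy case. It also evaluates far fewer candidates as the number of groups grows. A loss computed jointly over the whole long-form signal, such as a single SI-SNR value, would not decompose this way.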
Related papers
- Permutation Invariant Training Of Deep Models For Speaker-independent Multi-talker Speech Separation (2016) · 0.00
- Interrupted And Cascaded Permutation Invariant Training For Speech Separation (2019) · 4.52
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017) · 20.90
- Directed Speech Separation For Automatic Speech Recognition Of Long Form Conversational Speech (2021) · 2.26
- Probabilistic Permutation Invariant Training For Speech Separation (2019) · 7.81
- Single-channel Speech Separation Using Soft-minimum Permutation Invariant Training (2021) · 2.26
- Graph-PIT: Generalized Permutation Invariant Training For Continuous Separation Of Arbitrary Numbers Of Speakers (2021) · 8.82
- Utterance-level Permutation Invariant Training With Latency-controlled BLSTM For Single-channel Multi-talker Speech Separation (2019) · 0.00