Utterance-level Permutation Invariant Training With Latency-controlled BLSTM For Single-channel Multi-talker Speech Separation
2019 · Lu Huang, Gaofeng Cheng, Pengyuan Zhang, et al.
Abstract
Utterance-level permutation invariant training (uPIT) has achieved promising progress on the single-channel multi-talker speech separation task. Long short-term memory (LSTM) and bidirectional LSTM (BLSTM) networks are widely used as the separation networks of uPIT, giving uPIT-LSTM and uPIT-BLSTM, respectively. uPIT-LSTM has lower latency but worse performance, while uPIT-BLSTM has better performance but higher latency. In this paper, we propose using latency-controlled BLSTM (LC-BLSTM) during inference to achieve speech separation with both low latency and good performance. To find a better training strategy for the BLSTM-based separation network, chunk-level PIT (cPIT) and uPIT are compared. The experimental results show that uPIT outperforms cPIT when LC-BLSTM is used during inference. It is also found that inter-chunk speaker tracing (ST) can further improve the separation performance of uPIT-LC-BLSTM. Evaluated on the WSJ0 two-talker mixed-speech separation task, the absolute gap of signal-to-distortion ratio (SDR) be
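The difference between uPIT and cPIT described in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the loss is a plain mean squared error over generic (speaker, frame, feature) arrays, whereas the abstract does not state the actual training objective, and the function names `upit_loss` and `cpit_loss` are hypothetical.

```python
from itertools import permutations

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level PIT with an assumed MSE criterion.

    estimates, targets: arrays of shape (speakers, frames, features).
    One output-to-speaker assignment is chosen for the whole utterance.
    Returns (minimum loss, best permutation of output indices).
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        # Reorder the network outputs and score the whole utterance at once.
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def cpit_loss(estimates, targets, chunk_size):
    """Chunk-level PIT: the assignment is chosen independently per chunk.

    Returns (frame-weighted mean loss, list of per-chunk permutations).
    """
    num_frames = estimates.shape[1]
    total, perms = 0.0, []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        loss, perm = upit_loss(estimates[:, start:end], targets[:, start:end])
        total += loss * (end - start)  # weight each chunk by its length
        perms.append(perm)
    return total / num_frames, perms


# Toy example: outputs are perfect, but swapped in the first chunk only.
rng = np.random.default_rng(0)
targets = rng.normal(size=(2, 6, 3))
estimates = targets.copy()
estimates[:, :3] = targets[::-1, :3]

utt_loss, utt_perm = upit_loss(estimates, targets)
chunk_loss, chunk_perms = cpit_loss(estimates, targets, chunk_size=3)
```

In this toy case the chunk-level loss is zero while the per-chunk assignments differ, so the output streams switch speakers at the chunk boundary. This is the kind of inconsistency that inter-chunk speaker tracing is meant to resolve at inference time.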
Related papers
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017) 20.90
- Separating Long-form Speech With Group-wise Permutation Invariant Training (2021) 4.52
- Interrupted And Cascaded Permutation Invariant Training For Speech Separation (2019) 4.52
- Probabilistic Permutation Invariant Training For Speech Separation (2019) 7.81
- Permutation Invariant Training Of Deep Models For Speaker-independent Multi-talker Speech Separation (2016) 0.00
- Single-channel Speech Separation Using Soft-minimum Permutation Invariant Training (2021) 2.26
- Single-channel Multi-talker Speech Recognition With Permutation Invariant Training (2017) 12.10
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) 8.60