Continuous Speech Separation Using Speaker Inventory For Long Multi-talker Recording
2020 · Cong Han, Yi Luo, Chenda Li, et al.
Abstract
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech by using a voice snippet of the target speaker, and jointly separating all participating speakers by using a pool of additional speaker signals, which is known as speech separation using speaker inventory (SSUSI). However, all of these systems make the idealized assumption that pre-enrolled speaker signals are available, and they are evaluated only on simple data configurations. In realistic multi-talker conversations, the speech signal contains a large proportion of non-overlapped regions, from which we can derive robust speaker embeddings for individual talkers. In this work, we apply the SSUSI model to long recordings and propose a self-informed, clustering-based inventory forming scheme, where the speaker inventory is built entirely from the input signal without the need for external speaker signals. Experiment results on si
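The inventory-forming idea described in the abstract can be illustrated with a minimal sketch. It assumes that window-level speaker embeddings have already been extracted from the recording, mostly from non-overlapped regions, and it uses plain k-means as a stand-in clustering step. The excerpt does not specify the embedding extractor, the clustering algorithm, or how the number of speakers is chosen, so the function name `build_inventory`, its parameters, and the toy data are all hypothetical.

```python
import math


def _normalize(v):
    # Scale to unit length so Euclidean distance tracks cosine distance.
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_inventory(embeddings, num_speakers, num_iters=20):
    """Cluster window-level embeddings with k-means (an assumed stand-in).

    Returns (inventory, labels): one centroid per assumed speaker and the
    cluster index assigned to each window.
    """
    x = [_normalize(v) for v in embeddings]

    # Farthest-point initialization: deterministic, no random seed needed.
    centroids = [list(x[0])]
    while len(centroids) < num_speakers:
        farthest = max(x, key=lambda v: min(_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(x)
    for _ in range(num_iters):
        # Assignment step: nearest centroid for every window.
        labels = [
            min(range(num_speakers), key=lambda k: _dist(v, centroids[k]))
            for v in x
        ]
        # Update step: mean of members; an empty cluster keeps its centroid.
        for k in range(num_speakers):
            members = [v for v, lab in zip(x, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy 2-D "embeddings": three windows from one talker, three from another.
windows = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
           [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
inventory, labels = build_inventory(windows, num_speakers=2)
print(labels)
```

In this sketch the returned centroids play the role of the speaker inventory: each one summarizes a talker found in the recording itself, so no external enrollment signal is required. A real system would likely need to handle an unknown speaker count and windows that contain overlapped speech, which this example ignores.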
Related papers
- Speaker Separation Using Speaker Inventories And Estimated Speech (2020) · 6.34
- Directed Speech Separation For Automatic Speech Recognition Of Long Form Conversational Speech (2021) · 2.26
- Separating Long-form Speech With Group-wise Permutation Invariant Training (2021) · 4.52
- Simultaneous Speech Extraction For Multiple Target Speakers Under The Meeting Scenarios (2022) · 2.26
- Low-latency Speaker-independent Continuous Speech Separation (2019) · 9.23
- Single-channel Multi-speaker Separation Using Deep Clustering (2016) · 0.00
- Adapting Self-supervised Models To Multi-talker Speech Recognition Using Speaker Embeddings (2022) · 10.61
- Low-latency Deep Clustering For Speech Separation (2019) · 8.09