Combining TF-GridNet And Mixture Encoder For Continuous Speech Separation For Meeting Transcription
2023 · Peter Vieting, Simon Berger, Thilo von Neumann, et al.
Abstract
Many real-life applications of automatic speech recognition (ASR) require processing of overlapped speech. A common method involves first separating the speech into overlap-free streams on which ASR is performed. Recently, TF-GridNet has shown impressive performance in speech separation in real reverberant conditions. Furthermore, a mixture encoder was proposed that leverages the mixed speech to mitigate the effect of separation artifacts. In this work, we extend the mixture encoder from a static two-speaker scenario to a natural meeting context featuring an arbitrary number of speakers and varying degrees of overlap. We further demonstrate its limits by the integration with separators of varying strength including TF-GridNet. Our experiments result in a new state-of-the-art performance on LibriCSS using a single microphone. They show that TF-GridNet largely closes the gap between previous methods and oracle separation independent of mixture encoding. We further investigate the remai
Related papers
- TF-GridNet: Integrating Full- And Sub-band Modeling For Speech Separation (2022)
- Meeting Recognition With Continuous Speech Separation And Transcription-supported Diarization (2023)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- Exploring The Integration Of Speech Separation And Recognition With Self-supervised Learning Representation (2023)
- A Sidecar Separator Can Convert A Single-talker Speech Recognition System To A Multi-talker One (2023)
- Simultaneous Diarization And Separation Of Meetings Through The Integration Of Statistical Mixture Models (2024)
- Transcription-free Fine-tuning Of Speech Separation Models For Noisy And Reverberant Multi-speaker Automatic Speech Recognition (2024)
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017)