Tandem Multitask Training Of Speaker Diarisation And Speech Recognition For Meeting Transcription
2022 Β· Xianrui Zheng, Chao Zhang, Philip C. Woodland
Abstract
Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks. In this paper, to achieve speaker diarisation and speech recognition using a single model, a tandem multitask training (TMT) method is proposed to fine-tune W2V2. For speaker diarisation, the tasks of voice activity detection (VAD) and speaker classification (SC) are required, and connectionist temporal classification (CTC) is used for ASR. The multitask framework implements VAD, SC, and ASR using an early layer, middle layer, and late layer of W2V2, which coincides with the order of segmenting the audio with VAD, clustering the segments based on speaker embeddings, and transcribing each segment with ASR. Experimental results on the augmented multi-party (AMI) dataset showed that using different W2V2 layers for VAD, SC, and ASR from the earlier to later layers for TMT not only saves computational cost, but also reduces diarisation error rates (DE
Authors
(none)
Tags
Stats
Related papers
- Incorporating VAD Into ASR System By Multi-task Learning (2021)4.52
- Joint Training Or Not: An Exploration Of Pre-trained Speech Models In Audio-visual Speaker Diarization (2023)0.00
- Simultaneous Speech Recognition And Speaker Diarization For Monaural Dialogue Recordings With Target-speaker Acoustic Models (2019)0.00
- The Ustc-ximalaya System For The ICASSP 2022 Multi-channel Multi-party Meeting Transcription (m2met) Challenge (2022)6.34
- One Model To Rule Them All ? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023)9.59
- Unified Modeling Of Multi-talker Overlapped Speech Recognition And Diarization With A Sidecar Separator (2023)7.50
- TS-SEP: Joint Diarization And Separation Conditioned On Estimated Speaker Embeddings (2023)10.35
- Cross-channel Attention-based Target Speaker Voice Activity Detection: Experimental Results For M2met Challenge (2022)10.07