MT2KD: Towards A General-purpose Encoder For Speech, Speaker, And Audio Events
2024 · Xiaoyu Yang, Qiujia Li, Chao Zhang, et al.
Abstract
With advances in deep learning, the performance of end-to-end (E2E) single-task models for speech and audio processing has improved steadily. However, building a general-purpose model that performs well on multiple tasks remains challenging, because different speech and audio processing tasks usually require different training data, input features, or model architectures to reach optimal performance. In this work, MT2KD, a novel two-stage multi-task learning framework, is proposed to build a general-purpose speech and audio encoder that jointly performs three fundamental tasks: automatic speech recognition (ASR), audio tagging (AT) and speaker verification (SV). In the first stage, multi-teacher knowledge distillation (KD) is applied to align the feature spaces of three high-performance single-task teacher encoders into a single student encoder using the same unlabelled data. In the second stage, multi-task supervised fine-tuning is carried out by initialising the mod…
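The first stage distils three teachers into one student without any labels: the only training signal is how closely the student's features match each teacher's features on the same unlabelled audio. The abstract does not state the distillation objective, so the sketch below is a minimal illustration under stated assumptions, not the MT2KD method: it assumes one hypothetical linear projection head per teacher, frame-level mean squared error as the distance, and teacher and student outputs already aligned to the same number of frames. The function name `multi_teacher_kd_loss`, the `weights` argument and the teacher keys `asr`, `at` and `sv` are all illustrative.

```python
import numpy as np


def multi_teacher_kd_loss(student_feats, teacher_feats, projections, weights=None):
    """Weighted sum over teachers of frame-level MSE (hypothetical objective).

    student_feats: (T, D_s) student encoder outputs for one utterance.
    teacher_feats: dict name -> (T, D_t) outputs of a frozen teacher encoder
                   (assumed already aligned to the student's T frames).
    projections:   dict name -> (D_s, D_t) linear head mapping student space
                   into that teacher's feature space (hypothetical design).
    weights:       optional dict name -> float; missing names default to 1.0.
    Returns (total_loss, per_teacher_unweighted_losses).
    """
    weights = weights or {}
    total = 0.0
    per_teacher = {}
    for name, target in teacher_feats.items():
        # Project the shared student features into this teacher's space.
        pred = student_feats @ projections[name]
        if pred.shape != target.shape:
            raise ValueError(f"{name}: projected {pred.shape} vs teacher {target.shape}")
        # No labels are involved: the teacher output itself is the target.
        loss = float(np.mean((pred - target) ** 2))
        per_teacher[name] = loss
        total += weights.get(name, 1.0) * loss
    return total, per_teacher


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D_s = 50, 16
    # Hypothetical output sizes for the ASR, AT and SV teachers.
    dims = {"asr": 32, "at": 8, "sv": 4}
    student = rng.standard_normal((T, D_s))
    projections = {k: 0.1 * rng.standard_normal((D_s, d)) for k, d in dims.items()}
    teachers = {k: rng.standard_normal((T, d)) for k, d in dims.items()}
    total, parts = multi_teacher_kd_loss(student, teachers, projections)
    print(round(total, 3), {k: round(v, 3) for k, v in parts.items()})
```

Keeping the per-teacher heads separate from the shared encoder means a single student representation can serve all three feature spaces at once. In a real system, the student and heads would be trained by minimising this total with gradient descent, while the teachers stay frozen.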
Related papers
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- Speaker Adaptation For End-to-end CTC Models (2019)
- Joint Speaker Encoder And Neural Back-end Model For Fully End-to-end Automatic Speaker Verification With Multiple Enrollment Utterances (2022)
- Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition (2022)
- Knowledge Distillation For Neural Transducer-based Target-speaker ASR: Exploiting Parallel Mixture/single-talker Speech Data (2023)
- Masked Modeling Duo For Speech: Specializing General-purpose Audio Representation To Speech Using Denoising Distillation (2023)
- Tandem Multitask Training Of Speaker Diarisation And Speech Recognition For Meeting Transcription (2022)
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)