MT4SSL: Boosting Self-supervised Speech Representation Learning By Integrating Multiple Targets
2022 · Ziyang Ma, Zhisheng Zheng, Changli Tang, et al.
Abstract
In this paper, we provide a new perspective on self-supervised speech models based on how the training targets are obtained. We generalize the target extractor into an Offline Targets Extractor (Off-TE) and an Online Targets Extractor (On-TE). Based on this, we propose a new multi-task learning framework for self-supervised learning, MT4SSL, which stands for Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. MT4SSL uses the K-means algorithm as the Off-TE and a teacher network without gradients as the On-TE. Our model outperforms previous SSL methods by nontrivial margins on the LibriSpeech benchmark, and is comparable to or even better than the best-performing models while using less data. Furthermore, we find that using both Off-TE and On-TE results in better convergence in the pre-training phase. Given its effectiveness and efficiency, we think multi-task learning on self-supervised speech models from this perspective is a promising direction.
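The two kinds of targets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that K-means serves as the Off-TE and a gradient-free teacher network as the On-TE. The exponential-moving-average teacher update, the cross-entropy and mean-squared-error losses, the weights `alpha` and `beta`, and all function names below are assumptions chosen for illustration.

```python
import math

def offline_targets(frames, centroids):
    """Off-TE stand-in: index of the nearest (pre-computed) K-means centroid per frame."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sqdist(f, centroids[k]))
            for f in frames]

def ema_update(teacher, student, decay=0.999):
    """On-TE stand-in (assumed): teacher weights track the student, no gradients involved."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def multi_target_loss(logits, cluster_ids, student_out, teacher_out,
                      alpha=1.0, beta=1.0):
    """Weighted sum of an offline (discrete) loss and an online (continuous) loss."""
    n = len(cluster_ids)
    off = sum(cross_entropy(l, c) for l, c in zip(logits, cluster_ids)) / n
    on = sum(mse(s, t) for s, t in zip(student_out, teacher_out)) / n
    return alpha * off + beta * on

# Toy usage: two 2-D frames, two centroids, made-up student and teacher outputs.
frames = [[0.0, 0.1], [1.0, 0.9]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
ids = offline_targets(frames, centroids)
loss = multi_target_loss(
    logits=[[2.0, 0.0], [0.0, 2.0]],
    cluster_ids=ids,
    student_out=[[0.1, 0.2], [0.9, 0.8]],
    teacher_out=[[0.0, 0.2], [1.0, 0.8]],
)
```

In this sketch the offline targets are fixed before training, while the online targets change as the teacher follows the student; the single combined loss is one plausible way to train against both at once.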
Related papers
- Target Speech Extraction With Pre-trained Self-supervised Learning Models (2024) 9.41
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023) 6.34
- Self-supervised Learning With Bi-label Masked Speech Prediction For Streaming Multi-talker Speech Recognition (2022) 5.24
- Adapting Self-supervised Models To Multi-talker Speech Recognition Using Speaker Embeddings (2022) 10.61
- Investigating Self-supervised Learning For Speech Enhancement And Separation (2022) 13.44
- UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-training (2021) 0.00
- Feature Learning And Ensemble Pre-tasks Based Self-supervised Speech Denoising And Dereverberation (2022) 0.00
- Downstream Task Agnostic Speech Enhancement With Self-supervised Representation Loss (2023) 6.77