MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization
2023 · Ji Won Yoon, Seok Min Kim, Nam Soo Kim
Abstract
Self-supervised learning (SSL) has shown significant progress in speech processing tasks. However, despite the intrinsic randomness in the Transformer structure, such as dropout variants and layer-drop, improving the model-level consistency remains under-explored in the speech SSL literature. To address this, we propose a new pre-training method that uses consistency regularization to improve Data2vec 2.0, the recent state-of-the-art (SOTA) SSL model. Specifically, the proposed method involves sampling two different student sub-models within the Data2vec 2.0 framework, enabling two output variants derived from a single input without additional parameters. Subsequently, we regularize the outputs from the student sub-models to be consistent and require them to predict the representation of the teacher model. Our experimental results demonstrate that the proposed approach improves the SSL model's robustness and generalization ability, resulting in SOTA results on the SUPERB benchmark.
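The objective described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: the one-layer `forward` "encoder", the use of input dropout as the only source of randomness, the mean-squared-error losses, and the weighting factor `alpha` are all hypothetical choices made for illustration. The abstract only states that two student sub-models are sampled from the same parameters, that their outputs are regularized to be consistent, and that both predict the teacher's representation.

```python
import random

def forward(weights, x, drop_p=0.0, rng=None):
    """Toy one-layer encoder: y_j = sum_i x_i * w_ij, with optional input dropout."""
    if drop_p > 0.0:
        # Inverted dropout: zero each input with probability drop_p, rescale the rest.
        x = [0.0 if rng.random() < drop_p else xi / (1.0 - drop_p) for xi in x]
    n_out = len(weights[0])
    return [sum(xi * row[j] for xi, row in zip(x, weights)) for j in range(n_out)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def mcr_loss(student_w, teacher_w, x, drop_p=0.1, alpha=1.0, rng=None):
    """Return (total, prediction, consistency) for one input vector x.

    Assumption: MSE is used for both terms and alpha weights the
    consistency term; the abstract does not specify either.
    """
    rng = rng or random.Random(0)
    # Teacher target: deterministic pass, no dropout.
    target = forward(teacher_w, x)
    # Two student sub-models: same parameters, different dropout masks.
    out_a = forward(student_w, x, drop_p, rng)
    out_b = forward(student_w, x, drop_p, rng)
    # Both sub-models are asked to predict the teacher representation.
    prediction = mse(out_a, target) + mse(out_b, target)
    # The two sub-model outputs are also pulled toward each other.
    consistency = mse(out_a, out_b)
    return prediction + alpha * consistency, prediction, consistency

if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    x = [1.0, 2.0, 3.0]
    total, pred, cons = mcr_loss(w, w, x, drop_p=0.5, rng=random.Random(1))
    print(f"total={total:.4f} prediction={pred:.4f} consistency={cons:.4f}")
```

Because both sub-models share the student's weights, the second output adds no parameters; it only costs an extra forward pass. With dropout disabled and identical student and teacher weights, all three returned values are zero, which is a quick sanity check for the sketch.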
Related papers
- Multi-variant Consistency Based Self-supervised Learning For Robust Automatic Speech Recognition (2021) 0.00
- Robust Data2vec: Noise-robust Speech Representation Learning For ASR By Combining Regression And Improved Contrastive Learning (2022) 9.76
- An Adapter Based Pre-training For Efficient And Scalable Self-supervised Speech Representation Learning (2021) 8.35
- Exploiting Consistency-preserving Loss And Perceptual Contrast Stretching To Boost SSL-based Speech Enhancement (2024) 6.77
- Data2vec-AQC: Search For The Right Teaching Assistant In The Teacher-student Training Setup (2022) 5.87
- Mixup-Breakdown: A Consistency Training Method For Improving Generalization Of Speech Separation Models (2019) 0.00
- Weakly-supervised Speech Pre-training: A Case Study On Target Speech Recognition (2023) 8.09
- UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-training (2021) 0.00