An Adapter Based Multi-label Pre-training For Speech Separation And Enhancement
2022 · Tianrui Wang, Xie Chen, Zhuo Chen, et al.
Abstract
In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data. However, compared with tasks such as speech recognition (ASR), the improvements from SSL representation in speech separation (SS) and enhancement (SE) are considerably smaller. Based on HuBERT, this work investigates improving the SSL model for SS and SE. We first update HuBERT's masked speech prediction (MSP) objective by integrating the separation and denoising terms, resulting in a multiple pseudo label pre-training scheme, which significantly improves HuBERT's performance on SS and SE but degrades the performance on ASR. To maintain its performance gain on ASR, we further propose an adapter-based architecture for HuBERT's Transformer encoder, where only a few parameters of each layer are adjusted to the multiple pseudo label MSP while other parameters remain frozen as default HuBERT. Experimental results sho
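To give a rough sense of what "only a few parameters of each layer" can mean, the sketch below counts parameters for a frozen Transformer encoder with one small adapter per layer. It is an illustration under assumptions, not the authors' design: the abstract does not specify the adapter's shape, so a residual bottleneck adapter (down-projection followed by up-projection) is assumed, the bottleneck width of 64 is a hypothetical value, and the layer sizes (12 layers, model width 768, feed-forward width 3072) are those of a HuBERT Base style encoder.

```python
# Parameter bookkeeping for adapter-style tuning of a frozen encoder.
# ASSUMPTIONS: the bottleneck adapter shape and its width are hypothetical;
# the abstract above does not describe the adapter's internals.

def linear_params(n_in, n_out):
    """Weights plus biases of one dense layer."""
    return n_in * n_out + n_out

def transformer_layer_params(d_model, d_ffn):
    """Approximate count for one encoder layer: four attention projections,
    a two-layer feed-forward block, and two LayerNorms (scale and bias)."""
    attention = 4 * linear_params(d_model, d_model)
    ffn = linear_params(d_model, d_ffn) + linear_params(d_ffn, d_model)
    norms = 2 * 2 * d_model
    return attention + ffn + norms

def adapter_params(d_model, bottleneck):
    """Assumed residual bottleneck adapter: down-projection, up-projection."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)

def trainable_fraction(n_layers, d_model, d_ffn, bottleneck):
    """Share of encoder parameters that would be updated if only the
    adapters are trained and every original layer weight stays frozen."""
    frozen = n_layers * transformer_layer_params(d_model, d_ffn)
    trainable = n_layers * adapter_params(d_model, bottleneck)
    return trainable / (frozen + trainable)

frac = trainable_fraction(n_layers=12, d_model=768, d_ffn=3072, bottleneck=64)
print(f"trainable share of encoder parameters: {frac:.2%}")
```

Because the original weights are never updated in such a setup, removing or bypassing the adapters would recover the default encoder, which is consistent with the abstract's stated goal of keeping ASR performance intact.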
Related papers
- Multi-resolution HuBERT: Multi-resolution Speech Self-supervised Learning With Masked Unit Prediction (2023) · 0.00
- UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-training (2021) · 0.00
- Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) · 12.68
- Progressive Residual Extraction Based Pre-training For Speech Representation Learning (2024) · 0.00
- MS-HuBERT: Mitigating Pre-training And Inference Mismatch In Masked Language Modelling Methods For Learning Speech Representations (2024) · 4.52
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023) · 6.34
- Fast-HuBERT: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023) · 0.00
- CHAPTER: Exploiting Convolutional Neural Network Adapters For Self-supervised Speech Models (2022) · 7.50