Selective HuBERT: Self-supervised Pre-training For Target Speaker In Clean And Mixture Speech
2023 · Jingru Lin, Meng Ge, Wupeng Wang, et al.
Abstract
Self-supervised pre-trained speech models have been shown to be effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the type of pre-training data used, either clean or mixture speech. Drawing on the idea of selective auditory attention, we propose a novel pre-training solution called Selective-HuBERT, or SHuBERT, which learns to selectively extract target speech representations from either clean or mixture speech. Specifically, SHuBERT is trained to predict the pseudo-labels of a target speaker, conditioned on enrollment speech from that speaker. By doing so, SHuBERT is expected to selectively attend to the target speaker in a complex acoustic environment, thus benefiting various downstream tasks. We further introduce a dual-path training strategy and use a cross-correlation constraint between the two branches to encourage the model to generate noise-invariant repres
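The cross-correlation constraint between two branches mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the version below assumes a Barlow Twins-style objective: the cross-correlation matrix between the two branches' features is pushed toward the identity. The function names `standardize` and `cross_correlation_loss`, the `off_diag_weight` parameter, and the toy Gaussian "features" are all hypothetical and are not taken from the paper.

```python
import math
import random


def standardize(frames):
    """Give each feature dimension zero mean and unit variance across frames."""
    n, dim = len(frames), len(frames[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        col = [row[d] for row in frames]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) + 1e-8
        for t in range(n):
            out[t][d] = (col[t] - mean) / std
    return out


def cross_correlation_loss(branch_a, branch_b, off_diag_weight=5e-3):
    """Barlow Twins-style loss (an assumed form, not the paper's stated one).

    branch_a, branch_b: lists of frames, each frame a list of floats,
    with the same number of frames and feature dimensions.
    """
    za, zb = standardize(branch_a), standardize(branch_b)
    n, dim = len(za), len(za[0])
    loss = 0.0
    for i in range(dim):
        for j in range(dim):
            # Entry (i, j) of the cross-correlation matrix between branches.
            c_ij = sum(za[t][i] * zb[t][j] for t in range(n)) / n
            if i == j:
                loss += (c_ij - 1.0) ** 2             # pull the diagonal toward 1
            else:
                loss += off_diag_weight * c_ij ** 2   # pull off-diagonals toward 0
    return loss


# Toy data standing in for branch outputs (hypothetical, not real speech features).
rng = random.Random(0)
clean = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
noisy = [[v + 0.1 * rng.gauss(0, 1) for v in row] for row in clean]
unrelated = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]

loss_matched = cross_correlation_loss(clean, noisy)
loss_unrelated = cross_correlation_loss(clean, unrelated)
```

In this toy setup the loss is small when the second branch is a lightly perturbed copy of the first and much larger when the two branches are unrelated, which is the behaviour a noise-invariance constraint of this kind is meant to reward.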
Related papers
- Cocktail HuBERT: Generalized Self-supervised Pre-training For Mixture And Single-source Speech (2023)
- HuBERT: Self-supervised Speech Representation Learning By Masked Prediction Of Hidden Units (2021)
- MS-HuBERT: Mitigating Pre-training And Inference Mismatch In Masked Language Modelling Methods For Learning Speech Representations (2024)
- Spatial HuBERT: Self-supervised Spatial Speech Representation Learning For A Single Talker From Multi-channel Audio (2023)
- Text-guided HuBERT: Self-supervised Speech Pre-training Via Generative Adversarial Networks (2024)
- An Adapter Based Multi-label Pre-training For Speech Separation And Enhancement (2022)
- UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-training (2021)
- u-HuBERT: Unified Mixed-modal Speech Pretraining And Zero-shot Transfer To Unlabeled Modality (2022)