HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-Supervision Utilizing Topic Model
2023 · Takashi Maekaku, Jiatong Shi, Xuankai Chang, et al.
Abstract
Recently, the usefulness of self-supervised representation learning (SSRL) methods has been confirmed in various downstream tasks. Many of these models, as exemplified by HuBERT and WavLM, use pseudo-labels generated from spectral features or from the model's own representations. Previous studies have shown that these pseudo-labels contain semantic information. However, the masked prediction task, HuBERT's training criterion, focuses on local contextual information and may not make effective use of global semantic information such as the speaker or the theme of the speech. In this paper, we propose a new approach to enrich the semantic representation of HuBERT. We apply a topic model to the pseudo-labels to generate a topic label for each utterance. An auxiliary topic classification task is then added to HuBERT, using the topic labels as targets. This allows additional global semantic information to be incorporated in an unsupervised manner. Experimental results demonstrate that our meth
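The abstract names only "a topic model" and "an auxiliary topic classification task", so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes latent Dirichlet allocation (LDA) fit by collapsed Gibbs sampling, treats each utterance's sequence of discrete pseudo-labels as a bag of "words", and takes each utterance's most frequent topic as its label. The utterance-level classifier head (mean pooling plus a linear layer), the weighted-sum loss, and every function name and hyperparameter (`alpha`, `beta`, `iters`, `topic_weight`) are hypothetical choices made for illustration.

```python
import math
import random


def lda_topic_labels(docs, num_topics, vocab_size,
                     alpha=0.1, beta=0.01, iters=100, seed=0):
    """Assign one topic label per utterance with LDA (collapsed Gibbs sampling).

    Assumption: each "document" is one utterance's sequence of pseudo-label
    IDs in [0, vocab_size). Returns the argmax topic of each document.
    """
    rng = random.Random(seed)
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    n_kw = [[0] * vocab_size for _ in topics]    # topic-word counts
    n_k = [0] * num_topics                       # total tokens per topic
    z = []                                       # current topic of every token

    # Randomly initialise a topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a normalising constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return [max(topics, key=lambda t: n_dk[d][t]) for d in range(len(docs))]


def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[j] for f in frames) / len(frames) for j in range(dim)]


def topic_logits(frames, weight, bias):
    """Hypothetical classifier head: mean pooling followed by a linear layer."""
    pooled = mean_pool(frames)
    return [sum(w_j * x_j for w_j, x_j in zip(row, pooled)) + b
            for row, b in zip(weight, bias)]


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[target]


def hubertopic_style_loss(masked_pred_loss, logits, topic_label, topic_weight=0.1):
    """Masked-prediction loss plus a weighted auxiliary topic-classification loss.

    topic_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return masked_pred_loss + topic_weight * cross_entropy(logits, topic_label)


# Toy pseudo-label sequences for four utterances (vocabulary of 6 cluster IDs).
docs = [[0, 1, 2, 1, 0], [2, 1, 0, 0, 2], [3, 4, 5, 5, 4], [5, 3, 4, 3, 5]]
labels = lda_topic_labels(docs, num_topics=2, vocab_size=6)
print(labels)  # one topic ID per utterance, used as the auxiliary target
```

The design keeps topic labelling as an offline step over the pseudo-labels, so the labels can be computed once and then used as fixed classification targets alongside the usual masked-prediction targets during pre-training.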
Related papers
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units (2021) · 25.30
- Spatial HuBERT: Self-Supervised Spatial Speech Representation Learning for a Single Talker from Multi-Channel Audio (2023) · 0.00
- Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model (2024) · 3.58
- Multi-Resolution HuBERT: Multi-Resolution Speech Self-Supervised Learning with Masked Unit Prediction (2023) · 0.00
- Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech (2023) · 7.81
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training (2021) · 0.00
- MS-HuBERT: Mitigating Pre-Training and Inference Mismatch in Masked Language Modelling Methods for Learning Speech Representations (2024) · 4.52
- Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech (2023) · 6.77