Efficient Extraction Of Noise-robust Discrete Units From Self-supervised Speech Models
2024 · Jakob Poncelet, Yujun Wang, Hugo van Hamme
Abstract
Continuous speech can be converted into a discrete sequence by deriving discrete units from the hidden features of self-supervised learned (SSL) speech models. Although SSL models are becoming larger and trained on more data, they are often sensitive to real-life distortions like additive noise or reverberation, which translates to a shift in discrete units. We propose a parameter-efficient approach to generate noise-robust discrete units from pre-trained SSL models by training a small encoder-decoder model, with or without adapters, to simultaneously denoise and discretise the hidden features of the SSL model. The model learns to generate a clean discrete sequence for a noisy utterance, conditioned on the SSL features. The proposed denoiser outperforms several pre-training methods on the tasks of noisy discretisation and noisy speech recognition, and can be finetuned to the target environment with a few recordings of unlabeled target data.
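To make the notion of a "shift in discrete units" concrete, here is a minimal sketch of nearest-centroid discretisation and of one way to measure how many units change when the features are perturbed. It is not the method of the paper: the abstract does not say how units are derived, so the fixed codebook, the 2-dimensional toy frames, the Gaussian perturbation standing in for noise, and the function names `quantise` and `unit_change_rate` are all illustrative assumptions.

```python
import random


def quantise(frames, centroids):
    """Map each feature frame to the index of its nearest centroid
    (squared Euclidean distance). The codebook is assumed to be given."""
    units = []
    for frame in frames:
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


def unit_change_rate(clean_units, noisy_units):
    """Fraction of frames whose unit differs between two aligned sequences."""
    if len(clean_units) != len(noisy_units):
        raise ValueError("sequences must have the same number of frames")
    changed = sum(a != b for a, b in zip(clean_units, noisy_units))
    return changed / len(clean_units)


# Toy stand-ins for SSL hidden features (assumption: real features would
# come from a pre-trained speech model and have far more dimensions).
rng = random.Random(0)
centroids = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]
clean_frames = [
    [c + rng.gauss(0, 0.3) for c in centroids[i % 4]] for i in range(200)
]
# Perturb the features to imitate the effect of a distorted input signal.
noisy_frames = [[x + rng.gauss(0, 2.0) for x in frame] for frame in clean_frames]

clean_units = quantise(clean_frames, centroids)
noisy_units = quantise(noisy_frames, centroids)
shift = unit_change_rate(clean_units, noisy_units)
print(f"fraction of frames whose unit changed: {shift:.2f}")
```

In this framing, a noise-robust discretiser is one that keeps the change rate low, that is, it emits for the noisy utterance the unit sequence that the clean utterance would have produced.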