Wav2code: Restore Clean Speech Representations Via Codebook Lookup For Noise-robust ASR
2023 · Yuchen Hu, Chen Chen, Qiushi Zhu, et al.
Abstract
Automatic speech recognition (ASR) has achieved remarkable success thanks to recent advances in deep learning, but its performance usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as a front-end to improve speech quality, which has proved effective but may not be optimal for downstream ASR because of the speech distortion it introduces. Building on that, the latest works combine SE with the currently popular self-supervised learning (SSL) to alleviate distortion and improve noise robustness. Despite their effectiveness, the speech distortion caused by conventional SE still cannot be fully eliminated. In this paper, we propose a self-supervised framework named Wav2code that implements feature-level SE with reduced distortion for noise-robust ASR. First, in the pre-training stage, the clean speech representations from the SSL model are used to look up a discrete codebook via nearest-neighbor feature matching; the resulting code sequence is then exploited to reconstruct the original
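The codebook lookup mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes Euclidean distance as the matching criterion, uses made-up 2-dimensional features and a 3-entry codebook, and the function names (`lookup_codes`, `quantize`) are hypothetical. It only shows the general idea of mapping each frame-level feature to the index of its nearest codebook entry.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lookup_codes(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each frame-level feature to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in features
    ]


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each feature with a copy of its nearest codebook entry."""
    return [list(codebook[k]) for k in lookup_codes(features, codebook)]


# Toy data (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4], [0.6, 0.6]]

codes = lookup_codes(features, codebook)   # discrete code sequence, one index per frame
quantized = quantize(features, codebook)   # features snapped to codebook entries
print(codes)
```

In a real system the features would be high-dimensional SSL representations and the codebook would be learned during pre-training; here both are fixed by hand so the matching step is easy to follow.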
Related papers
- Wav2vec-switch: Contrastive Learning From Original-noisy Speech Pairs For Robust Speech Recognition (2021) · 12.93
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022) · 11.19
- Supervision-guided Codebooks For Masked Prediction In Speech Pre-training (2022) · 7.81
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) · 0.00
- Av2wav: Diffusion-based Re-synthesis From Continuous Self-supervised Features For Audio-visual Speech Enhancement (2023) · 0.00
- Restorative Speech Enhancement: A Progressive Approach Using SE And Codec Modules (2024) · 0.00
- Incorporating Symbolic Sequential Modeling For Speech Enhancement (2019) · 0.00
- Downstream Task Agnostic Speech Enhancement With Self-supervised Representation Loss (2023) · 6.77