Speech Enhancement Using Self-supervised Pre-trained Model And Vector Quantization
2022 · Xiao-Ying Zhao, Qiu-Shi Zhu, Jie Zhang
Abstract
With the development of deep learning, neural network-based speech enhancement (SE) models have shown excellent performance. Meanwhile, self-supervised pre-trained models have been shown to benefit a wide range of downstream tasks. In this paper, we consider applying a pre-trained model to the real-time SE problem. Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized with the self-supervised pre-trained WavLM model, the convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask. In addition, since discretizing the noisy speech representations is more beneficial for denoising, we use a quantization module to discretize the representations output by the bottleneck layer, which are then fed into the decoder to reconstruct the clean speech waveform. Experimental results on the Valentini dataset and an internal dataset show that the pre…
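The abstract names three generic building blocks: causal convolution, a causal attention mask, and quantization of frame-level representations. A minimal plain-Python sketch of each follows, to make "causal" and "discretize" concrete. It is not the authors' implementation. The function names are hypothetical, and several simplifications are assumptions: stride-1 convolution, single-head attention, and a fixed nearest-neighbour codebook with no codebook learning, no straight-through estimator and no WavLM weights. The abstract does not say which quantizer the paper actually uses.

```python
import math


def causal_conv1d(x, kernel):
    """Stride-1 causal 1-D convolution (cross-correlation, as in DL frameworks).

    Left-pads with len(kernel) - 1 zeros so output[t] depends only on x[0..t],
    i.e. there is no look-ahead into future samples.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    # kernel[-1] weights the current sample, kernel[0] the oldest one in the window
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


def causal_attention_mask(n):
    """Boolean mask: frame i may attend to frame j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of n vectors; q and k share dimension d.
    """
    n, d = len(q), len(q[0])
    mask = causal_attention_mask(n)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if mask[i][j]:
                scores.append(sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d))
            else:
                scores.append(-math.inf)  # future frame: excluded from the softmax
        m = max(scores)  # finite, because frame i can always attend to itself
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        z = sum(weights)
        out.append([sum(weights[j] * v[j][c] for j in range(n)) / z
                    for c in range(len(v[0]))])
    return out


def vector_quantize(frames, codebook):
    """Nearest-neighbour VQ: map each frame to its closest codeword (squared L2).

    Returns the discrete indices and the quantized frames fed onward.
    """
    indices = []
    for f in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices, [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    print(causal_conv1d([1.0, 2.0, 3.0], [0.5, 1.0]))  # [1.0, 2.5, 4.0]
    idx, _ = vector_quantize([[0.2, -0.1], [0.9, 1.2]], [[0.0, 0.0], [1.0, 1.0]])
    print(idx)  # [0, 1]
```

In both causal operations, changing a future input leaves every earlier output unchanged. This is the property a real-time (streaming) enhancer needs. The quantizer replaces each continuous bottleneck frame with one of a finite set of codewords before decoding.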
Related papers
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) · 0.00
- Causal Speech Enhancement With Predicting Semantics Based On Quantized Self-supervised Learning Features (2024) · 3.58
- Self-supervised Learning With Random-projection Quantizer For Speech Recognition (2022) · 0.00
- On The Impact Of Quantization And Pruning Of Self-supervised Speech Models For Downstream Speech Recognition Tasks "in-the-wild" (2023) · 0.00
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021) · 10.21
- WavLM: Large-scale Self-supervised Pre-training For Full Stack Speech Processing (2021) · 24.00
- Towards Unsupervised Phone And Word Segmentation Using Self-supervised Vector-quantized Neural Networks (2020) · 0.00
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017) · 17.39