Low-latency Speech Enhancement Via Speech Token Generation
2023 · Huaying Xue, Xiulian Peng, Yan Lu
Abstract
Existing deep-learning-based speech enhancement methods mainly employ a data-driven approach, which leverages large amounts of data with a variety of noise types to remove noise from the noisy signal. However, this heavy dependence on data limits generalization to unseen, complex noise in real-life environments. In this paper, we focus on the low-latency scenario and regard speech enhancement as a speech generation problem conditioned on the noisy signal, where we generate clean speech instead of identifying and removing noise. Specifically, we propose a conditional generative framework for speech enhancement, which models clean speech with the acoustic codes of a neural speech codec and generates the speech codes conditioned on past noisy frames in an auto-regressive way. Moreover, we propose an explicit-alignment approach to align noisy frames with the generated speech tokens to improve the robustness and scalability to different input lengths. Different from other methods that lever
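The generation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `generate_tokens` and `toy_next_token`, the frame representation, and the codebook size are all hypothetical, and the trained network is replaced by a toy rule. The sketch assumes one reading of "explicit alignment", namely that exactly one speech token is emitted per noisy frame, and it conditions each step only on frames received so far, which is what keeps latency low.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one noisy feature frame (hypothetical representation)
# Hypothetical model interface: (noisy frames so far, tokens so far) -> next token id
NextTokenFn = Callable[[List[Frame], List[int]], int]


def generate_tokens(noisy_frames: Sequence[Frame], next_token: NextTokenFn) -> List[int]:
    """Emit one codec token per noisy frame, with no look-ahead into future frames."""
    tokens: List[int] = []
    seen: List[Frame] = []
    for frame in noisy_frames:
        seen.append(frame)  # causal context: only frames received so far
        # Auto-regressive step: condition on noisy context and previously generated tokens.
        tokens.append(next_token(seen, tokens))
    return tokens


def toy_next_token(seen: List[Frame], tokens: List[int], codebook_size: int = 8) -> int:
    """Stand-in for a trained network (assumption): mixes latest frame energy with the last token."""
    latest = seen[-1]
    energy = sum(x * x for x in latest) / len(latest)
    prev = tokens[-1] if tokens else 0
    return (int(energy * codebook_size) + prev) % codebook_size


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    print(generate_tokens(frames, toy_next_token))
```

Because each step sees only past input and past output, the tokens produced for a prefix of the signal do not change when more frames arrive, and the output length always matches the input length. In a real system the token sequence would then be passed to the codec decoder to synthesize the clean waveform.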
Related papers
- LiveSpeech: Low-latency Zero-shot Text-to-speech Via Autoregressive Modeling Of Audio Discrete Codes (2024) 5.84
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) 21.85
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019) 10.35
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) 16.03
- Noise Robust TTS For Low Resource Speakers Using Pre-trained Model And Speech Enhancement (2020) 0.00
- Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement (2021) 8.60
- Modeling Strategies For Speech Enhancement In The Latent Space Of A Neural Audio Codec (2025) 0.00
- UNetGAN: A Robust Speech Enhancement Approach In Time Domain For Extremely Low Signal-to-noise Ratio Condition (2020) 11.49