Iterative Autoregression: A Novel Trick To Improve Your Low-latency Speech Enhancement Model
2022 · Pavel Andreev, Nicholas Babaev, Azat Saginbaev, et al.
Abstract
Streaming models are an essential component of real-time speech enhancement tools. The streaming regime constrains speech enhancement models to use only a tiny context of future information. As a result, the low-latency streaming setup is generally considered a challenging task and has a significant negative impact on the model's quality. However, the sequential nature of streaming generation offers a natural possibility for autoregression, that is, utilizing previous predictions while making current ones. The conventional method for training autoregressive models is teacher forcing, but its primary drawback lies in the training-inference mismatch that can lead to a substantial degradation in quality. In this study, we propose a straightforward yet effective alternative technique for training autoregressive low-latency speech enhancement models. We demonstrate that the proposed approach leads to stable improvement across diverse architectures and training scenarios.
Related papers
- Livespeech: Low-latency Zero-shot Text-to-speech Via Autoregressive Modeling Of Audio Discrete Codes (2024)
- Dynamic Latency For Ctc-based Streaming Automatic Speech Recognition With Emformer (2022)
- Modeling Strategies For Speech Enhancement In The Latent Space Of A Neural Audio Codec (2025)
- Dynamic Latency Speech Recognition With Asynchronous Revision (2020)
- Improving Streaming Automatic Speech Recognition With Non-streaming Model Distillation On Unsupervised Data (2020)
- High Performance Sequence-to-sequence Model For Streaming Speech Recognition (2020)
- Lookahead When It Matters: Adaptive Non-causal Transformers For Streaming Neural Transducers (2023)
- Bridging The Gap Between Streaming And Non-streaming ASR Systems By Distilling Ensembles Of CTC And RNN-T Models (2021)