Improving Streaming Transformer Based ASR Under A Framework Of Self-supervised Learning
2021 Β· Songjun Cao, Yueteng Kang, Yanzhe Fu, et al.
Abstract
Recently self-supervised learning has emerged as an effective approach to improve the performance of automatic speech recognition (ASR). Under such a framework, the neural network is usually pre-trained with massive unlabeled data and then fine-tuned with limited labeled data. However, the non-streaming architecture like bidirectional transformer is usually adopted by the neural network to achieve competitive results, which can not be used in streaming scenarios. In this paper, we mainly focus on improving the performance of streaming transformer under the self-supervised learning framework. Specifically, we propose a novel two-stage training method during fine-tuning, which combines knowledge distilling and self-training. The proposed training method achieves 16.3% relative word error rate (WER) reduction on Librispeech noisy test set. Finally, by only using the 100h clean subset of Librispeech as the labeled data and the rest (860h) as the unlabeled data, our streaming transformer ba
Authors
(none)
Tags
Stats
Related papers
- Improving Streaming Automatic Speech Recognition With Non-streaming Model Distillation On Unsupervised Data (2020)0.00
- A Comparison Of Semi-supervised Learning Techniques For Streaming ASR At Scale (2023)2.26
- Fast Streaming Transducer ASR Prototyping Via Knowledge Distillation With Whisper (2024)2.26
- A Low Latency Attention Module For Streaming Self-supervised Speech Representation Learning (2023)0.00
- Streaming Transformer-based Acoustic Models Using Self-attention With Augmented Memory (2020)0.00
- Multiple-hypothesis RNN-T Loss For Unsupervised Fine-tuning And Self-training Of Neural Transducer (2022)0.00
- Self-supervised Rewiring Of Pre-trained Speech Encoders: Towards Faster Fine-tuning With Less Labels In Speech Processing (2022)3.58
- Reducing The Gap Between Streaming And Non-streaming Transducer-based ASR By Adaptive Two-stage Knowledge Distillation (2023)4.52