kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
2023 · Jiaming Zhou, Shiwan Zhao, Yaqi Liu, et al.
Abstract
The success of retrieval-augmented language models in various natural language processing (NLP) tasks has been constrained in automatic speech recognition (ASR) applications due to challenges in constructing fine-grained audio-text datastores. This paper presents kNN-CTC, a novel approach that overcomes these challenges by leveraging Connectionist Temporal Classification (CTC) pseudo labels to establish frame-level audio-text key-value pairs, circumventing the need for precise ground truth alignments. We further introduce a skip-blank strategy, which ignores CTC blank frames, to reduce datastore size. By incorporating a k-nearest neighbors retrieval mechanism into pre-trained CTC ASR systems and leveraging a fine-grained, pruned datastore, kNN-CTC consistently achieves substantial improvements in performance under various expe
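The pipeline the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the function names, the blank index of 0, the squared-L2 distance, the temperature softmax, and the interpolation weight `lam` are all assumptions made for illustration, borrowed from common kNN-augmented model setups. It shows frame-level key-value pairs built from argmax CTC pseudo labels, the skip-blank pruning, and a kNN distribution mixed with the CTC posterior.

```python
import math

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def build_datastore(frames, posteriors, skip_blank=True):
    """Pair each encoder frame (key) with its CTC pseudo label (value)."""
    datastore = []
    for vec, probs in zip(frames, posteriors):
        # Pseudo label = argmax of the frame's CTC posterior.
        label = max(range(len(probs)), key=probs.__getitem__)
        if skip_blank and label == BLANK_ID:
            continue  # skip-blank: do not store blank frames
        datastore.append((vec, label))
    return datastore


def knn_distribution(query, datastore, k, vocab_size, temperature=1.0):
    """Turn the k nearest stored frames into a distribution over labels."""
    if not datastore:
        raise ValueError("datastore is empty")
    # Squared L2 distance to every key (brute force; assumed metric).
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), label)
        for key, label in datastore
    )[:k]
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    p_knn = [0.0] * vocab_size
    for (_, label), w in zip(scored, weights):
        p_knn[label] += w / total
    return p_knn


def interpolate(p_ctc, p_knn, lam=0.5):
    """Mix the CTC posterior with the retrieved distribution (assumed form)."""
    return [lam * pk + (1.0 - lam) * pc for pc, pk in zip(p_ctc, p_knn)]


# Toy data: 4 frames, 2-dim features, vocabulary {blank, 1, 2}.
frames = [[0.0, 0.0], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0]]
posteriors = [
    [0.90, 0.05, 0.05],  # argmax = blank, dropped by skip-blank
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
]
store = build_datastore(frames, posteriors)
p_knn = knn_distribution([1.0, 1.0], store, k=2, vocab_size=3)
p_final = interpolate([0.5, 0.3, 0.2], p_knn, lam=0.5)
```

In this toy run the blank frame is left out of the datastore, so three of the four frames are stored, and both nearest neighbors of the query carry label 1, which shifts probability mass toward that label after interpolation. A practical system would replace the brute-force search with an approximate nearest-neighbor index.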
Related papers
- Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2020) 11.76
- Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores (2024) 0.00
- Residual Convolutional CTC Networks for Automatic Speech Recognition (2017) 0.00
- CR-CTC: Consistency Regularization on CTC for Improved Speech Recognition (2024) 6.30
- Improving LSTM-CTC Based ASR Performance in Domains with Limited Training Data (2017) 0.00
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2022) 0.00
- CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency (2020) 9.03
- Joint Masked CPC and CTC Training for ASR (2020) 8.60