CAT: A CTC-CRF Based ASR Toolkit Bridging The Hybrid And The End-to-end Approaches Towards Data Efficiency And Low Latency
2020 Β· Keyu An, Hongyu Xiang, Zhijian Ou
Abstract
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.
Authors
(none)
Tags
Stats
Related papers
- CAT: Crf-based ASR Toolkit (2019)0.00
- Advancing CTC-CRF Based End-to-end Speech Recognition With Wordpieces And Conformers (2021)0.00
- Improved Mask-ctc For Non-autoregressive End-to-end ASR (2020)11.76
- META-CAT: Speaker-informed Speech Embeddings Via Meta Information Concatenation For Multi-talker ASR (2024)3.58
- Online Hybrid Ctc/attention End-to-end Automatic Speech Recognition Architecture (2023)12.99
- Knn-ctc: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)11.36
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)7.16
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022)0.00