CAT: CRF-based ASR Toolkit
2019 · Keyu An, Hongyu Xiang, Zhijian Ou
Abstract
In this paper, we present a new open-source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random fields (CRFs), particularly with a state topology inspired by connectionist temporal classification (CTC). CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow for CRF-based end-to-end speech recognition. Evaluation results on English and Chinese benchmarks such as Switchboard and Aishell show that CAT obtains state-of-the-art results among existing end-to-end models with fewer parameters, and is competitive with hybrid DNN-HMM models. To demonstrate flexibility, we show that i-vector-based speaker-adapted recognition and a latency control mechanism can be explored easily and effectively in CAT. We hope CAT, especially the CRF-based framework and software, will be of broad interest to the community, and can be further explored and improved.
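The CTC-inspired state topology mentioned in the abstract can be illustrated with the standard CTC collapsing rule, which maps a frame-level state path to a label sequence by merging consecutive repeats and then dropping blanks. The following is a minimal sketch of that rule only, not code from CAT; the names `ctc_collapse` and `BLANK` and the blank symbol `<b>` are assumptions made for this illustration.

```python
from itertools import groupby

# Hypothetical blank symbol, chosen for this sketch (not taken from CAT).
BLANK = "<b>"


def ctc_collapse(path):
    """Map a frame-level state path to a label sequence.

    Step 1: merge consecutive repeated states.
    Step 2: remove blank states.
    """
    merged = [state for state, _ in groupby(path)]
    return [state for state in merged if state != BLANK]


if __name__ == "__main__":
    # Nine frames collapse to three labels.
    print(ctc_collapse(["<b>", "k", "k", "<b>", "ae", "ae", "<b>", "<b>", "t"]))
    # A blank between two identical labels keeps them distinct.
    print(ctc_collapse(["l", "<b>", "l"]))
```

Many different frame-level paths collapse to the same label sequence, which is why training objectives built on this topology sum over all paths that map to the reference labels.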
Related papers
- CAT: A CTC-CRF Based ASR Toolkit Bridging The Hybrid And The End-to-end Approaches Towards Data Efficiency And Low Latency (2020)
- Advancing CTC-CRF Based End-to-end Speech Recognition With Wordpieces And Conformers (2021)
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020)
- A CTC Alignment-based Non-autoregressive Transformer For End-to-end Automatic Speech Recognition (2023)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022)
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)
- META-CAT: Speaker-informed Speech Embeddings Via Meta Information Concatenation For Multi-talker ASR (2024)