Full-text Error Correction For Chinese Speech Recognition With Large Language Model
2024 · Zhiyuan Tang, Dong Wang, Shen Huang, et al.
Abstract
Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). However, most research focuses on utterances from short-duration speech recordings, which are the predominant form of speech data for supervised ASR training. This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings. First, we develop a Chinese dataset for full-text error correction, named ChFT, utilizing a pipeline that involves text-to-speech synthesis, ASR, and an error-correction pair extractor. This dataset enables us to correct errors across contexts, including both full-text and segment, and to address a broader range of error types, such as punctuation restoration and inverse text normalization, thus making the correction process comprehensive. Second, we fine-tune a pre-trained LLM on the constructed dataset
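The abstract names an error-correction pair extractor as the last stage of the dataset pipeline but does not describe how it works. As a rough illustration only, the sketch below assumes the extractor can be approximated by a character-level diff between an ASR hypothesis and its reference text. The function name `extract_correction_pairs` and the use of Python's `difflib` are assumptions made for this example, not the authors' method.

```python
from difflib import SequenceMatcher


def extract_correction_pairs(hypothesis: str, reference: str) -> list[tuple[str, str]]:
    """Return (asr_span, reference_span) pairs wherever the two texts differ.

    Hypothetical helper: a character-level diff stands in for the paper's
    extractor, whose actual design is not given in the abstract.
    """
    # autojunk=False keeps frequent characters from being ignored on long texts.
    matcher = SequenceMatcher(None, hypothesis, reference, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty asr_span means the reference inserted text,
            # for example restored punctuation.
            pairs.append((hypothesis[i1:i2], reference[j1:j2]))
    return pairs


# One substituted character plus a missing full stop.
print(extract_correction_pairs("今天天汽很好", "今天天气很好。"))
# → [('汽', '气'), ('', '。')]
```

Working at the character level suits Chinese text, which has no whitespace word boundaries, and lets punctuation restoration show up as an insertion pair alongside ordinary substitutions.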
Related papers
- Chain Of Correction For Full-text Speech Recognition With Large Language Models (2025)
- Multi-stage Large Language Model Correction For Speech Recognition (2023)
- ASR Error Correction Using Large Language Models (2024)
- Exploring The Integration Of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study (2023)
- Large Language Model Based Generative Error Correction: A Challenge And Baselines For Speech Recognition, Speaker Tagging, And Emotion Recognition (2024)
- Harnessing The Zero-shot Power Of Instruction-tuned Large Language Model In End-to-end Speech Recognition (2023)
- Towards Interfacing Large Language Models With ASR Systems Using Confidence Measures And Prompting (2024)
- LLM-based Speaker Diarization Correction: A Generalizable Approach (2024)