A Spelling Correction Model For End-to-end Speech Recognition
2019 · Jinxi Guo, Tara N. Sainath, Ron J. Weiss
Abstract
Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is trained only on transcribed audio-text pairs, which leads to performance degradation, especially on rare words. While a variety of work has looked at incorporating an external LM trained on text-only data into the end-to-end framework, none of it has taken into account the characteristic error distribution of the model. In this paper, we propose a novel approach to utilizing text-only data: training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model yields an 18.6% relative improvement in WER over the baseline model when directly correcting the top ASR hypothesis, and a 29.0% relative improvement when
Related papers
- Audio-attention Discriminative Language Model For ASR Rescoring (2019) 9.23
- ASR Error Correction Using Large Language Models (2024) 9.41
- Towards Contextual Spelling Correction For Customization Of End-to-end Speech Recognition Systems (2022) 9.92
- Multi-stage Large Language Model Correction For Speech Recognition (2023) 0.00
- Chain Of Correction For Full-text Speech Recognition With Large Language Models (2025) 0.00
- Towards Better Decoding And Language Model Integration In Sequence To Sequence Models (2016) 15.67
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) 21.01
- Diacorrect: End-to-end Error Correction For Speaker Diarization (2022) 0.00