Discriminative Speech Recognition Rescoring With Pre-trained Language Models
2023 · Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, et al.
Abstract
Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative training, which directly optimizes the minimum word-error-rate (MWER) criterion, typically improves rescoring. In this study, we propose and explore several discriminative fine-tuning schemes for pre-trained LMs. We propose two architectures based on different pooling strategies of output embeddings and compare them with probability-based MWER. We conduct detailed comparisons between pre-trained causal and bidirectional LMs in discriminative settings. Experiments on LibriSpeech demonstrate that all MWER training schemes are beneficial, giving additional gains of up to 8.5% WER. The proposed pooling variants achieve lower latency while retaining most of the improvements. Finally, our study concludes that bidirectionality is better utilized with discriminative training.
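As a rough illustration of the MWER criterion mentioned in the abstract, the sketch below computes the commonly used n-best form of the loss: hypothesis scores are normalized with a softmax over the n-best list, and the loss is the expected number of word errors relative to the list average. The abstract does not give the paper's exact formulation, so this form, the function name `mwer_loss`, and its arguments are assumptions made for illustration only.

```python
import math


def mwer_loss(scores, word_errors):
    """Expected word errors over an n-best list, relative to the list mean.

    scores: one rescoring score per hypothesis (higher means more likely).
    word_errors: word-error count of each hypothesis against the reference.
    Both names are hypothetical; they are not taken from the paper.
    """
    if not scores or len(scores) != len(word_errors):
        raise ValueError("scores and word_errors must be non-empty and equal length")

    # Softmax over the n-best list, shifted by the max score for stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Subtracting the mean error count is a common variance-reduction baseline.
    mean_errors = sum(word_errors) / len(word_errors)
    return sum(p * (e - mean_errors) for p, e in zip(probs, word_errors))
```

Under this form the loss is zero when all hypotheses receive equal scores, negative when probability mass sits on hypotheses with fewer errors than average, and positive in the opposite case, so minimizing it pushes the rescorer toward low-error hypotheses.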
Related papers
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Multi-task Language Modeling For Improving Speech Recognition Of Rare Words (2020)
- Low-rank Adaptation Of Large Language Model Rescoring For Parameter-efficient Speech Recognition (2023)
- Lattice Rescoring Strategies For Long Short Term Memory Language Models In Speech Recognition (2017)
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023)
- A Language Score Based Output Selection Method For Multilingual Speech Recognition (2020)
- Context-aware RNNLM Rescoring For Conversational Speech Recognition (2020)
- Investigating Training Strategies And Model Robustness Of Low-rank Adaptation For Language Modeling In Speech Recognition (2024)