Context-aware RNNLM Rescoring For Conversational Speech Recognition
2020 · Kun Wei, Pengcheng Guo, Hang Lv, et al.
Abstract
Conversational speech recognition is regarded as a challenging task because of its free-style speech and long-term contextual dependencies. Prior work has modeled long-range context through RNNLM rescoring and reported improved performance. To further exploit information that persists throughout a conversation, such as topic or speaker turn, we extend the rescoring procedure to be context-aware. For RNNLM training, we capture contextual dependencies by concatenating adjacent sentences with various tag words that carry, for example, speaker or intention information. For lattice rescoring, the lattices of adjacent sentences are likewise connected to the first-pass decoded result by tag words. We also adopt a selective concatenation strategy based on tf-idf, which makes the best use of contextual similarity to improve transcription performance. Results on four different conversation test sets show that our approach yields up to 13.1% and 6% relative character error rate (CER) reduction
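The selective concatenation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag tokens `<same_spk>` and `<spk_change>`, the smoothed tf-idf weighting, the cosine similarity measure, and the `threshold` value are all assumptions chosen for illustration. The sketch joins an utterance to its predecessor with a tag word only when the two are similar enough under tf-idf.

```python
import math
from collections import Counter


def tfidf_vectors(utterances):
    """Return one sparse tf-idf vector (a dict) per tokenized utterance."""
    n = len(utterances)
    # Document frequency: in how many utterances each token occurs.
    df = Counter(tok for utt in utterances for tok in set(utt))
    vectors = []
    for utt in utterances:
        tf = Counter(utt)
        # Smoothed idf (an assumed variant, not taken from the paper).
        vectors.append({
            tok: (count / len(utt)) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def build_training_lines(dialogue, threshold=0.1):
    """Build RNNLM training lines from a list of (speaker, tokens) pairs.

    An utterance is prefixed with the previous one and a hypothetical
    speaker tag only if their tf-idf cosine similarity reaches `threshold`.
    """
    vectors = tfidf_vectors([tokens for _, tokens in dialogue])
    lines = []
    for i, (speaker, tokens) in enumerate(dialogue):
        if i > 0 and cosine(vectors[i - 1], vectors[i]) >= threshold:
            prev_speaker, prev_tokens = dialogue[i - 1]
            # Hypothetical tag words marking whether the speaker changed.
            tag = "<same_spk>" if prev_speaker == speaker else "<spk_change>"
            tokens = prev_tokens + [tag] + tokens
        lines.append(" ".join(tokens))
    return lines


dialogue = [
    ("A", ["book", "a", "flight", "to", "beijing"]),
    ("B", ["which", "flight", "to", "beijing"]),
    ("A", ["thanks"]),
]
for line in build_training_lines(dialogue):
    print(line)
```

In this toy dialogue the second utterance shares several words with the first, so the two are joined with a tag word, while the third shares none with the second and is kept on its own. The same tagging scheme would need to be applied at rescoring time so that the language model sees context in the form it was trained on.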
Related papers
- Lattice Rescoring Strategies For Long Short Term Memory Language Models In Speech Recognition (2017) · 9.76
- Contextualizing ASR Lattice Rescoring With Hybrid Pointer Network Language Model (2020) · 8.09
- Future Word Contexts In Neural Network Language Models (2017) · 8.35
- Towards ASR Robust Spoken Language Understanding Through In-context Learning With Word Confusion Networks (2024) · 0.00
- Improving RNN-T ASR Accuracy Using Context Audio (2020) · 5.84
- Contextual Biasing Of Language Models For Speech Recognition In Goal-oriented Conversational Agents (2021) · 0.00
- Discriminative Speech Recognition Rescoring With Pre-trained Language Models (2023) · 2.26
- Effective Cross-utterance Language Modeling For Conversational Speech Recognition (2021) · 2.26