Contextual Biasing Of Language Models For Speech Recognition In Goal-oriented Conversational Agents
2021 · Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff
Abstract
Goal-oriented conversational interfaces are designed to accomplish specific tasks, and their interactions typically span multiple turns that follow a predefined structure toward a goal. However, conventional neural language models (NLMs) in automatic speech recognition (ASR) systems are mostly trained sentence by sentence, with limited context. In this paper, we explore different ways to incorporate context into an LSTM-based NLM in order to model long-range dependencies and improve speech recognition. Specifically, we carry context over across multiple turns and use lexical contextual cues such as the system dialog act from Natural Language Understanding (NLU) models and the user-provided structure of the chatbot. We also propose a new architecture that uses context embeddings derived from BERT on sample utterances provided at inference time. Our experiments show a 7% relative reduction in word error rate (WER) over non-contextual utterance-level NLM rescorers on goal-oriented
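The abstract gives no implementation details. The following is therefore only a minimal Python sketch of the general idea of second-pass n-best rescoring with a context-conditioned language model, under stated assumptions. `ToyContextLM` is a hypothetical stand-in for the paper's LSTM NLM. It uses an add-one-smoothed bigram model, trained on utterances prefixed with a dialog-act token, and interpolates it with a unigram cache over words carried over from earlier turns. The `<da:...>` token format, `build_context`, `rescore`, the weights, and the toy data are all illustrative choices, not the authors'. The BERT context-embedding variant is not modelled here.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass: float  # hypothetical first-pass (acoustic + LM) log score from the decoder


def build_context(previous_turns, dialog_act, max_turns=2):
    """Carry over words from the last `max_turns` turns, then append a
    dialog-act cue token (the token format is an illustrative assumption)."""
    words = [w for turn in previous_turns[-max_turns:] for w in turn.split()]
    return words + [f"<da:{dialog_act}>"]


class ToyContextLM:
    """Hypothetical stand-in for a contextual LSTM NLM, for illustration only.

    An add-one-smoothed bigram model, trained on utterances prefixed with their
    dialog-act token, interpolated with a unigram cache over the context words.
    cache_weight must be < 1 so unseen words keep a nonzero probability.
    """

    def __init__(self, corpus, cache_weight=0.3):
        self.bigrams = Counter()
        self.history = Counter()  # how often each token appears as a bigram history
        vocab = set()
        for dialog_act, text in corpus:
            tokens = [f"<da:{dialog_act}>"] + text.split()
            vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words
        self.cache_weight = cache_weight

    def bigram_prob(self, prev, word):
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + self.vocab_size)

    def logprob(self, context, text):
        cache = Counter(context)
        cache_total = max(len(context), 1)
        # The dialog-act cue (last context token) is the history for the first word.
        prev = context[-1] if context else "<s>"
        total = 0.0
        for word in text.split():
            p = ((1 - self.cache_weight) * self.bigram_prob(prev, word)
                 + self.cache_weight * cache[word] / cache_total)
            total += math.log(p)
            prev = word
        return total


def rescore(nbest, lm, context, lm_weight=0.5):
    """Re-rank n-best hypotheses by first-pass score plus weighted contextual LM score."""
    return sorted(nbest,
                  key=lambda h: h.first_pass + lm_weight * lm.logprob(context, h.text),
                  reverse=True)


# Usage with toy data (hypothetical).
lm = ToyContextLM([
    ("request_date", "next friday"),
    ("request_date", "book for next monday"),
    ("request_size", "a table for four"),
])
context = build_context(["book a table", "what day"], "request_date")
nbest = [Hypothesis("necks friday", -9.8), Hypothesis("next friday", -10.0)]
print(rescore(nbest, lm, context)[0].text)  # next friday
```

In this toy run the first pass slightly prefers "necks friday". The contextual score, conditioned on the `request_date` cue, flips the ranking. A bigram only sees the immediately preceding token, which is why the cache term is included: it lets words from earlier turns influence every position. An LSTM would instead consume the whole carried-over prefix directly.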
Related papers
- Attention-based Contextual Language Model Adaptation For Speech Recognition (2021)
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021)
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)
- Effective Cross-utterance Language Modeling For Conversational Speech Recognition (2021)
- Context-aware RNNLM Rescoring For Conversational Speech Recognition (2020)
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023)
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024)
- Contextualized End-to-end Automatic Speech Recognition With Intermediate Biasing Loss (2024)