Attention-based Contextual Language Model Adaptation For Speech Recognition
2021 · Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, et al.
Abstract
Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance-level contextual information. For some domains, such as voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative compared to a state-of-the-art model for contextual LM.
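The "relative" perplexity figures quoted in the abstract can be read as a percentage change against the baseline LM. The following is a minimal sketch of that arithmetic, assuming the standard definition of perplexity as the exponential of the negative mean token log-probability. The function names and the numeric values are hypothetical and chosen for illustration; they are not results or code from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction in percent (positive means improvement)."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Illustrative values only (assumption): a baseline LM at perplexity 100.0
# and a contextual LM at perplexity 93.0.
baseline_ppl = 100.0
contextual_ppl = 93.0
print(f"{relative_reduction(baseline_ppl, contextual_ppl):.1f}% relative")
```

With these made-up inputs the script reports a 7.0% relative reduction, which shows how a figure of that form is derived from two perplexity values rather than from an absolute difference.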
Related papers
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)
- Enhancing Large Language Model-based Speech Recognition By Contextualization For Rare And Ambiguous Words (2024)
- Contextual Biasing Of Language Models For Speech Recognition In Goal-oriented Conversational Agents (2021)
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021)
- Internal Language Model Estimation Through Explicit Context Vector Learning For Attention-based Encoder-decoder ASR (2022)
- Effective Text Adaptation For LLM-based ASR Through Soft Prompt Fine-tuning (2024)
- Advancing Connectionist Temporal Classification With Attention Modeling (2018)
- A Multimodal Approach To Device-directed Speech Detection With Large Language Models (2024)