Learning Domain Specific Language Models For Automatic Speech Recognition Through Machine Translation
2021 · Saurav Jha
Abstract
Automatic Speech Recognition (ASR) systems have been gaining popularity in recent years owing to their widespread use in smartphones and smart speakers. Building ASR systems for task-specific scenarios depends on the availability of utterances that match both the style of the task and the language in question. In our work, we target a scenario in which task-specific text data is available in a language that differs from the target language for which an ASR Language Model (LM) is needed. We use Neural Machine Translation (NMT) as an intermediate step to first obtain translations of the task-specific text data. We then train LMs on the 1-best and N-best translations and study ways to improve on such a baseline LM. We develop a procedure to derive word confusion networks from NMT beam search graphs and evaluate LMs trained on these confusion networks. With experiments on the WMT20 chat translation task dataset, we demonstrate that NMT confusion networks can help to reduce t
Related papers
- Language Model Bootstrapping Using Neural Machine Translation For Conversational Speech Recognition (2019) 5.24
- Low-latency Neural Speech Translation (2018) 9.03
- Corpus Synthesis For Zero-shot ASR Domain Adaptation Using Large Language Models (2023) 5.84
- Zero-resource Speech Translation And Recognition With LLMs (2024) 3.58
- Prompting Large Language Models For Zero-shot Domain Adaptation In Speech Recognition (2023) 0.00
- Assessing The Tolerance Of Neural Machine Translation Systems Against Speech Recognition Errors (2019) 2.26
- Transfer Learning Of Language-independent End-to-end ASR With Language Model Fusion (2018) 0.00
- Memory Augmented Lookup Dictionary Based Language Modeling For Automatic Speech Recognition (2022) 0.00