Server-side Rescoring Of Spoken Entity-centric Knowledge Queries For Virtual Assistants
2023 · Youyuan Zhang, Sashank Gondala, Thiago Fraga-Silva, et al.
Abstract
On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition (ASR) require effective knowledge integration to recognize challenging entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information-domain queries using several categories of Language Models (LMs) (N-gram word LMs, sub-word neural LMs). We investigate the combination of on-device and server-side signals and demonstrate significant WER improvements of 23%-35% on various entity-centric query subpopulations by integrating server-side LMs, compared to performing ASR on-device only. We also compare LMs trained on domain data against a GPT-3 variant offered by OpenAI as a baseline. Finally, we show that model fusion of multiple server-side LMs trained from scratch most effectively combines the complementary strengths of each model and integrates knowledge learned from domain-specific data into a VA ASR system.
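As a rough illustration of what server-side rescoring with model fusion can look like, the sketch below re-ranks an n-best list by adding weighted server-side LM log-scores to the on-device score. This is a generic log-linear interpolation, not necessarily the fusion method used in the paper; the names `Hypothesis`, `rescore`, the toy scorers, the example scores, and the weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    on_device_score: float  # assumed: log-domain score from the on-device ASR pass


# A scorer maps a hypothesis string to a log-probability (assumed interface).
LmScorer = Callable[[str], float]


def rescore(
    nbest: List[Hypothesis],
    scorers: Dict[str, LmScorer],
    weights: Dict[str, float],
    on_device_weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by a weighted sum of on-device and server-side LM scores."""

    def fused(h: Hypothesis) -> float:
        total = on_device_weight * h.on_device_score
        for name, scorer in scorers.items():
            total += weights[name] * scorer(h.text)
        return total

    # Higher fused log-score is better.
    return sorted(nbest, key=fused, reverse=True)


# Toy stand-ins for an N-gram word LM and a sub-word neural LM (made-up scores).
ngram_table = {"who is tailor swift": -9.0, "who is taylor swift": -6.0}
neural_table = {"who is tailor swift": -8.0, "who is taylor swift": -5.0}

nbest = [
    Hypothesis("who is tailor swift", -4.0),  # on-device first-pass favorite
    Hypothesis("who is taylor swift", -4.5),
]

ranked = rescore(
    nbest,
    scorers={"ngram": ngram_table.__getitem__, "neural": neural_table.__getitem__},
    weights={"ngram": 0.5, "neural": 0.5},
)
best = ranked[0]
```

In this toy case the on-device pass prefers the misspelled entity, while the fused score prefers "who is taylor swift". In practice the interpolation weights would be tuned on held-out data.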
Related papers
- Predicting Entity Popularity To Improve Spoken Entity Recognition By Virtual Assistants (2020)
- A Multimodal Approach To Device-directed Speech Detection With Large Language Models (2024)
- SELMA: A Speech-enabled Language Model For Virtual Assistant Interactions (2025)
- VAIS ASR: Building A Conversational Speech Recognition System Using Language Model Combination (2019)
- Prompting Large Language Models For Zero-shot Domain Adaptation In Speech Recognition (2023)
- Multi-task Language Modeling For Improving Speech Recognition Of Rare Words (2020)
- Tiny-align: Bridging Automatic Speech Recognition And Large Language Model On The Edge (2024)
- Exploring The Integration Of Large Language Models Into Automatic Speech Recognition Systems: An Empirical Study (2023)