End-to-end Spoken Language Understanding For Generalized Voice Assistants
2021 · Michael Saxon, Samridhi Choudhary, Joseph P. McKenna, et al.
Abstract
End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial voice assistants (VAs). We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels. This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations. This leads to an SLU system that achieves significant improvements over baselines on a complex internal generalized VA dataset with a 43% improvement in accuracy, while still meeting the 99% accuracy benchmark on the popular Fluent Speech Commands dataset. We further evaluate our model on a hard
Related papers
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021)
- End-to-end Spoken Language Understanding: Performance Analyses Of A Voice Command Task In A Low Resource Setting (2022)
- Exploring Transfer Learning For End-to-end Spoken Language Understanding (2020)
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019)
- Modality Confidence Aware Training For Robust End-to-end Spoken Language Understanding (2023)
- Recent Advances In End-to-end Spoken Language Understanding (2019)
- Improving End-to-end Models For Set Prediction In Spoken Language Understanding (2022)
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021)