Speech To Semantics: Improve ASR And NLU Jointly Via All-neural Interfaces
2020 · Milind Rao, Anirudh Raju, Pranav Dheram, et al.
Abstract
We consider the problem of spoken language understanding (SLU): extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) and natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification, opening up the opportunity to deploy in hardware-constrained scenarios such as on devices, enabling voice assistants to work offline in a privacy-preserving manner whilst also reducing server costs. We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using the Listen Attend Spell ASR system and then extracts the interpretation using a neural NLU model. Finally, we contrast these methods with a jointly trained end-to-end SLU model, consisting of ASR and NLU subsystems which are c
Related papers
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019) · 8.60
- On Joint Training With Interfaces For Spoken Language Understanding (2021) · 7.16
- Recent Advances In End-to-end Spoken Language Understanding (2019) · 8.09
- Joint Online Spoken Language Understanding And Language Modeling With Recurrent Neural Networks (2016) · 13.28
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021) · 9.41
- Multimodal Audio-textual Architecture For Robust Spoken Language Understanding (2023) · 0.00
- From Audio To Semantics: Approaches To End-to-end Spoken Language Understanding (2018) · 13.23
- Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding (2025) · 0.00