Towards End-to-end Spoken Language Understanding
2018 · Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, et al.
Abstract
Spoken language understanding systems are traditionally designed as a pipeline of several components. First, the audio signal is processed by an automatic speech recognizer to produce a transcription or n-best hypotheses. Given the recognition results, a natural language understanding system classifies the text into structured data such as domain, intent, and slots for downstream consumers, such as dialog systems and hands-free applications. These components are usually developed and optimized independently. In this paper, we present our study of an end-to-end learning system for spoken language understanding. With this unified approach, we can infer the semantic meaning directly from audio features without an intermediate text representation. This study showed that the trained model can achieve reasonably good results and demonstrated that the model can capture semantic attention directly from the audio features.
Related papers
- From Audio To Semantics: Approaches To End-to-end Spoken Language Understanding (2018)
- Recent Advances In End-to-end Spoken Language Understanding (2019)
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021)
- Pretrained Semantic Speech Embeddings For End-to-end Spoken Language Understanding Via Cross-modal Teacher-student Learning (2020)
- Exploring Transfer Learning For End-to-end Spoken Language Understanding (2020)
- End-to-end Spoken Language Understanding For Generalized Voice Assistants (2021)
- Modality Confidence Aware Training For Robust End-to-end Spoken Language Understanding (2023)
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019)