Sequential End-to-end Intent And Slot Label Classification And Localization
2021 · Yiran Cao, Nihal Potdar, Anderson R. Avila
Abstract
Human-computer interaction (HCI) is significantly impacted by delayed responses from a spoken dialogue system. Hence, end-to-end (e2e) spoken language understanding (SLU) solutions have recently been proposed to decrease latency. Such approaches allow for the extraction of semantic information directly from the speech signal, thus bypassing the need for a transcript from an automatic speech recognition (ASR) system. In this paper, we propose a compact e2e SLU architecture for streaming scenarios, where chunks of the speech signal are processed continuously to predict intent and slot values. Our model is based on a 3D convolutional neural network (3D-CNN) and a unidirectional long short-term memory (LSTM). We compare the performance of two alignment-free losses: the connectionist temporal classification (CTC) method and its adapted version, namely connectionist temporal localization (CTL). The latter performs not only the classification but also localization of sequential audio events.
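To make the streaming, alignment-free idea in the abstract more concrete, the following is a minimal sketch of greedy CTC decoding applied chunk by chunk. It is not the authors' implementation: the class name `StreamingCTCGreedyDecoder`, the blank index `0`, and the toy label ids are all assumptions chosen for illustration, and the per-frame scores stand in for the outputs that a model such as the 3D-CNN and unidirectional LSTM would produce. The frame index returned with each label is only the position where the greedy path first emits it, which is a rough stand-in and not the CTL localization described in the paper.

```python
from typing import List, Optional, Tuple

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


class StreamingCTCGreedyDecoder:
    """Collapse per-frame argmax labels into a label sequence, one chunk at a time."""

    def __init__(self, blank: int = BLANK) -> None:
        self.blank = blank
        self.prev: Optional[int] = None  # label of the last frame seen, kept across chunks
        self.frame = 0                   # global frame counter across all chunks

    def feed(self, chunk_scores: List[List[float]]) -> List[Tuple[int, int]]:
        """Consume one chunk of per-frame class scores.

        Returns the newly emitted (label, frame_index) pairs for this chunk.
        """
        emitted: List[Tuple[int, int]] = []
        for scores in chunk_scores:
            # Greedy choice: the highest-scoring class for this frame.
            label = max(range(len(scores)), key=scores.__getitem__)
            # CTC collapse rule: skip blanks and skip repeats of the previous frame.
            if label != self.blank and label != self.prev:
                emitted.append((label, self.frame))
            self.prev = label
            self.frame += 1
        return emitted


# Toy usage with hypothetical label ids: 0 = blank, 1 = an intent, 2 = a slot.
decoder = StreamingCTCGreedyDecoder()
first = decoder.feed([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]])
second = decoder.feed([[0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.2, 0.7]])
print(first)   # [(1, 1)]
print(second)  # [(2, 5)]
```

Because the decoder carries `prev` across calls, a label that continues from the end of one chunk into the start of the next is emitted only once, which is the property a streaming setup needs.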
Related papers
- Recent Advances In End-to-end Spoken Language Understanding (2019) 8.09
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019) 8.60
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021) 9.41
- Token-level Sequence Labeling For Spoken Language Understanding Using Compositional End-to-end Models (2022) 0.00
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021) 7.16
- Tie Your Embeddings Down: Cross-modal Latent Spaces For End-to-end Spoken Language Understanding (2020) 9.03
- Joint Learning Of Word And Label Embeddings For Sequence Labelling In Spoken Language Understanding (2019) 3.58
- Intent Recognition And Unsupervised Slot Identification For Low Resourced Spoken Dialog Systems (2021) 2.26