Exploring Transfer Learning For End-to-end Spoken Language Understanding
2020 · Subendhu Rongali, Beiye Liu, Liwei Cai, et al.
Abstract
Voice assistants such as Alexa, Siri, and Google Assistant typically use a two-stage Spoken Language Understanding (SLU) pipeline: first, an Automatic Speech Recognition (ASR) component processes customer speech and generates text transcriptions; then a Natural Language Understanding (NLU) component maps the transcriptions to an actionable hypothesis. An end-to-end (E2E) system that goes directly from speech to a hypothesis is a more attractive option. Such systems have been shown to be smaller, faster, and better optimized. However, they require massive amounts of end-to-end training data and, in addition, do not take advantage of the ASR and NLU training data that is already available. In this work, we propose an E2E system designed to train jointly on multiple speech-to-text tasks, such as ASR (speech to transcription) and SLU (speech to hypothesis), and text-to-text tasks, such as NLU (text to hypothesis). We call this the Audio-Text All-Task (AT-AT) Model and we show that it beats the performan
Related papers
- End-to-end Spoken Language Understanding For Generalized Voice Assistants (2021)
- Towards End-to-end Spoken Language Understanding (2018)
- End-to-end Spoken Language Understanding: Performance Analyses Of A Voice Command Task In A Low Resource Setting (2022)
- Large-scale Transfer Learning For Low-resource Spoken Language Understanding (2020)
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021)
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021)
- Using Speech Synthesis To Train End-to-end Spoken Language Understanding Models (2019)
- End-to-end Spoken Language Understanding Using Transformer Networks And Self-supervised Pre-trained Features (2020)