End-to-end Spoken Language Understanding Using Transformer Networks And Self-supervised Pre-trained Features
2020 · Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, et al.
Abstract
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits for spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular, transformer-based End-to-End (E2E) SLU architecture that supports self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values. These experiments investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all the experiments, but also that when these features are used in combination with multi-task training …
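The abstract names two ingredients, an SLU objective over intent and entity labels and multi-task training, without saying how the losses are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an utterance-level intent cross-entropy plus a mean per-token entity-label cross-entropy (the per-token tagging formulation is itself an assumption), interpolated with a hypothetical auxiliary-task loss through a hypothetical weight `aux_weight`. All function names (`cross_entropy`, `slu_loss`, `multitask_loss`) are illustrative.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits), via log-sum-exp."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def slu_loss(intent_logits: Sequence[float], intent_target: int,
             entity_logits: Sequence[Sequence[float]],
             entity_targets: Sequence[int]) -> float:
    """Assumed SLU loss: intent loss plus mean per-token entity-label loss."""
    if len(entity_logits) != len(entity_targets):
        raise ValueError("one entity target is needed per token")
    intent = cross_entropy(intent_logits, intent_target)
    if not entity_targets:
        return intent  # no tokens: only the intent term contributes
    entity = sum(cross_entropy(l, t) for l, t in zip(entity_logits, entity_targets))
    return intent + entity / len(entity_targets)


def multitask_loss(slu: float, aux: float, aux_weight: float = 0.3) -> float:
    """Hypothetical interpolation: (1 - aux_weight) * SLU + aux_weight * auxiliary."""
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must lie in [0, 1]")
    return (1.0 - aux_weight) * slu + aux_weight * aux


if __name__ == "__main__":
    # Toy numbers: 3 intent classes and a 2-token utterance with 2 entity labels.
    s = slu_loss([2.0, 0.5, -1.0], 0, [[1.0, 0.0], [0.0, 1.0]], [0, 1])
    total = multitask_loss(s, aux=1.2, aux_weight=0.3)
    print(f"SLU loss {s:.4f}, multi-task loss {total:.4f}")
```

A fixed interpolation weight is the simplest way to trade the main SLU task off against an auxiliary one; the auxiliary loss is passed in as a plain number so the sketch stays agnostic about which auxiliary task is used.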
Related papers
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021)
- Improving Transducer-based Spoken Language Understanding With Self-conditioned CTC And Knowledge Transfer (2025)
- End-to-end Spoken Language Understanding For Generalized Voice Assistants (2021)
- Recent Advances In End-to-end Spoken Language Understanding (2019)
- Leveraging Multilingual Self-supervised Pretrained Models For Sequence-to-sequence End-to-end Spoken Language Understanding (2023)
- End-to-end Spoken Language Understanding: Performance Analyses Of A Voice Command Task In A Low Resource Setting (2022)
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019)
- Improving End-to-end Models For Set Prediction In Spoken Language Understanding (2022)