Data Augmentation For Spoken Language Understanding Via Pretrained Language Models
2020 · Baolin Peng, Chenguang Zhu, Michael Zeng, et al.
Abstract
The training of spoken language understanding (SLU) models often faces the problem of data scarcity. In this paper, we put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances. Furthermore, we investigate and propose solutions to two previously overlooked semi-supervised learning scenarios of data scarcity in SLU: i) Rich-in-Ontology: ontology information with numerous valid dialogue acts is given; ii) Rich-in-Utterance: a large number of unlabelled utterances are available. Empirical results show that our method can produce synthetic training data that boosts the performance of language understanding models in various scenarios.
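The abstract does not spell out the generation pipeline. The following is therefore only a minimal sketch of how the Rich-in-Ontology scenario could be wired up, under our own assumptions. Valid dialogue acts are sampled from the ontology and linearized into a text prompt, then passed to a generator. Each generated utterance is kept only if every slot value appears in it verbatim, and when it does, its BIO slot tags come for free. The prompt format, the verbatim-match filter, and the names `linearize`, `sample_acts`, `bio_tags`, `augment` and `toy_generate` are all hypothetical and are not the authors' code. `toy_generate` returns canned strings where the paper would use a fine-tuned pretrained language model.

```python
import random
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical helpers for illustration only; not the authors' implementation.
# Assumed representation: a dialogue act is an intent plus slot-value pairs.
DialogueAct = Tuple[str, Dict[str, str]]
# Any callable mapping (prompt, k) to up to k candidate utterances. In practice
# this would wrap a fine-tuned pretrained LM; here it is left abstract.
Generator = Callable[[str, int], List[str]]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def linearize(act: DialogueAct) -> str:
    """Flatten a dialogue act into a text prompt (the format is an assumption)."""
    intent, slots = act
    body = " ; ".join(f"{slot} = {value}" for slot, value in slots.items())
    return f"{intent} ( {body} )"


def sample_acts(ontology: Dict[str, List[str]], intents: List[str],
                n: int, max_slots: int = 2, seed: int = 0) -> List[DialogueAct]:
    """Draw n random dialogue acts whose slots and values all come from the ontology."""
    rng = random.Random(seed)
    acts = []
    for _ in range(n):
        k = rng.randint(1, min(max_slots, len(ontology)))
        slot_names = rng.sample(sorted(ontology), k)
        slots = {s: rng.choice(ontology[s]) for s in slot_names}
        acts.append((rng.choice(intents), slots))
    return acts


def bio_tags(tokens: List[str], slots: Dict[str, str]) -> Optional[List[str]]:
    """BIO-tag the first unclaimed occurrence of each slot value; return None if
    any value is missing, which doubles as a faithfulness filter on generations."""
    tags = ["O"] * len(tokens)
    for slot, value in slots.items():
        vtoks = tokenize(value)
        width = len(vtoks)
        for i in range(len(tokens) - width + 1):
            span = range(i, i + width)
            if tokens[i:i + width] == vtoks and all(tags[j] == "O" for j in span):
                for j in span:
                    tags[j] = ("B-" if j == i else "I-") + slot
                break
        else:  # no break: the value never appeared, so reject the utterance
            return None
    return tags


def augment(acts: List[DialogueAct], generate: Generator,
            per_act: int = 3) -> List[Tuple[List[str], List[str], str]]:
    """Generate utterances per act, keep only those realizing every slot value,
    and return (tokens, BIO tags, intent) training triples."""
    data = []
    for act in acts:
        intent, slots = act
        for utterance in generate(linearize(act), per_act):
            tokens = tokenize(utterance)
            tags = bio_tags(tokens, slots)
            if tags is not None:
                data.append((tokens, tags, intent))
    return data


def toy_generate(prompt: str, k: int) -> List[str]:
    """Hypothetical stand-in for a fine-tuned LM: canned outputs, ignores prompt."""
    return [
        "I want thai food in the north.",
        "Any thai restaurant, please.",  # omits 'north', so it gets filtered out
        "Somewhere in the north serving thai?",
    ][:k]


act = ("inform", {"food": "thai", "area": "north"})
print(linearize(act))  # inform ( food = thai ; area = north )
for tokens, tags, intent in augment([act], toy_generate):
    print(intent, list(zip(tokens, tags)))
```

Here the second canned utterance is discarded because it never mentions `north`. This is one simple way to trade variability for label accuracy. A real system might instead score or rerank candidates. The Rich-in-Utterance scenario, in which unlabelled utterances must first be labelled, is not covered by this sketch.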
Related papers
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018)
- Learning From Multiple Noisy Augmented Data Sets For Better Cross-lingual Spoken Language Understanding (2021)
- Data Augmentation With Atomic Templates For Spoken Language Understanding (2019)
- Using Speech Synthesis To Train End-to-end Spoken Language Understanding Models (2019)
- A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding (2022)
- Large-scale Transfer Learning For Low-resource Spoken Language Understanding (2020)
- Towards Reducing The Need For Speech Training Data To Build Spoken Language Understanding Systems (2022)
- Bridging The Gap Between Clean Data Training And Real-world Inference For Spoken Language Understanding (2021)