C2C-GenDA: Cluster-to-Cluster Generation For Data Augmentation Of Slot Filling
2020 · Yutai Hou, Sanyuan Chen, Wanxiang Che, et al.
Abstract
Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while preserving their semantics. Unlike previous DA work, which reconstructs utterances one by one independently, C2C-GenDA jointly encodes multiple existing utterances with the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows the model to consider the relations between generated instances and encourages diversity. In addition, encoding multiple existing utterances gives C2C a wider view of existing expressions, which helps reduce generations that duplicate existing data. Experiments on the ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7
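The cluster-to-cluster setup described in the abstract can be illustrated with a small data-preparation sketch. This is not the authors' pipeline: it only shows one plausible way to group utterances that share the same semantics and to split each group into an input cluster and a target cluster for a generation model. The frame labels, the slot placeholders such as `<from_city>`, the function names, and the `n_input` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def build_clusters(utterances):
    """Group utterances by a semantic label (hypothetical 'frame' string)."""
    clusters = defaultdict(list)
    for text, frame in utterances:
        clusters[frame].append(text)
    return dict(clusters)


def make_c2c_pairs(clusters, n_input=2):
    """Split each cluster into (frame, input cluster, target cluster).

    The first n_input utterances are encoded jointly; the rest are held
    out as expressions the model should learn to generate together.
    Clusters with no utterances left for the target side are skipped.
    """
    pairs = []
    for frame, texts in clusters.items():
        if len(texts) <= n_input:
            continue
        pairs.append((frame, texts[:n_input], texts[n_input:]))
    return pairs


# Toy data, invented for this sketch (not taken from ATIS or Snips).
data = [
    ("show flights from <from_city> to <to_city>", "flight|from_city,to_city"),
    ("i need to fly from <from_city> to <to_city>", "flight|from_city,to_city"),
    ("list flights leaving <from_city> for <to_city>", "flight|from_city,to_city"),
    ("what is the fare to <to_city>", "airfare|to_city"),
]

clusters = build_clusters(data)
pairs = make_c2c_pairs(clusters, n_input=2)
for frame, source, target in pairs:
    print(frame, len(source), len(target))
```

In this toy run only the flight cluster yields a training pair, because the airfare cluster has a single utterance and nothing left to hold out as a target.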
Related papers
- Knowledge-aware Audio-grounded Generative Slot Filling For Limited Annotated Data (2023) 0.00
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018) 10.61
- Multi-domain Adversarial Learning For Slot Filling In Spoken Language Understanding (2017) 0.00
- Significance Of Data Augmentation For Improving Cleft Lip And Palate Speech Recognition (2021) 0.00
- Data Augmentation With Atomic Templates For Spoken Language Understanding (2019) 5.24
- Improving Slot Filling By Utilizing Contextual Information (2019) 2.26
- Code-switching Sentence Generation By Generative Adversarial Networks And Its Application To Data Augmentation (2018) 0.00
- Data Augmentation Methods For End-to-end Speech Recognition On Distant-talk Scenarios (2021) 6.34