Using Speech Synthesis To Train End-to-end Spoken Language Understanding Models
2019 · Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, et al.
Abstract
End-to-end models are an attractive new approach to spoken language understanding (SLU) in which the meaning of an utterance is inferred directly from the raw audio without employing the standard pipeline composed of a separately trained speech recognizer and natural language understanding module. The downside of end-to-end SLU is that in-domain speech data must be recorded to train the model. In this paper, we propose a strategy for overcoming this requirement in which speech synthesis is used to generate a large synthetic training dataset from several artificial speakers. Experiments on two open-source SLU datasets confirm the effectiveness of our approach, both as a sole source of training data and as a form of data augmentation.
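The strategy in the abstract can be illustrated with a minimal sketch: every labelled sentence is rendered once per artificial speaker, and the resulting set is used either alone or appended to real recordings. This is not the authors' code. The `placeholder_tts` function, the speaker names, and the intent labels are all hypothetical stand-ins chosen for illustration; a real setup would call an actual speech synthesizer where the placeholder returns noise.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    audio: List[float]  # waveform samples (placeholder values in this sketch)
    speaker: str        # artificial voice that produced the audio
    transcript: str     # text that was synthesized
    intent: str         # label the SLU model is trained to predict


def placeholder_tts(text: str, speaker: str) -> List[float]:
    """Hypothetical stand-in for a TTS engine.

    Returns deterministic noise whose length grows with the text,
    so the rest of the pipeline can be exercised without a synthesizer.
    """
    rng = random.Random(f"{speaker}|{text}")
    return [rng.uniform(-1.0, 1.0) for _ in range(100 * len(text))]


def build_synthetic_dataset(
    labelled_text: Sequence[Tuple[str, str]],
    speakers: Sequence[str],
    tts: Callable[[str, str], List[float]] = placeholder_tts,
) -> List[Example]:
    """Synthesize every (transcript, intent) pair with every speaker."""
    return [
        Example(tts(text, speaker), speaker, text, intent)
        for (text, intent), speaker in itertools.product(labelled_text, speakers)
    ]


def augment(real: Sequence[Example], synthetic: Sequence[Example]) -> List[Example]:
    """Data augmentation: real recordings plus synthetic ones."""
    return list(real) + list(synthetic)


# Assumed toy inputs; the sentences, labels, and voices are illustrative only.
labelled_text = [
    ("turn on the lights", "activate_lights"),
    ("turn up the volume", "increase_volume"),
]
speakers = ["voice_a", "voice_b", "voice_c"]

synthetic = build_synthetic_dataset(labelled_text, speakers)
```

With two sentences and three voices, `synthetic` holds six examples. Passing it directly to a trainer corresponds to the "sole source of training data" setting, and `augment(real, synthetic)` corresponds to the data augmentation setting.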
Related papers
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021) · 9.41
- Large-scale Transfer Learning For Low-resource Spoken Language Understanding (2020) · 2.26
- Data Augmentation For Spoken Language Understanding Via Pretrained Language Models (2020) · 0.00
- Towards Reducing The Need For Speech Training Data To Build Spoken Language Understanding Systems (2022) · 8.35
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018) · 10.61
- End-to-end Model For Named Entity Recognition From Speech Without Paired Training Data (2022) · 6.77
- Improving End-to-end Speech Processing By Efficient Text Data Utilization With Latent Synthesis (2023) · 0.00
- Improving End-to-end Models For Set Prediction In Spoken Language Understanding (2022) · 0.00