Augmenting Text For Spoken Language Understanding With Large Language Models
2023 · Roshan Sharma, Suyoun Kim, Daniel Lazar, et al.
Abstract
Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for existing application domains represented in training data or extending to new domains requires corresponding triplets of speech-transcript-semantic parse data, which is expensive to obtain. In this paper, we address this challenge by examining methods that can use transcript-semantic parse data (unpaired text) without corresponding speech. First, when unpaired text is drawn from existing textual corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways to generate speech representations for unpaired text. Experiments on the STOP dataset show that unpaired text from existing and new domains improves performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we consider the setting when unpaired text is not available in existing textual corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired text for ex
Related papers
- Paralinguistics-aware Speech-empowered Large Language Models For Natural Conversation (2024)
- SpeechLM: Enhanced Speech Pre-training With Unpaired Textual Data (2022)
- Paralinguistics-enhanced Large Language Modeling Of Spoken Dialogue (2023)
- Data Augmentation For Spoken Language Understanding Via Pretrained Language Models (2020)
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018)
- Closing The Gap Between Text And Speech Understanding In LLMs (2025)
- Towards Reducing The Need For Speech Training Data To Build Spoken Language Understanding Systems (2022)
- Instruction Data Generation And Unsupervised Adaptation For Speech Language Models (2024)