A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding
2022 · Yifan Peng, Siddhant Arora, Yosuke Higuchi, et al.
Abstract
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of unpaired data to extract strong speech and text representations. We also explore using supervised models pre-trained on larger external automatic speech recognition (ASR) or SLU corpora. We conduct extensive experiments on the SLU Evaluation (SLUE) benchmark and observe self-supervised pre-trained models to be more powerful, with pre-trained LM and speech models being most beneficial for the Sentiment Analysis and Named Entity Recognition tasks, respectively.
Related papers
- Integrating Pretrained ASR And LM To Perform Sequence Generation For Spoken Language Understanding (2023)
- Data Augmentation For Spoken Language Understanding Via Pretrained Language Models (2020)
- Style Attuned Pre-training And Parameter Efficient Fine-tuning For Spoken Language Understanding (2020)
- SLUE Phase-2: A Benchmark Suite Of Diverse Spoken Language Understanding Tasks (2022)
- UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025)
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021)
- Large-scale Transfer Learning For Low-resource Spoken Language Understanding (2020)
- Understanding Semantics From Speech Through Pre-training (2019)