UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets
2025 · Zhichao Sheng, Shilin Zhou, Chen Gong, et al.
Abstract
Spoken Language Understanding (SLU) plays a crucial role in speech-centric multimedia applications, enabling machines to comprehend spoken language in scenarios such as meetings, interviews, and customer service interactions. SLU encompasses multiple tasks, including Automatic Speech Recognition (ASR), spoken Named Entity Recognition (NER), and spoken Sentiment Analysis (SA). However, existing methods often rely on separate model architectures for individual tasks such as spoken NER and SA, which increases system complexity, limits cross-task interaction, and fails to fully exploit heterogeneous datasets available across tasks. To address these limitations, we propose UniSLU, a unified framework that jointly models multiple SLU tasks within a single architecture. Specifically, we propose a unified representation for diverse SLU tasks, enabling full utilization of heterogeneous datasets across multiple tasks. Built upon this representation, we propose a unified generative method that jo
Related papers
- SLUE Phase-2: A Benchmark Suite Of Diverse Spoken Language Understanding Tasks (2022) · 10.07
- Integrating Pretrained ASR And LM To Perform Sequence Generation For Spoken Language Understanding (2023) · 5.24
- A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding (2022) · 8.09
- Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding (2025) · 0.00
- Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence And ASR Hypothesis (2022) · 2.26
- Recent Advances In End-to-end Spoken Language Understanding (2019) · 8.09
- On Joint Training With Interfaces For Spoken Language Understanding (2021) · 7.16
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019) · 8.60