Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
2025 · Jiliang Hu, Zuchao Li, Mengjia Shen, et al.
Abstract
Spoken language understanding (SLU) is a structure prediction task in the speech domain. Recently, many works that treat SLU as a sequence-to-sequence task have achieved great success. However, this approach is not well suited to performing speech recognition and understanding simultaneously. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), a span-based end-to-end SLU model that can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on named entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction but also achieves state-of-the-art performance on both datasets.
Related papers
- UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025) · 0.00
- Integrating Pretrained ASR And LM To Perform Sequence Generation For Spoken Language Understanding (2023) · 5.24
- Joint Online Spoken Language Understanding And Language Modeling With Recurrent Neural Networks (2016) · 13.28
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019) · 8.60
- On Joint Training With Interfaces For Spoken Language Understanding (2021) · 7.16
- Speech To Semantics: Improve ASR And NLU Jointly Via All-neural Interfaces (2020) · 9.03
- Pre-training For Spoken Language Understanding With Joint Textual And Phonetic Representation Learning (2021) · 2.26
- Recent Advances In End-to-end Spoken Language Understanding (2019) · 8.09