Multimodal Audio-textual Architecture For Robust Spoken Language Understanding
2023 · Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing
Abstract
Recent voice assistants are usually based on the cascade spoken language understanding (SLU) solution, which consists of an automatic speech recognition (ASR) engine and a natural language understanding (NLU) system. Because such an approach relies on the ASR output, it often suffers from so-called ASR error propagation. In this work, we investigate the impact of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLMs), such as BERT and RoBERTa. Moreover, a multimodal language understanding (MLU) module is proposed to mitigate the SLU performance degradation caused by errors in the ASR transcript. The MLU benefits from self-supervised features learned from both audio and text modalities, specifically Wav2Vec for speech and BERT/RoBERTa for language. Our MLU combines an encoder network to embed the audio signal and a text encoder to process text transcripts, followed by a late fusion layer to fuse audio and text logits. We found that the pro
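The late fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the audio and text logits are combined, so the weighted sum below, the mixing weight `alpha`, the function names, and the example logits are all assumptions for illustration, not the authors' implementation. In practice the logits would come from a Wav2Vec-based audio classifier and a BERT/RoBERTa-based text classifier; here they are hard-coded.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(audio_logits, text_logits, alpha=0.5):
    """Fuse per-class logits from the two branches.

    Assumption: a convex combination with weight `alpha` on the audio
    branch. The paper's actual fusion layer may differ (e.g. be learned).
    """
    if len(audio_logits) != len(text_logits):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * t
            for a, t in zip(audio_logits, text_logits)]


def predict(audio_logits, text_logits, alpha=0.5):
    """Return (predicted class index, class probabilities) after fusion."""
    probs = softmax(late_fusion(audio_logits, text_logits, alpha))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical logits for three intent classes. The text branch, fed a
# noisy ASR transcript, prefers class 1; the audio branch prefers class 0.
audio_logits = [3.0, 0.5, -1.0]
text_logits = [0.5, 1.5, -1.0]

fused = late_fusion(audio_logits, text_logits, alpha=0.5)
intent, probs = predict(audio_logits, text_logits, alpha=0.5)
print(fused, intent)
```

With these made-up numbers the text branch alone would pick class 1, while the fused logits favor class 0, which is the kind of correction the audio branch is meant to provide when the transcript contains ASR errors.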
Related papers
- Modality Confidence Aware Training For Robust End-to-end Spoken Language Understanding (2023)
- ML-LMCL: Mutual Learning And Large-margin Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2023)
- Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence And ASR Hypothesis (2022)
- Towards ASR Robust Spoken Language Understanding Through In-context Learning With Word Confusion Networks (2024)
- Effectiveness Of Text, Acoustic, And Lattice-based Representations In Spoken Language Understanding Tasks (2022)
- Speech To Semantics: Improve ASR And NLU Jointly Via All-neural Interfaces (2020)
- Unislu: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025)
- Tie Your Embeddings Down: Cross-modal Latent Spaces For End-to-end Spoken Language Understanding (2020)