Spoken Language Intent Detection Using Confusion2vec
2019 · Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou
Abstract
Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in text transcriptions in real-life scenarios makes the task more challenging. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors involving acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for these errors. We demonstrate through experiments on the ATIS benchmark dataset the robustness of the proposed model to
Related papers
- Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations With Subwords (2021)
- Towards ASR Robust Spoken Language Understanding Through In-context Learning With Word Confusion Networks (2024)
- Learning ASR-robust Contextualized Embeddings For Spoken Language Understanding (2019)
- Modality Confidence Aware Training For Robust End-to-end Spoken Language Understanding (2023)
- ASR Error Management For Improving Spoken Language Understanding (2017)
- Joint Online Spoken Language Understanding And Language Modeling With Recurrent Neural Networks (2016)
- VAIS ASR: Building A Conversational Speech Recognition System Using Language Model Combination (2019)
- Building Robust Spoken Language Understanding By Cross Attention Between Phoneme Sequence And ASR Hypothesis (2022)