Learning ASR-robust Contextualized Embeddings For Spoken Language Understanding
2019 · Chao-Wei Huang, Yun-Nung Chen
Abstract
Employing pre-trained language models (LMs) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by an automatic speech recognizer (ASR) is a concern. Therefore, this paper focuses on making contextualized representations more ASR-robust. We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. Specifically, we fine-tune LMs to produce similar representations for acoustically confusable words that are obtained from word confusion networks (WCNs) produced by ASR. Experiments on the benchmark ATIS dataset show that the proposed method significantly improves the performance of spoken language understanding on ASR transcripts. Our source code is available at https://github.com/MiuLab/SpokenVec
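To make the idea of pulling together representations of acoustically confusable words more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The WCN format (a list of bins of `(word, posterior)` pairs), the `<eps>` null token, the `min_posterior` threshold, the choice of cosine distance, and the function names `confusable_pairs` and `confusion_loss` are all assumptions made for illustration. A static lookup table also stands in for the contextualized vectors an LM would produce.

```python
import math
from itertools import combinations


def confusable_pairs(wcn, min_posterior=0.1):
    """Collect word pairs that compete within the same WCN bin.

    `wcn` is a hypothetical format: a list of bins, each bin a list of
    (word, posterior) tuples. "<eps>" marks an assumed null arc.
    """
    pairs = []
    for bin_ in wcn:
        words = [w for w, p in bin_ if p >= min_posterior and w != "<eps>"]
        pairs.extend(combinations(words, 2))
    return pairs


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def confusion_loss(pairs, embed):
    """Mean cosine distance between the vectors of confusable words.

    Minimizing this term would push confusable words toward similar
    representations; here `embed` is a toy word -> vector dict (assumption).
    """
    if not pairs:
        return 0.0
    total = sum(cosine_distance(embed[a], embed[b]) for a, b in pairs)
    return total / len(pairs)


# Toy WCN: only the middle bin holds two sufficiently likely competitors.
wcn = [
    [("flights", 0.9), ("<eps>", 0.1)],
    [("to", 0.6), ("two", 0.4)],
    [("boston", 0.95), ("austin", 0.05)],
]
toy_embeddings = {"to": [1.0, 0.0], "two": [0.0, 1.0]}

pairs = confusable_pairs(wcn)
loss = confusion_loss(pairs, toy_embeddings)
```

In an actual fine-tuning setup, a term like this would presumably be computed on the LM's contextualized outputs and combined with the language-modeling objective so that gradients update the LM. The sketch only shows the pair extraction and the distance computation.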
Related papers
- Towards ASR Robust Spoken Language Understanding Through In-context Learning With Word Confusion Networks (2024)
- Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2022)
- ML-LMCL: Mutual Learning And Large-margin Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2023)
- Leveraging Acoustic Contextual Representation By Audio-textual Cross-modal Learning For Conversational ASR (2022)
- Attention-based Contextual Language Model Adaptation For Speech Recognition (2021)
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021)
- Contextualized Spoken Word Representations From Convolutional Autoencoders (2020)