Two-stage Textual Knowledge Distillation For End-to-end Spoken Language Understanding
2020 · Seongbin Kim, Gyuwan Kim, Seongjin Shin, et al.
Abstract
End-to-end approaches open a new way for more accurate and efficient spoken language understanding (SLU) systems by alleviating the drawbacks of traditional pipeline systems. Previous works exploit textual information for an SLU model via pre-training with automatic speech recognition or fine-tuning with knowledge distillation. To utilize textual information more effectively, this work proposes a two-stage textual knowledge distillation method that matches utterance-level representations and predicted logits of two modalities during pre-training and fine-tuning, sequentially. We use vq-wav2vec BERT as a speech encoder because it captures general and rich features. Furthermore, we improve the performance, especially in a low-resource scenario, with data augmentation methods by randomly masking spans of discrete audio tokens and contextualized hidden representations. Consequently, we push the state-of-the-art on the Fluent Speech Commands, achieving 99.7% test accuracy in the full dataset ...
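The two distillation stages and the span-masking augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss functions, so mean squared error for the utterance-level representation match and KL divergence for the logit match are assumptions, and every function name and parameter (`representation_loss`, `logit_distillation_loss`, `mask_spans`, `span_len`, `num_spans`, `temperature`) is hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def representation_loss(speech_vec, text_vec):
    """Stage 1 (assumed form): mean squared error between the
    utterance-level vectors of the speech and text models."""
    return sum((s - t) ** 2 for s, t in zip(speech_vec, text_vec)) / len(speech_vec)


def logit_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Stage 2 (assumed form): KL(teacher || student) between the
    softened class distributions of the text teacher and speech student."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mask_spans(tokens, mask_token, span_len=3, num_spans=2, rng=None):
    """Augmentation (assumed form): replace randomly placed contiguous
    spans of discrete audio tokens with a mask token. Spans may overlap."""
    rng = rng or random.Random()
    out = list(tokens)
    if len(out) < span_len:
        return out
    for _ in range(num_spans):
        start = rng.randrange(0, len(out) - span_len + 1)
        out[start:start + span_len] = [mask_token] * span_len
    return out
```

In this sketch, `representation_loss` would be applied during pre-training and `logit_distillation_loss` during fine-tuning, mirroring the sequential order the abstract describes; `mask_spans` would be applied to the discrete token sequence before it reaches the speech encoder.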