Sentence Embedder Guided Utterance Encoder (SEGUE) For Spoken Language Understanding
2023 · Yi Xuan Tan, Navonil Majumder, Soujanya Poria
Abstract
The pre-trained speech encoder wav2vec 2.0 performs very well on various spoken language understanding (SLU) tasks. However, on many tasks, it trails behind text encoders given textual input. To improve the understanding capability of SLU encoders, various studies have used knowledge distillation to transfer knowledge from natural language understanding (NLU) encoders. We use a very simple method: as a pre-training step, we distill a textual sentence embedder directly into wav2vec 2.0 using paired audio-text datasets. We observe that this method can indeed improve SLU task performance in fine-tuned settings, as well as in full-data and few-shot transfer on a frozen encoder. However, the model performs worse on certain tasks, highlighting both the strengths and the weaknesses of our approach.
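The distillation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how frame-level speech features are pooled or which loss is used, so the mean pooling and mean squared error below are assumptions chosen for illustration, and the names `mean_pool` and `distillation_loss` are hypothetical. The toy vectors stand in for wav2vec 2.0 frame outputs (the student) and a sentence embedding of the paired transcript (the teacher).

```python
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average frame-level speech features into one utterance-level vector.

    Assumption: mean pooling; the abstract does not specify the pooling method.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def distillation_loss(student_frames: List[Vector], teacher_embedding: Vector) -> float:
    """Mean squared error between the pooled student output and the teacher embedding.

    Assumption: MSE; the abstract does not specify the distillation objective.
    """
    pooled = mean_pool(student_frames)
    if len(pooled) != len(teacher_embedding):
        raise ValueError("student and teacher dimensions must match")
    return sum((s - t) ** 2 for s, t in zip(pooled, teacher_embedding)) / len(pooled)


# Toy values standing in for speech-encoder frames and a text sentence embedding.
frames = [[1.0, 2.0], [3.0, 4.0]]
teacher = [2.0, 2.0]
loss = distillation_loss(frames, teacher)
print(loss)
```

In an actual training setup, the teacher would typically be kept frozen and only the speech encoder updated to reduce this loss over paired audio-text examples; that detail is likewise an assumption here, not something the abstract confirms.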
Related papers
- Two-stage Textual Knowledge Distillation For End-to-end Spoken Language Understanding (2020) 9.41
- Bootstrapping Meaning Through Listening: Unsupervised Learning Of Spoken Sentence Embeddings (2022) 2.26
- Learning Word Embeddings From Speech (2017) 0.00
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021) 9.41
- Speech2vec: A Sequence-to-sequence Framework For Learning Word Embeddings From Speech (2018) 14.15
- Multi-task RNN-T With Semantic Decoder For Streamable Spoken Language Understanding (2022) 4.52
- Diffv2s: Diffusion-based Video-to-speech Synthesis With Vision-guided Speaker Embedding (2023) 8.82
- Segmental Audio Word2vec: Representing Utterances As Sequences Of Vectors With Applications In Spoken Term Detection (2018) 11.08