Understanding Semantics From Speech Through Pre-training
2019 · Pengwei Wang, Liangchen Wei, Yong Cao, et al.
Abstract
End-to-end Spoken Language Understanding (SLU) is proposed to infer semantic meaning directly from audio features, without an intermediate text representation. Although the acoustic model component of an end-to-end SLU system can be pre-trained with Automatic Speech Recognition (ASR) targets, the SLU component can only learn semantic features from limited task-specific training data. In this paper, we propose for the first time large-scale unsupervised pre-training for the SLU component of an end-to-end SLU system, so that the SLU component may preserve semantic features from massive unlabeled audio data. Because the output of the acoustic model component, i.e. phoneme posterior sequences, has characteristics quite different from those of text sequences, we propose a novel pre-training model called BERT-PLM, which stands for Bidirectional Encoder Representations from Transformers through Permutation Language Modeling. BERT-PLM trains the SLU component on unlabeled data through a regression objec
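The abstract breaks off while describing the training signal, so the following is only a rough sketch of what permutation language modeling over phoneme posterior sequences could look like, not the authors' implementation. It assumes an XLNet-style visibility mask built from a sampled factorization order and a mean-squared-error regression loss; both are assumptions, as are all names (`permutation_mask`, `masked_mean_predictor`, `regression_loss`). The "encoder" is a hypothetical stand-in that averages the visible frames, used only so the example runs without a deep-learning library.

```python
import numpy as np

def permutation_mask(order):
    """mask[i, j] is True when position i may see position j, i.e. when j
    comes strictly earlier than i in the sampled factorization order."""
    n = len(order)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)  # rank[pos] = index of pos within the order
    return rank[None, :] < rank[:, None]

def masked_mean_predictor(posteriors, mask):
    """Hypothetical stand-in for the SLU encoder: predicts each frame as the
    mean of the frames it may see, or a uniform vector if it sees none."""
    T, P = posteriors.shape
    preds = np.full((T, P), 1.0 / P)
    for i in range(T):
        visible = posteriors[mask[i]]
        if len(visible) > 0:
            preds[i] = visible.mean(axis=0)
    return preds

def regression_loss(preds, targets, positions):
    """Mean squared error over the selected positions (assumed loss)."""
    diff = preds[positions] - targets[positions]
    return float(np.mean(diff ** 2))

# Toy phoneme posterior sequence: T frames, P phoneme classes, rows sum to 1.
rng = np.random.default_rng(0)
T, P = 6, 4
logits = rng.normal(size=(T, P))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

order = rng.permutation(T)           # sampled factorization order
mask = permutation_mask(order)
preds = masked_mean_predictor(posteriors, mask)
tail = order[T // 2:]                # regress only the later half of the order
loss = regression_loss(preds, posteriors, tail)
print(f"regression loss on {len(tail)} positions: {loss:.4f}")
```

Unlike text, where the target is a discrete token and the loss is a cross-entropy over a vocabulary, each target here is a continuous posterior vector, which is why a regression-style loss is the natural choice in this sketch.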
Related papers
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021) 9.41
- SPLAT: Speech-language Joint Pre-training For Spoken Language Understanding (2020) 10.35
- Pre-training For Spoken Language Understanding With Joint Textual And Phonetic Representation Learning (2021) 2.26
- Style Attuned Pre-training And Parameter Efficient Fine-tuning For Spoken Language Understanding (2020) 6.77
- ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding (2020) 9.59
- Recent Advances In End-to-end Spoken Language Understanding (2019) 8.09
- A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding (2022) 8.09
- Using Speech Synthesis To Train End-to-end Spoken Language Understanding Models (2019) 9.23