Bidirectional Representations For Low Resource Spoken Language Understanding
2022 · Quentin Meeus, Marie-Francine Moens, Hugo van Hamme
Abstract
Most spoken language understanding systems use a pipeline approach composed of an automatic speech recognition interface and a natural language understanding module. This approach forces hard decisions when converting continuous inputs into discrete language symbols. Instead, we propose a representation model that encodes speech into rich bidirectional encodings that can be used for downstream tasks such as intent prediction. The approach uses a masked language modelling objective to learn the representations, and thus benefits from both the left and right contexts. We show that the resulting encodings, before fine-tuning, outperform comparable models on multiple datasets, and that fine-tuning the top layers of the representation model improves the current state of the art on the Fluent Speech Commands dataset, including in a low-data regime where only a limited amount of labelled data is used for training. Furthermore, we propose class attention as a spoken language understanding
Related papers
- Towards End-to-end Spoken Language Understanding (2018)
- Learning Semantic Information From Raw Audio Signal Using Both Contextual And Phonetic Representations (2024)
- Wav-BERT: Cooperative Acoustic And Linguistic Representation Learning For Low-resource Speech Recognition (2021)
- Cross-lingual Spoken Language Understanding With Regularized Representation Alignment (2020)
- WaBERT: A Low-resource End-to-end Model For Spoken Language Understanding And Speech-to-BERT Alignment (2022)
- Sequence-based Multi-lingual Low Resource Speech Recognition (2018)
- From Audio To Semantics: Approaches To End-to-end Spoken Language Understanding (2018)
- Large-scale Transfer Learning For Low-resource Spoken Language Understanding (2020)