Improving End-to-end SLU Performance With Prosodic Attention And Distillation
2023 · Shangeth Rajaa
Abstract
Most end-to-end spoken language understanding (SLU) methods depend on pretrained ASR or language model features for intent prediction. However, other essential information in speech, such as prosody, is often ignored. Recent research has shown improved results in classifying dialogue acts by incorporating prosodic information, but the margins of improvement are small because the neural models tend to ignore the prosodic features. In this work, we propose prosody-attention, which uses the prosodic features in a different way: to generate attention maps across the time frames of the utterance. We then propose prosody-distillation, which explicitly learns prosodic information in the acoustic encoder rather than concatenating implicit prosodic features. Both proposed methods improve on the baseline results, and prosody-distillation improves intent classification accuracy by 8% and 2% on the SLURP and STOP datasets, respectively, over the prosody baseline.
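The abstract does not specify the architecture, so the following is only a rough NumPy sketch of the two ideas under stated assumptions. It assumes per-frame prosodic features (for example pitch and energy) are scored by a hypothetical weight vector `w` to produce attention over time frames, and that distillation is a mean-squared error between a linear projection `proj` of the acoustic encoder output and the prosodic features. The function names, shapes, and loss choice are illustrative and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prosody_attention_pool(acoustic, prosody, w):
    """Pool frame-level acoustic features using prosody-derived attention.

    acoustic: (T, D) frame-level encoder features
    prosody:  (T, P) frame-level prosodic features (assumed: pitch, energy, ...)
    w:        (P,)   hypothetical scoring weights (learned in a real model)
    Returns the pooled utterance vector (D,) and attention weights (T,).
    """
    scores = prosody @ w             # one score per time frame
    attn = softmax(scores, axis=0)   # attention map across time frames
    pooled = attn @ acoustic         # attention-weighted sum of frames
    return pooled, attn

def prosody_distillation_loss(acoustic, prosody, proj):
    """Assumed distillation objective: MSE between projected acoustic
    features and the prosodic targets, so the encoder learns prosody."""
    pred = acoustic @ proj           # (T, P) predicted prosodic features
    return float(np.mean((pred - prosody) ** 2))

# Toy data standing in for real encoder outputs and prosodic features.
rng = np.random.default_rng(0)
T, D, P = 50, 16, 3
acoustic = rng.normal(size=(T, D))
prosody = rng.normal(size=(T, P))
w = rng.normal(size=P)
proj = rng.normal(size=(D, P))

pooled, attn = prosody_attention_pool(acoustic, prosody, w)
loss = prosody_distillation_loss(acoustic, prosody, proj)
```

In a trained system the pooled vector would feed an intent classifier, and the distillation term would presumably be added to the classification loss with some weighting; both details are assumptions here.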
Related papers
- Prodeliberation: Parallel Robust Deliberation For End-to-end Spoken Language Understanding (2024)
- Integrating Pretrained ASR And LM To Perform Sequence Generation For Spoken Language Understanding (2023)
- AFD-SLU: Adaptive Feature Distillation For Spoken Language Understanding (2025)
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021)
- Improving End-to-end Models For Set Prediction In Spoken Language Understanding (2022)
- Modality Confidence Aware Training For Robust End-to-end Spoken Language Understanding (2023)
- Dynamic Time-aware Attention To Speaker Roles And Contexts For Spoken Language Understanding (2017)
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019)