Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech
2019 · Zixing Zhang, Bingwen Wu, Bjoern Schuller
Abstract
Despite growing research interest in end-to-end learning systems for speech emotion recognition, conventional systems either suffer from overfitting, due in part to limited training data, or do not explicitly consider how much each automatically learnt representation contributes to a specific task. In this contribution, we propose a novel end-to-end framework enhanced by auxiliary-task learning and an attention mechanism. Specifically, we jointly train an end-to-end network on several different but related emotion prediction tasks, namely arousal, valence, and dominance prediction, to extract shared representations that are more robust than those of traditional systems, with the aim of relieving the overfitting problem. In addition, an attention layer is placed on top of the layers for each task to capture how the different segment parts contribute to that individual task. To evaluate the effectiveness of
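The abstract describes two ideas: a representation shared across the arousal, valence, and dominance tasks, and a per-task attention layer that weights segments. The following is a minimal NumPy sketch of that structure, not the authors' implementation. The single tanh layer standing in for the end-to-end network, the dot-product attention scoring, the squared-error loss, and all names (`init_params`, `forward`, `multitask_loss`) are assumptions made for illustration only.

```python
import numpy as np

TASKS = ("arousal", "valence", "dominance")


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def init_params(feat_dim, hidden_dim, seed=0):
    """Random parameters: one shared projection plus per-task attention/output vectors."""
    rng = np.random.default_rng(seed)
    params = {"shared_W": rng.normal(0.0, 0.1, (feat_dim, hidden_dim))}
    for task in TASKS:
        params[task] = {
            "attn": rng.normal(0.0, 0.1, hidden_dim),  # scores each segment
            "out": rng.normal(0.0, 0.1, hidden_dim),   # maps pooled vector to a scalar
        }
    return params


def forward(params, segments):
    """segments: array of shape (num_segments, feat_dim).

    Returns per-task scalar predictions and per-task attention weights.
    """
    # Shared representation (assumed stand-in for the end-to-end network).
    shared = np.tanh(segments @ params["shared_W"])      # (T, hidden_dim)
    preds, weights = {}, {}
    for task in TASKS:
        alpha = softmax(shared @ params[task]["attn"])   # (T,), sums to 1
        context = alpha @ shared                         # attention-weighted pooling
        preds[task] = float(context @ params[task]["out"])
        weights[task] = alpha
    return preds, weights


def multitask_loss(preds, targets):
    """Sum of per-task squared errors (assumed loss, chosen for simplicity)."""
    return sum((preds[t] - targets[t]) ** 2 for t in TASKS)


params = init_params(feat_dim=8, hidden_dim=4)
segments = np.random.default_rng(1).normal(size=(10, 8))
preds, weights = forward(params, segments)
loss = multitask_loss(preds, {"arousal": 0.2, "valence": -0.1, "dominance": 0.0})
```

Because every task reads from the same `shared` array, a gradient of the summed loss would update `shared_W` from all three tasks at once, which is the mechanism the abstract credits with reducing overfitting. Each task keeps its own `alpha`, so the segments that matter for arousal need not be the ones that matter for valence.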
Related papers
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019) 14.58
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021) 0.00
- Effect Of Attention And Self-supervised Speech Embeddings On Non-semantic Speech Tasks (2023) 4.52
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) 15.25
- Speech Emotion Recognition Using Multi-hop Attention Mechanism (2019) 14.58
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016) 20.43
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022) 10.07
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019) 15.22