Advancing Multiple Instance Learning With Attention Modeling For Categorical Speech Emotion Recognition
2020 · Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, et al.
Abstract
Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., determining the discrete emotion label of the input utterance as a whole. One of the main challenges in practice is that most existing emotion corpora do not provide ground-truth labels for each segment; instead, labels are available only for whole utterances. To extract segment-level emotional information from such weakly labeled emotion corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Moreover, in a sufficiently long utterance, not all segments contain relevant emotional information. To address this, three attention-based neural network models are applied to the learned segment embeddings to attend to the most salient parts of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show results that are better than or highly competitive with other state-of-the-art approaches.
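The idea of attending to the most salient segments can be illustrated with a minimal sketch of attention pooling over segment embeddings. This is a generic dot-product attention, not any of the three models from the paper; the function names, the `query` vector, and the toy embedding values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(segment_embeddings, query):
    """Pool segment embeddings into one utterance-level vector.

    Each segment is scored by its dot product with a hypothetical
    query vector; softmax turns the scores into weights that sum to 1.
    """
    scores = [sum(e * q for e, q in zip(emb, query))
              for emb in segment_embeddings]
    weights = softmax(scores)
    dim = len(segment_embeddings[0])
    pooled = [sum(w * emb[d] for w, emb in zip(weights, segment_embeddings))
              for d in range(dim)]
    return pooled, weights

# Toy utterance: three segments with 2-dimensional embeddings (made-up values).
segments = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]  # hypothetical; in practice this would be learned
pooled, weights = attention_pool(segments, query)
```

In this toy case the second segment has the highest score, so it receives the largest weight and dominates the pooled utterance vector. A classifier trained on such pooled vectors needs only the utterance-level label, which is what makes the setup weakly supervised.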
Related papers
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019)
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019)
- Speech Emotion Recognition Using Multi-hop Attention Mechanism (2019)
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021)
- Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-attention Cues In Multitask Learning (2024)
- Conversational Emotion Analysis Via Attention Mechanisms (2019)
- Effect Of Attention And Self-supervised Speech Embeddings On Non-semantic Speech Tasks (2023)