Attention-based Region Of Interest (ROI) Detection For Speech Emotion Recognition
2022 · Jay Desai, Houwei Cao, Ravi Shah
Abstract
Automatic emotion recognition for real-life applications is a challenging task. Human emotion expressions are subtle, and can be conveyed by a combination of several emotions. In most existing emotion recognition studies, each audio utterance/video clip is labelled/classified in its entirety. However, utterance/clip-level labelling and classification can be too coarse to capture the subtle intra-utterance/clip temporal dynamics. For example, an utterance/video clip usually contains only a few emotion-salient regions and many emotionless regions. In this study, we propose to use an attention mechanism in deep recurrent neural networks to detect the Regions-of-Interest (ROI) that are more emotionally salient in human emotional speech/video, and further estimate the temporal emotion dynamics by aggregating those emotionally salient regions-of-interest. We compare the ROI from audio and video and analyse them. We compare the performance of the proposed attention networks with the state-of-the-art LS
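The idea of scoring frames with attention and aggregating the salient ones can be illustrated with a minimal sketch. This is not the authors' model: the per-frame vectors below stand in for recurrent-network hidden states, and the dot-product scoring, the `query` vector, and the "above uniform weight" ROI rule are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Score each frame against `query`, then return
    (attention weights, attention-weighted sum of the frames).

    `frames` is assumed to hold one feature vector per time step,
    e.g. the hidden states of a recurrent network (hypothetical here).
    """
    scores = [sum(q * h for q, h in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, pooled


def regions_of_interest(
    weights: Sequence[float],
    threshold: Optional[float] = None,
) -> List[int]:
    """Indices of frames whose attention weight exceeds `threshold`.

    The default threshold (uniform weight 1/T) is an illustrative
    assumption, not a rule taken from the paper.
    """
    if threshold is None:
        threshold = 1.0 / len(weights)
    return [t for t, w in enumerate(weights) if w > threshold]


# Toy utterance: five frames, of which frames 2 and 3 are "salient".
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [1.8, 1.7], [0.0, 0.1]]
query = [1.0, 1.0]  # hypothetical learned attention vector

weights, pooled = attention_pool(frames, query)
roi = regions_of_interest(weights)
print(roi)  # → [2, 3]
```

In a trained network the `query` vector (or a small scoring layer) would be learned jointly with the recurrent layers, and `pooled` would feed the emotion classifier; here both are fixed by hand only to show how the weights single out the high-scoring frames.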
Related papers
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Audio Visual Emotion Recognition With Temporal Alignment And Perception Attention (2016)
- Emotion Recognition From Speech (2019)
- Recursive Joint Attention For Audio-visual Fusion In Regression Based Emotion Recognition (2023)
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021)
- Better Spanish Emotion Recognition In-the-wild: Bringing Attention To Deep Spectrum Voice Analysis (2024)
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022)
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017)