Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech
2017 · Michael Neumann, Ngoc Thang Vu
Abstract
Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training such systems. In this work, we conduct extensive experiments using an attentive convolutional neural network with a multi-view learning objective function. We compare system performance across different input signal lengths, different types of acoustic features, and different types of emotional speech (improvised vs. scripted). Our experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database reveal that recognition performance depends strongly on the type of speech data, independent of the choice of input features. Furthermore, we achieved state-of-the-art results on the improvised speech data of IEMOCAP.
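The abstract names two ingredients without spelling them out: attention over the sequence of convolutional features, and a multi-view learning objective. The following is a minimal pure-Python sketch of both ideas, not the authors' exact formulation. It assumes a dot-product attention score with a single learned vector `u`, and a multi-view loss formed as the sum of cross-entropies over two label views (for example a categorical emotion label and a discretised activation or valence label). The function names, the `weight` parameter, and the choice of second view are illustrative assumptions. One useful property of attention pooling is that the pooled vector has the same size however many frames the input has, which is what allows input signals of different lengths to be compared with a single classifier head.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]], u: Sequence[float]) -> List[float]:
    """Collapse T frame vectors (T x D) into one D-dimensional utterance vector.

    Assumed scoring: each frame h_t gets relevance u . h_t; the scores are
    softmax-normalised into weights alpha_t and the output is sum_t alpha_t * h_t.
    """
    scores = [sum(ui * hi for ui, hi in zip(u, h)) for h in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])


def multi_view_loss(cat_logits: Sequence[float], cat_target: int,
                    view2_logits: Sequence[float], view2_target: int,
                    weight: float = 1.0) -> float:
    """Hypothetical multi-view objective: categorical loss plus a weighted
    loss on a second label view (the weighting scheme is an assumption)."""
    return (cross_entropy(softmax(cat_logits), cat_target)
            + weight * cross_entropy(softmax(view2_logits), view2_target))


if __name__ == "__main__":
    # With u = 0 every frame gets equal weight, so pooling reduces to the mean.
    print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

With a zero scoring vector the attention weights are uniform and the pooled output is simply the frame average; as `u` is learned, frames that score higher dominate the utterance representation.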
Related papers
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) · 15.25
- Light-SERNet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition (2021) · 14.90
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) · 13.65
- Speech Emotion Recognition Via Contrastive Loss Under Siamese Networks (2019) · 12.17
- Searching For Effective Preprocessing Method And CNN-Based Architecture With Efficient Channel Attention On Speech Emotion Recognition (2024) · 2.26
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019) · 0.00
- An Analysis Of Large Speech Models-Based Representations For Speech Emotion Recognition (2023) · 4.52
- Evaluating Raw Waveforms With Deep Learning Frameworks For Speech Emotion Recognition (2023) · 0.00