Learning Spectro-temporal Features With 3D CNNs For Speech Emotion Recognition
2017 · Jaebok Kim, Khiet P. Truong, Gwenn Englebienne, et al.
Abstract
In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed method and other state-of-the-art methods in a speaker-independent manner using aggregated corpora that give a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning than the other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct clusters of emotions.
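As a rough illustration of what a single 3D convolution over spectro-temporal input does, the sketch below slides one kernel jointly over three axes of a toy volume. Everything here is an assumption made for illustration: the axis layout (segments x frames x frequency bins), the helper names `conv3d_valid` and `output_shape`, the toy input values, and the 2 x 3 x 3 averaging kernel are hypothetical and are not taken from the paper's architecture or hyperparameters.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation over nested lists.

    volume is indexed [depth][height][width], kernel [kd][kh][kw].
    No padding, stride 1, single channel (illustrative only).
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(D - kd + 1):
        plane = []
        for h in range(H - kh + 1):
            row = []
            for w in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel window on all three axes at once.
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            acc += volume[d + i][h + j][w + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def output_shape(volume_shape, kernel_shape):
    """Output size of a 'valid', stride-1 convolution along each axis."""
    return tuple(v - k + 1 for v, k in zip(volume_shape, kernel_shape))


# Hypothetical toy input: 4 segments x 5 frames x 6 frequency bins.
segments, frames, bins = 4, 5, 6
volume = [
    [[float(s + f + b) for b in range(bins)] for f in range(frames)]
    for s in range(segments)
]

# Hypothetical averaging kernel spanning 2 segments, 3 frames, 3 bins.
kernel = [[[1.0 / 18.0] * 3 for _ in range(3)] for _ in range(2)]

features = conv3d_valid(volume, kernel)
print(output_shape((segments, frames, bins), (2, 3, 3)))
```

Because the kernel spans neighbouring segments as well as frames and frequency bins, each output value mixes information across all three axes in one operation. A practical model would stack many such layers with learned kernels, multiple channels, and nonlinearities, typically through a deep learning framework rather than nested loops.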