Searching For Effective Preprocessing Method And CNN-based Architecture With Efficient Channel Attention On Speech Emotion Recognition
2024 · Byunggun Kim, Younghun Kwon
Abstract
Speech emotion recognition (SER) classifies human emotions in speech with a computer model. Recently, SER performance has steadily improved as deep learning techniques have been adopted. However, unlike many other domains that use speech data, SER has too little data for training. This causes the neural network to overfit during training, which degrades performance. Successful emotion recognition therefore requires an effective preprocessing method and a model structure that uses its weight parameters efficiently. In this study, we propose using eight dataset versions with different frequency-time resolutions to search for an effective emotional speech preprocessing method. To pursue an efficient model structure, we propose a 6-layer convolutional neural network (CNN) model with efficient channel attention (ECA). In particular, well-positioned ECA blocks can improve channel feature representation with only a few parameters. With the interactive emotional dy
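The abstract's claim that ECA improves channel features "with only a few parameters" can be illustrated with a minimal sketch. It follows the general ECA recipe (global average pooling, a small 1D convolution across channels, a sigmoid gate), not necessarily the exact blocks or placement used in this paper. The function names `eca_kernel_size` and `eca`, the nested-list tensor layout `feature_map[channel][frequency][time]`, and the defaults `gamma=2`, `b=1` are assumptions made for illustration.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Pick an odd 1D kernel size from the channel count (assumed heuristic)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Gate each channel of feature_map[c][f][t] with a shared 1D kernel.

    `weights` holds the k learned values of the 1D convolution; here they are
    passed in by hand because this sketch does no training.
    """
    pad = len(weights) // 2
    # Squeeze: average each channel's frequency-time plane to one number.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    padded = [0.0] * pad + pooled + [0.0] * pad
    # Mix each channel with its neighbours, then squash to (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * padded[c + i] for i, w in enumerate(weights))))
        for c in range(len(pooled))
    ]
    # Excite: rescale every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates


if __name__ == "__main__":
    # Four channels, each a 1x2 frequency-time plane (toy values).
    toy = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]], [[4.0, 4.0]]]
    k = eca_kernel_size(len(toy))
    _, gates = eca(toy, [0.1] * k)
    print(k, [round(g, 3) for g in gates])
```

In this sketch the attention step has only `k` learnable weights, however many channels the layer has, which is the sense in which such a block is lightweight.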
Related papers
- Enhanced Speech Emotion Recognition With Efficient Channel Attention Guided Deep CNN-BiLSTM Framework (2024) 0.00
- Speech Emotion Recognition Via CNN-Transformer And Multidimensional Attention Mechanism (2024) 0.00
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) 15.25
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022) 5.84
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) 13.65
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023) 0.00
- Emodiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023) 0.00
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017) 16.14