Speech Emotion Recognition With Global-aware Fusion On Multi-scale Feature Representation
2022 · Wenjing Zhu, Xiang Li
Abstract
Speech Emotion Recognition (SER) is the fundamental task of predicting an emotion label from speech data. Recent works mostly use convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations, treating time-varying spectral features as images. However, because of the limitations of existing CNNs for SER, they cannot capture rich emotional features at different scales or important global information well. In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network (the code is available at https://github.com/lixiangucas01/GLAM) that learns multi-scale feature representations and uses a global-aware fusion module to attend to emotional information. Specifically, GLAM iteratively applies multiple convolutional kernels at different scales to learn multiple feature representations. Then, instead of using attention-based methods, it applies a simple but effective global-aware fusion module to grab the most important emotional information g
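The abstract describes two stages: convolutions at several kernel scales, followed by a global-aware fusion of the resulting representations. The NumPy sketch below illustrates that data flow under stated assumptions. The input is a single-channel (time, frequency) spectrogram, the kernels are random and untrained, and the kernel sizes are 3, 5 and 7. The names `conv2d_same`, `multi_scale_features` and `global_aware_fusion` are hypothetical. The fusion step is a stand-in: it applies a softmax weighting over each scale's global mean and is not the GLAM module itself. The linked repository holds the authors' actual implementation.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2-D cross-correlation with zero padding so the output keeps x's shape.

    Assumes an odd-sized square kernel.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)  # zero-pad both the time and frequency axes
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out


def multi_scale_features(x: np.ndarray, kernel_sizes=(3, 5, 7), seed: int = 0) -> np.ndarray:
    """Apply one kernel per scale and stack the results as channels (C, T, F).

    Hypothetical: kernels are random here; a real model would learn them.
    """
    rng = np.random.default_rng(seed)
    maps = []
    for k in kernel_sizes:
        kernel = rng.standard_normal((k, k)) / (k * k)
        maps.append(np.maximum(conv2d_same(x, kernel), 0.0))  # ReLU
    return np.stack(maps)


def global_aware_fusion(feats: np.ndarray):
    """Hypothetical stand-in for the fusion module (not the GLAM design).

    Each scale gets one global descriptor (its mean over time and frequency),
    a softmax turns the descriptors into weights, and the scales are summed.
    """
    descriptor = feats.mean(axis=(1, 2))           # shape (C,)
    weights = np.exp(descriptor - descriptor.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = np.tensordot(weights, feats, axes=1)   # weighted sum over C -> (T, F)
    return fused, weights


spec = np.random.default_rng(1).standard_normal((20, 16))  # toy (time, frequency) input
feats = multi_scale_features(spec)
fused, w = global_aware_fusion(feats)
print(feats.shape, fused.shape)  # (3, 20, 16) (20, 16)
```

In this sketch, the fusion counts as "global" because each scale's weight comes from a statistic pooled over the entire feature map rather than from a local window. A trained network would learn both the kernels and the weighting.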
Related papers
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) · 13.65
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) · 2.26
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025) · 4.52
- Speech Emotion Recognition Via CNN-Transformer And Multidimensional Attention Mechanism (2024) · 0.00
- Learning Local To Global Feature Aggregation For Speech Emotion Recognition (2023) · 8.09
- Enhanced Speech Emotion Recognition With Efficient Channel Attention Guided Deep CNN-BiLSTM Framework (2024) · 0.00
- Deep Residual Local Feature Learning For Speech Emotion Recognition (2020) · 7.16
- Multistage Linguistic Conditioning Of Convolutional Layers For Speech Emotion Recognition (2021) · 9.23