Light-SERNet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition
2021 · Arya Aftab, Alireza Morsali, Shahrokh Ghaemmaghami, et al.
Abstract
Detecting emotions directly from a speech signal plays an important role in effective human-computer interaction. Existing speech emotion recognition models require massive computational and storage resources, making them hard to run concurrently with other machine-interactive tasks in embedded systems. In this paper, we propose an efficient and lightweight fully convolutional neural network (FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps the deep convolution blocks extract high-level features while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. Although our model is smaller than state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.
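The "three parallel paths with different filter sizes" idea can be illustrated with a small shape and parameter-count sketch. This is not the authors' configuration: the kernel sizes, channel counts, and input shape below are hypothetical values chosen only for illustration, and the `ConvPath` and `parallel_block` names are invented for this example. It assumes each path is a single stride-1 convolution with "same" padding and that the path outputs are concatenated along the channel axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvPath:
    """One parallel path: a single 2-D convolution with 'same' padding."""
    kernel_h: int
    kernel_w: int
    out_channels: int

    def num_params(self, in_channels: int) -> int:
        # Weights (kh * kw * in * out) plus one bias per output channel.
        weights = self.kernel_h * self.kernel_w * in_channels * self.out_channels
        return weights + self.out_channels


def parallel_block(paths, in_shape):
    """Return (output_shape, total_params) for paths fed the same input.

    in_shape is (channels, height, width). Path outputs are assumed to be
    concatenated along the channel axis.
    """
    in_ch, height, width = in_shape
    out_ch = sum(p.out_channels for p in paths)
    params = sum(p.num_params(in_ch) for p in paths)
    # Stride 1 with 'same' padding leaves height and width unchanged.
    return (out_ch, height, width), params


# Hypothetical settings, not taken from the paper.
PATHS = [ConvPath(9, 1, 32), ConvPath(1, 11, 32), ConvPath(3, 3, 32)]
INPUT_SHAPE = (1, 40, 126)  # assumed: 1 channel, 40 feature bins, 126 frames

out_shape, n_params = parallel_block(PATHS, INPUT_SHAPE)
print(out_shape, n_params)
```

Under these assumptions, a tall kernel, a wide kernel, and a square kernel each see the same input but cover different spans along the two axes, and the whole block stays small because each path has only a few hundred parameters.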
Related papers
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017)
- Focal Loss Based Residual Convolutional Neural Network For Speech Emotion Recognition (2019)
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019)
- Direct Modelling Of Speech Emotion From Raw Speech (2019)
- Emoformer: A Text-independent Speech Emotion Recognition Using A Hybrid Transformer-CNN Model (2025)
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025)
- Emodiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023)