Focal Loss Based Residual Convolutional Neural Network For Speech Emotion Recognition
2019 · Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, et al.
Abstract
This paper proposes a Residual Convolutional Neural Network (ResNet) that operates on speech features and is trained with Focal Loss to recognize emotion in speech. Speech features such as spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) have been shown to characterize emotion better than plain text alone. Furthermore, Focal Loss, first used in one-stage object detectors, focuses training on hard examples and down-weights the loss assigned to well-classified examples, which prevents the model from being overwhelmed by easily classifiable examples.
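The down-weighting described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common multi-class form FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true class. The function names, the value gamma = 2.0, and the toy logits are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def focal_loss(logits, labels, gamma=2.0):
    """Per-example focal loss: -(1 - p_t)**gamma * log(p_t).

    gamma=2.0 is an assumed setting, not one confirmed by the abstract.
    With gamma=0 this reduces to ordinary cross-entropy.
    """
    probs = softmax(logits)
    # Probability assigned to the true class of each example.
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, 1e-12, 1.0)  # avoid log(0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


# Hypothetical logits for four emotion classes; class 0 is the true label.
logits = np.array([
    [4.0, 0.0, 0.0, 0.0],   # easy example: confidently correct
    [0.5, 0.4, 0.3, 0.2],   # hard example: nearly uniform prediction
])
labels = np.array([0, 0])

ce = focal_loss(logits, labels, gamma=0.0)  # plain cross-entropy
fl = focal_loss(logits, labels, gamma=2.0)  # focal loss

# Fraction of the cross-entropy loss that survives the focal weighting.
print("easy example keeps", fl[0] / ce[0], "of its loss")
print("hard example keeps", fl[1] / ce[1], "of its loss")
```

The easy example retains a much smaller share of its cross-entropy loss than the hard one, so training gradients are dominated by the hard example.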
Related papers
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Learning Discriminative Features Using Center Loss And Reconstruction As Regularizer For Speech Emotion Recognition (2019)
- Light-SERNet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition (2021)
- Deep Residual Local Feature Learning For Speech Emotion Recognition (2020)
- Emotion Recognition From Speech With Recurrent Neural Networks (2017)
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019)
- Emotion Recognition System From Speech And Visual Information Based On Convolutional Neural Networks (2020)
- Evaluating Gammatone Frequency Cepstral Coefficients With Neural Networks For Emotion Recognition From Speech (2018)