Evaluating Gammatone Frequency Cepstral Coefficients With Neural Networks For Emotion Recognition From Speech
2018 · Gabrielle K. Liu
Abstract
Current approaches to speech emotion recognition focus on speech features that can capture the emotional content of a speech signal. Mel Frequency Cepstral Coefficients (MFCCs) are one of the most commonly used representations for audio speech recognition and classification. This paper proposes Gammatone Frequency Cepstral Coefficients (GFCCs) as a potentially better representation of speech signals for emotion recognition. The effectiveness of the MFCC and GFCC representations is compared and evaluated over emotion and intensity classification tasks with fully connected and recurrent neural network architectures. The results provide evidence that GFCCs outperform MFCCs in speech emotion recognition.
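As a rough illustration of how the two representations differ at the filterbank stage, the sketch below spaces band center frequencies evenly on the mel scale (commonly used for MFCCs) and on the ERB-rate scale (commonly used for gammatone filterbanks). It is not taken from the paper: the function names, the 50 Hz to 8000 Hz range, and the band count are assumptions chosen for the example, and the formulas are the widely used textbook approximations rather than the author's exact configuration.

```python
import math

# Mel scale (common O'Shaughnessy form), often used to place MFCC filters.
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# ERB-rate scale (Glasberg and Moore approximation), often used to place
# gammatone filters.
def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(to_scale, from_scale, f_min, f_max, n_bands):
    """Return n_bands frequencies (Hz) evenly spaced on the warped scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [from_scale(lo + i * step) for i in range(n_bands)]

# Hypothetical settings for illustration only.
mel_centers = center_frequencies(hz_to_mel, mel_to_hz, 50.0, 8000.0, 8)
erb_centers = center_frequencies(hz_to_erb_rate, erb_rate_to_hz, 50.0, 8000.0, 8)

for m, e in zip(mel_centers, erb_centers):
    print(f"mel: {m:8.1f} Hz   erb: {e:8.1f} Hz")
```

Both scales compress high frequencies, but they do so at different rates, so the same number of bands lands on different center frequencies. A full MFCC or GFCC pipeline would additionally apply the filter shapes (triangular or gammatone), a compressive nonlinearity, and a discrete cosine transform, none of which is shown here.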
Related papers
- Focal Loss Based Residual Convolutional Neural Network For Speech Emotion Recognition (2019)
- DNN-HMM Based Speaker Adaptive Emotion Recognition Using Proposed Epoch And MFCC Features (2018)
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025)
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019)
- Emotion Recognition From Speech (2019)
- Emotion Recognition From Speech With Recurrent Neural Networks (2017)
- Light-sernet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition (2021)
- Pitch-synchronous Single Frequency Filtering Spectrogram For Speech Emotion Recognition (2019)