DNN-HMM Based Speaker Adaptive Emotion Recognition Using Proposed Epoch And MFCC Features
2018 · Md. Shah Fahad, Jainath Yadav, Gyadhar Pradhan, et al.
Abstract
Speech is produced when a time-varying vocal tract system is excited by a time-varying excitation source. The information present in speech, such as message, emotion, language, and speaker, therefore results from the combined effect of the excitation source and the vocal tract system. However, excitation source features have seen little use in emotion recognition. In our earlier work, we proposed a novel method to extract glottal closure instants (GCIs), known as epochs. In this paper, we explore epoch features, namely instantaneous pitch, phase, and strength of epochs, for discriminating emotions. We combine the excitation source features with the well-known Mel-frequency cepstral coefficient (MFCC) features to develop an emotion recognition system with improved performance. DNN-HMM speaker adaptive models have been developed using MFCC, epoch, and combined features. The IEMOCAP emotional database has been used to evaluate the models. The average accuracy for emotion recogniti
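As a rough illustration of the epoch features mentioned in the abstract, the sketch below derives an instantaneous pitch value from consecutive epoch (GCI) times and fuses epoch features with MFCCs for one frame. The abstract does not give the paper's exact definitions, so two things here are assumptions: instantaneous pitch is taken as the reciprocal of the interval between successive epochs, and the feature combination is plain concatenation. The function names `instantaneous_pitch` and `combine_features` are hypothetical, and the epoch times are made-up example values, not data from the paper.

```python
from typing import List


def instantaneous_pitch(epochs_sec: List[float]) -> List[float]:
    """Return one pitch estimate (Hz) per pair of consecutive epochs.

    Assumption: pitch = 1 / (time between successive epochs).
    """
    pitches = []
    for prev, cur in zip(epochs_sec, epochs_sec[1:]):
        interval = cur - prev
        if interval <= 0:
            raise ValueError("epoch times must be strictly increasing")
        pitches.append(1.0 / interval)
    return pitches


def combine_features(mfcc_frame: List[float], epoch_frame: List[float]) -> List[float]:
    """Fuse two per-frame feature vectors by concatenation (an assumption)."""
    return list(mfcc_frame) + list(epoch_frame)


# Illustrative epoch times in seconds (not taken from the paper).
epochs = [0.000, 0.008, 0.016, 0.026]
pitch = instantaneous_pitch(epochs)

# Hypothetical 3-dimensional MFCC frame joined with one epoch feature.
fused = combine_features([12.1, -3.4, 0.7], [pitch[0]])
print(len(pitch), len(fused))
```

An epoch sequence of N instants yields N - 1 pitch values, one per glottal cycle, so the epoch features would still need to be aligned to the MFCC frame rate before concatenation; how the paper performs that alignment is not stated in the abstract.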
Related papers
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019) 0.00
- Emodiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023) 0.00
- Novel Cascaded Gaussian Mixture Model-deep Neural Network Classifier For Speaker Identification In Emotional Talking Environments (2018) 12.74
- Multi-channel Auto-encoder For Speech Emotion Recognition (2018) 0.00
- Identifying Speakers Using Their Emotion Cues (2018) 10.85
- A Transfer Learning Method For Speech Emotion Recognition From Automatic Speech Recognition (2020) 0.00
- Evaluating Gammatone Frequency Cepstral Coefficients With Neural Networks For Emotion Recognition From Speech (2018) 0.00
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017) 16.14