Emodiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks
2023 · Hanan Hamza, Fiza Gafoor, Fathima Sithara, et al.
Abstract
In an era of advanced artificial intelligence and human-computer interaction, identifying emotions in spoken language is paramount. This research explores the integration of deep learning techniques in speech emotion recognition and addresses the challenges of speaker diarization and emotion identification. It introduces a framework that combines a pre-existing speaker diarization pipeline with an emotion identification model built on a Convolutional Neural Network (CNN) to achieve higher precision. The proposed model was trained on data from five speech emotion datasets: RAVDESS, CREMA-D, SAVEE, TESS, and Movie Clips, the last of which was created specifically for this research. The features extracted from each sample include Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS), together with data augmentation techniques such as pitch, noise, stretch, and shift. This feature e
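To make the feature and augmentation terms in the abstract concrete, here is a minimal sketch of two of the named features (ZCR and RMS) and two of the named augmentations (noise and shift), written with the Python standard library only. It is not the authors' implementation: the function names, frame length, hop length, and noise level are all assumptions chosen for illustration, and MFCC, pitch, and stretch are left out because they need an FFT or resampling step that a practical pipeline would usually take from an audio library.

```python
import math
import random


def split_frames(signal, frame_length, hop_length):
    """Split a signal into overlapping frames (no padding; the tail is dropped)."""
    return [
        signal[start:start + frame_length]
        for start in range(0, len(signal) - frame_length + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root mean square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def add_noise(signal, noise_level, rng):
    """Noise augmentation: add Gaussian noise scaled to the peak amplitude."""
    peak = max(abs(x) for x in signal)
    return [x + rng.gauss(0.0, noise_level * peak) for x in signal]


def shift(signal, offset):
    """Shift augmentation: circularly rotate the signal by `offset` samples."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset]


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# One (ZCR, RMS) pair per frame; frame and hop sizes are illustrative.
feature_rows = [
    (zero_crossing_rate(f), rms(f)) for f in split_frames(tone, 400, 200)
]

# Augmented copies that would be passed through the same feature extraction.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy_tone = add_noise(tone, 0.05, rng)
shifted_tone = shift(tone, 1600)
```

Each augmented copy keeps the original emotion label, so augmentation enlarges the training set without new recordings. Per-frame features such as these would then be stacked with the MFCCs into the input of the CNN.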
Related papers
- Emotion Recognition From Speech (2019)
- Evaluating Raw Waveforms With Deep Learning Frameworks For Speech Emotion Recognition (2023)
- Emoformer: A Text-independent Speech Emotion Recognition Using A Hybrid Transformer-CNN Model (2025)
- Deep Learning Based Emotion Recognition System Using Speech Features And Transcriptions (2019)
- Hybrid Data Augmentation And Deep Attention-based Dilated Convolutional-recurrent Neural Networks For Speech Emotion Recognition (2021)
- Emotech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information With Hybrid Recurrent Network (2025)
- Multi-modal Emotion Detection With Transfer Learning (2020)
- Emotion Recognition System From Speech And Visual Information Based On Convolutional Neural Networks (2020)