Improvement And Implementation Of A Speech Emotion Recognition Model Based On Dual-layer LSTM
2024 · Xiaoran Yang, Shuhan Yu, Wenxi Xu
Abstract
This paper extends an existing speech emotion recognition model with a second LSTM layer to improve the accuracy and processing efficiency of emotion recognition from audio data. By capturing long-term dependencies within audio sequences, the dual-layer LSTM network recognizes and classifies complex emotional patterns more accurately. Experiments on the RAVDESS dataset validate this approach: the modified dual-layer LSTM model improves accuracy by 2% over the single-layer LSTM while significantly reducing recognition latency, which enhances real-time performance. These results indicate that the dual-layer LSTM architecture is well suited to emotional features with long-term dependencies and offers a viable optimization for speech emotion recognition systems. This research provides a reference for practical applications in fields like intelligent customer service, sentiment analysis and human-computer int
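The stacking idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It uses only the Python standard library and untrained random weights, so it shows the data flow rather than any reported result. The class names (`LSTMLayer`, `DualLayerLSTMClassifier`), the hidden size, and the use of 13 features per frame are assumptions made for illustration; the eight output classes match the eight emotion labels in RAVDESS. The key point is that the second layer reads the full hidden-state sequence produced by the first layer, and the classifier uses the second layer's final hidden state.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMLayer:
    """One LSTM layer with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = cell candidate, o = output.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def forward(self, sequence):
        """Return the hidden state at every time step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = list(x) + h  # concatenate input with previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class DualLayerLSTMClassifier:
    """Two stacked LSTM layers followed by a linear softmax classifier."""

    def __init__(self, n_features, hidden_size, n_classes, seed=0):
        rng = random.Random(seed)
        self.layer1 = LSTMLayer(n_features, hidden_size, rng)
        self.layer2 = LSTMLayer(hidden_size, hidden_size, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in range(n_classes)]

    def predict_proba(self, sequence):
        h1 = self.layer1.forward(sequence)  # hidden sequence from layer 1
        h2 = self.layer2.forward(h1)        # layer 2 consumes that sequence
        last = h2[-1]                       # final state summarises the clip
        logits = [sum(w * v for w, v in zip(row, last)) for row in self.W_out]
        return softmax(logits)
```

Calling `DualLayerLSTMClassifier(n_features=13, hidden_size=8, n_classes=8).predict_proba(frames)` on a list of per-frame feature vectors returns one probability per emotion class. A real system would train the weights and would normally use a deep learning framework rather than hand-written loops.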
Related papers
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019)
- Emotion Recognition From Speech (2019)
- Multimodal Speech Emotion Recognition Using Audio And Text (2018)
- EmoDiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023)
- Automatically Augmenting An Emotion Dataset Improves Classification Using Audio (2018)
- Evaluating Raw Waveforms With Deep Learning Frameworks For Speech Emotion Recognition (2023)
- Audio Visual Emotion Recognition With Temporal Alignment And Perception Attention (2016)
- Extending RNN-T-based Speech Recognition Systems With Emotion And Language Classification (2022)