How Speech Is Recognized To Be Emotional - A Study Based On Information Decomposition
2021 · Haoran Sun, Lantian Li, Thomas Fang Zheng, et al.
Abstract
The way that humans encode emotion into speech signals is complex. For instance, an angry man may raise his pitch and speaking rate and use impolite words. In this paper, we present a preliminary study of various emotional factors and investigate how each of them affects modern emotion recognition systems. The key tool of our study is the recently presented SpeechFlow model, which lets us decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we study the performance of each information component and of their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very poor (even worse than random guessing), suggesting that present speech emotion recognition models are rather weak. Inte
Related papers
- Emotion Recognition From Speech (2019) · 0.00
- Real-time Speech Emotion Recognition Based On Syllable-level Feature Extraction (2022) · 8.09
- Identifying Speakers Using Their Emotion Cues (2018) · 10.85
- Analysis Of Speech Separation Performance Degradation On Emotional Speech Mixtures (2023) · 0.00
- Evaluating Raw Waveforms With Deep Learning Frameworks For Speech Emotion Recognition (2023) · 0.00
- Decoding Emotions: A Comprehensive Multilingual Study Of Speech Models For Speech Emotion Recognition (2023) · 0.00
- ASR And Emotional Speech: A Word-level Investigation Of The Mutual Impact Of Speech And Emotion Recognition (2023) · 8.82
- Towards Interpretable And Transferable Speech Emotion Recognition: Latent Representation Based Analysis Of Features, Methods And Corpora (2021) · 0.00