Direct Modelling Of Speech Emotion From Raw Speech
2019 · Siddique Latif, Rajib Rana, Sara Khalifa, et al.
Abstract
Speech emotion recognition is a challenging task that depends heavily on hand-engineered acoustic features, which are typically crafted to echo human perception of speech signals. However, a filter bank designed from perceptual evidence is not guaranteed to be the best choice in a statistical modelling framework where the end goal is, for example, emotion classification. This has fuelled the emerging trend of learning representations from raw speech, especially using deep neural networks. In particular, the combination of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks has gained great traction: LSTMs for their intrinsic ability to learn the contextual information crucial for emotion recognition, and CNNs for their ability to overcome the scalability problem of regular neural networks. In this paper, we show that there are still opportunities to improve the performance of emotion recognition from raw speech by exploiting the properties of CNN
Related papers
- Emotion Recognition From Speech (2019) 0.00
- Evaluating Raw Waveforms With Deep Learning Frameworks For Speech Emotion Recognition (2023) 0.00
- EmoDiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023) 0.00
- Light-SERNet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition (2021) 14.90
- Efficient Arabic Emotion Recognition Using Deep Neural Networks (2020) 11.93
- Learning Spectro-temporal Features With 3D CNNs For Speech Emotion Recognition (2017) 10.61
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) 15.25
- An Analysis Of Large Speech Models-based Representations For Speech Emotion Recognition (2023) 4.52