"I Have Vxxx Bxx Connexxxn!": Facing Packet Loss In Deep Speech Emotion Recognition
2020 · Mostafa M. Mohamed, Björn W. Schuller
Abstract
In applications that rely on speech emotion recognition, frame loss can be a severe issue: the audio stream drops some data frames for a variety of reasons, such as low bandwidth. In this contribution, we investigate for the first time the effects of frame loss on the performance of speech emotion recognition. Extensive, reproducible experiments are reported on the popular RECOLA corpus using a state-of-the-art end-to-end deep neural network consisting mainly of convolution blocks and recurrent layers. A simple environment based on a Markov chain model, governed by two main parameters, simulates the loss mechanism. We explore matched, mismatched, and multi-condition training settings. As expected, the matched setting yields the best performance and the mismatched setting the lowest. Furthermore, frame loss is introduced as a general-purpose data augmentation technique to overcome the effects of frame loss. It can be used durin
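The abstract describes a Markov chain loss model with two parameters but does not name them here. As a rough illustration only, the sketch below assumes a two-state chain (received / lost) parameterised by two hypothetical transition probabilities, `p_loss` and `p_recover`, and assumes that a lost frame is replaced by zeros. Neither choice is confirmed by the text above; the function names are likewise invented for this example.

```python
import random


def simulate_loss_mask(n_frames, p_loss, p_recover, seed=None):
    """Return a list of booleans, True where a frame is lost.

    Assumed two-state Markov chain (not taken from the paper):
      p_loss    = P(frame t lost     | frame t-1 received)
      p_recover = P(frame t received | frame t-1 lost)
    """
    rng = random.Random(seed)
    lost = False  # assumption: the stream starts in the "received" state
    mask = []
    for _ in range(n_frames):
        if lost:
            # stay lost unless the chain recovers
            lost = rng.random() >= p_recover
        else:
            # move to the lost state with probability p_loss
            lost = rng.random() < p_loss
        mask.append(lost)
    return mask


def apply_frame_loss(samples, frame_len, mask):
    """Zero out every frame flagged as lost (zero-filling is an assumption)."""
    out = list(samples)
    for i, is_lost in enumerate(mask):
        if is_lost:
            start = i * frame_len
            end = min(start + frame_len, len(out))
            for j in range(start, end):
                out[j] = 0.0
    return out
```

For a chain of this form, the long-run fraction of lost frames is `p_loss / (p_loss + p_recover)` and the mean length of a loss burst is `1 / p_recover`. Used as augmentation, one would presumably draw a fresh mask for each training utterance, e.g. `apply_frame_loss(audio, 640, simulate_loss_mask(len(audio) // 640, 0.1, 0.5))`, where the frame length of 640 samples is an arbitrary placeholder.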
Related papers
- Concealnet: An End-to-end Neural Network For Packet Loss Concealment In Deep Speech Emotion Recognition (2020)
- Speech Emotion Recognition Via Contrastive Loss Under Siamese Networks (2019)
- Focal Loss Based Residual Convolutional Neural Network For Speech Emotion Recognition (2019)
- A Consolidated View Of Loss Functions For Supervised Deep Learning-based Speech Enhancement (2020)
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- Robust Audio-visual Target Speaker Extraction With Emotion-aware Multiple Enrollment Fusion (2025)
- Transfer Learning For Improving Speech Emotion Classification Accuracy (2018)
- Real-time Speech Emotion Recognition Based On Syllable-level Feature Extraction (2022)