Automatically Augmenting An Emotion Dataset Improves Classification Using Audio
2018 · Egor Lakomkin, Cornelius Weber, Stefan Wermter
Abstract
In this work, we tackle the problem of speech emotion classification. One of the issues in affective computing is that the amount of annotated data is very limited. At the same time, the number of ways the same emotion can be expressed verbally is enormous because of variability between speakers. This mismatch is one of the factors that limit performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. This makes it possible to automatically construct a larger dataset of audio samples containing positive, negative, and neutral emotional speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment shows the potential benefit of combining textual sentiment analysis with vocal information.
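The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes the text comes from time-stamped subtitles in SRT format, which the abstract does not state, and it replaces the sentiment analyzer with a tiny hypothetical word lexicon. The names `POSITIVE`, `NEGATIVE`, `label_text`, and `label_subtitles` are illustrative and are not taken from the paper.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real pipeline would use a trained textual sentiment model.
POSITIVE = {"love", "great", "wonderful", "happy", "thank"}
NEGATIVE = {"hate", "terrible", "awful", "angry", "never"}

# Matches SRT timestamps such as 00:01:02,500
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert an SRT timestamp with millisecond precision to seconds."""
    h, m, s, ms = (int(x) for x in TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0


def label_text(text):
    """Return 'positive', 'negative', or 'neutral' from lexicon word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_subtitles(srt):
    """Return (start_sec, end_sec, label) for each subtitle block.

    The time spans could then be used to cut matching audio clips
    from the movie soundtrack.
    """
    segments = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        text = " ".join(lines[2:])
        segments.append((to_seconds(start), to_seconds(end), label_text(text)))
    return segments


EXAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
I love this wonderful place.

2
00:00:04,000 --> 00:00:06,000
I hate waiting here.

3
00:00:07,250 --> 00:00:09,000
The train leaves at noon.
"""

if __name__ == "__main__":
    for segment in label_subtitles(EXAMPLE_SRT):
        print(segment)
```

Each returned tuple pairs a time span with a text-derived label, so the audio in that span inherits the label without manual annotation. Labels obtained this way are noisy, since spoken tone can differ from the sentiment of the words.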
Related papers
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)
- Generative Emotional AI For Speech Emotion Recognition: The Case For Synthetic Emotional Speech Augmentation (2023)
- Emotioncaps: Enhancing Audio Captioning Through Emotion-augmented Data Generation (2024)
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023)
- Emospeech: A Corpus Of Emotionally Rich And Contextually Detailed Speech Annotations (2024)
- Improving Speech Emotion Recognition With Mutual Information Regularized Generative Model (2025)
- Copypaste: An Augmentation Method For Speech Emotion Recognition (2020)
- Learning Representations Of Emotional Speech With Deep Convolutional Generative Adversarial Networks (2017)