Multi-window Data Augmentation Approach For Speech Emotion Recognition
2020 · Sarala Padi, Dinesh Manocha, Ram D. Sriram
Abstract
We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model that recognizes the underlying emotion of an audio signal. The proposed multi-window augmentation generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that this augmentation method, combined with a deep learning model, improves speech emotion recognition (SER) performance. We evaluate the approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS. The multi-window model improves SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction. We perform extensive experimental evaluations to find the best window choice and explore the windowing
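The multi-window idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window sizes (25, 50, 100, 200 ms), the 50% hop, the Hann window, and the log-magnitude spectrogram feature are all assumptions chosen for the example, and the function names are hypothetical. Each window size yields one feature matrix for the same utterance, and every matrix inherits the utterance's emotion label.

```python
import numpy as np

def frame_signal(signal, win_len, hop_len):
    """Split a 1-D signal into overlapping frames of win_len samples."""
    n_frames = 1 + (len(signal) - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def log_spectrogram(signal, win_len, hop_len):
    """Log-magnitude spectrogram with a Hann window (assumed feature type)."""
    frames = frame_signal(signal, win_len, hop_len) * np.hanning(win_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # small offset avoids log(0)

def multi_window_augment(signal, label, sample_rate, window_ms=(25, 50, 100, 200)):
    """Return one (features, label) pair per window size.

    The window sizes and the 50% hop are illustrative assumptions.
    The signal must be at least as long as the largest window.
    """
    samples = []
    for ms in window_ms:
        win_len = int(sample_rate * ms / 1000)
        hop_len = win_len // 2
        samples.append((log_spectrogram(signal, win_len, hop_len), label))
    return samples

# Usage: one second of synthetic noise stands in for a labelled utterance.
rng = np.random.default_rng(0)
sample_rate = 16000
audio = rng.standard_normal(sample_rate)
augmented = multi_window_augment(audio, "happy", sample_rate)
for features, lab in augmented:
    print(features.shape, lab)
```

One utterance becomes four training samples here. Because the feature matrices differ in shape across window sizes, a real pipeline would still need to pad, crop, or resize them before feeding a single model; how that is handled is not stated in the abstract.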
Related papers
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) 13.65
- Sigwavnet: Learning Multiresolution Signal Wavelet Network For Speech Emotion Recognition (2025) 8.48
- Hybrid Data Augmentation And Deep Attention-based Dilated Convolutional-recurrent Neural Networks For Speech Emotion Recognition (2021) 12.81
- Copypaste: An Augmentation Method For Speech Emotion Recognition (2020) 11.39
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019) 14.58
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021) 12.74
- Augmenting Generative Adversarial Networks For Speech Emotion Recognition (2020) 10.85
- Generative Data Augmentation Guided By Triplet Loss For Speech Emotion Recognition (2022) 3.58