CopyPaste: An Augmentation Method For Speech Emotion Recognition
2020 · Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, et al.
Abstract
Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a novel, perceptually motivated augmentation procedure for SER. Assuming that the presence of any non-neutral emotion dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional utterance (emotion E) and a neutral utterance can still be labeled with emotion E. We hypothesize that SER performance can be improved by using these concatenated utterances in model training. To verify this, three CopyPaste schemes are tested on two deep learning models: one trained independently and the other using transfer learning from an x-vector model, a speaker recognition model. We observed that all three CopyPaste schemes improve SER performance on all three datasets considered: MSP-Podcast, Crema-D, and IEMOC
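The abstract does not spell out the three CopyPaste schemes, so the following is only a minimal sketch of the core idea it states: concatenate an emotional utterance with a neutral one and keep the emotional label. Several details are assumptions of this sketch, not the authors' method: utterances are raw waveforms stored as lists of floats, the neutral class is the string `"neutral"`, the order of the two segments is chosen at random, and the helper names `copypaste_neutral` and `make_copypaste_examples` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Waveform = List[float]


def copypaste_neutral(
    emotional: Sequence[float],
    emotion_label: str,
    neutral: Sequence[float],
    rng: random.Random,
) -> Tuple[Waveform, str]:
    """Concatenate an emotional and a neutral utterance (hypothetical helper).

    The result keeps the emotional utterance's label, following the premise
    that a non-neutral emotion dictates the perceived emotion of the whole
    recording. Randomizing the segment order is an assumption of this sketch.
    """
    if emotion_label == "neutral":
        raise ValueError("first utterance must carry a non-neutral emotion")
    if rng.random() < 0.5:
        mixed = list(emotional) + list(neutral)
    else:
        mixed = list(neutral) + list(emotional)
    return mixed, emotion_label


def make_copypaste_examples(
    data: Sequence[Tuple[Sequence[float], str]],
    num_examples: int,
    seed: int = 0,
) -> List[Tuple[Waveform, str]]:
    """Build extra training pairs by sampling emotional/neutral utterances."""
    rng = random.Random(seed)
    # Split the labeled data into emotional and neutral pools.
    emotional = [(wav, label) for wav, label in data if label != "neutral"]
    neutral = [wav for wav, label in data if label == "neutral"]
    if not emotional or not neutral:
        raise ValueError("need at least one emotional and one neutral utterance")
    examples = []
    for _ in range(num_examples):
        wav, label = rng.choice(emotional)
        examples.append(copypaste_neutral(wav, label, rng.choice(neutral), rng))
    return examples
```

For example, `make_copypaste_examples([([0.1, 0.2], "angry"), ([0.0, 0.0, 0.0], "neutral")], num_examples=4)` returns four five-sample waveforms, each labeled `"angry"`. In a real pipeline the generated pairs would be added to the original training set rather than replacing it.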
Related papers
- Augmenting Generative Adversarial Networks For Speech Emotion Recognition (2020) · 10.85
- A Preliminary Study On Augmenting Speech Emotion Recognition Using A Diffusion Model (2023) · 0.00
- Generative Data Augmentation Guided By Triplet Loss For Speech Emotion Recognition (2022) · 3.58
- Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, And Augmenting (2023) · 0.00
- Generative Emotional AI For Speech Emotion Recognition: The Case For Synthetic Emotional Speech Augmentation (2023) · 11.19
- Improving Speech Emotion Recognition With Unsupervised Speaking Style Transfer (2022) · 6.34
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021) · 12.74
- Automatically Augmenting An Emotion Dataset Improves Classification Using Audio (2018) · 0.00