Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, And Augmenting
2023 · Tiantian Feng, Shrikanth Narayanan
Abstract
Significant advances are being made in speech emotion recognition (SER) using deep learning models. Nonetheless, training SER systems remains challenging, requiring both time and costly resources. As in many other machine learning tasks, acquiring datasets for SER requires substantial data annotation effort, including transcription and labeling. These annotation processes make it difficult to scale up conventional SER systems. Recent developments in foundational models have had a tremendous impact, giving rise to applications such as ChatGPT. These models have enhanced human-computer interaction and opened unique possibilities for streamlining data collection in fields like SER. In this research, we explore the use of foundational models to assist in automating SER, from transcription and annotation to augmentation. Our study demonstrates that these models can generate transcriptions to enhance the performance of SER systems that rely solely on speech data. Fur
Related papers
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021)
- Generative Data Augmentation Guided By Triplet Loss For Speech Emotion Recognition (2022)
- Generative Emotional AI For Speech Emotion Recognition: The Case For Synthetic Emotional Speech Augmentation (2023)
- Towards Interpretable And Transferable Speech Emotion Recognition: Latent Representation Based Analysis Of Features, Methods And Corpora (2021)
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023)
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023)
- CopyPaste: An Augmentation Method For Speech Emotion Recognition (2020)
- Improving Speech Emotion Recognition In Under-resourced Languages Via Speech-to-speech Translation With Bootstrapping Data Selection (2024)