Vesper: A Compact And Effective Pretrained Model For Speech Emotion Recognition
2023 · Weidong Chen, Xiaofen Xing, Peihao Chen, et al.
Abstract
This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to the speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, so their efficacy on specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. These limitations motivate another research direction: optimizing large-scale PTMs for specific tasks to produce task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is based on WavLM, is pretrained on a speech dataset, and takes emotional characteristics into account. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper emplo
Related papers
- A Comparative Study Of Pre-trained Speech And Audio Embeddings For Speech Emotion Recognition (2023) · 0.00
- Exploring Wav2vec 2.0 Fine-tuning For Improved Speech Emotion Recognition (2021) · 15.67
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023) · 10.97
- Transforming The Embeddings: A Lightweight Technique For Speech Emotion Recognition Tasks (2023) · 7.50
- Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022) · 18.59
- An Analysis Of Large Speech Models-based Representations For Speech Emotion Recognition (2023) · 4.52
- Pre-trained Model Representations And Their Robustness Against Noise For Speech Emotion Analysis (2023) · 0.00
- Personalized Adaptation With Pre-trained Speech Encoders For Continuous Emotion Recognition (2023) · 6.34