Emotion-aware Speech Self-supervised Representation Learning With Intensity Knowledge
2024 · Rui Liu, Zening Ma
Abstract
Speech self-supervised learning (SSL) has proven effective across a range of downstream tasks. However, prevailing self-supervised models rarely incorporate emotion-related prior information, and so miss the improvement that emotion prior knowledge in speech could bring to emotion-related tasks. In this paper, we propose an emotion-aware speech representation learning method with intensity knowledge. Specifically, we extract frame-level emotion intensities using an established speech-emotion understanding model. We then propose a novel emotional masking strategy (EMS) that incorporates these emotion intensities into the masking process. We select two representative models, the Transformer-based MockingJay and the CNN-based Non-autoregressive Predictive Coding (NPC), and conduct experiments on the IEMOCAP dataset. The experiments show that representations derived from our proposed method outperform those of the original models on the speech emotion recognition (SER) task.
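The abstract does not say how EMS turns intensities into a mask, so the following is only a rough sketch of one plausible reading: frames with higher emotion intensity are more likely to be selected for masking. The function names (`intensity_guided_mask`, `apply_mask`), the `mask_ratio` parameter, and the weighted-sampling rule are all assumptions made for illustration and are not taken from the paper.

```python
import random


def intensity_guided_mask(intensities, mask_ratio=0.15, seed=None):
    """Choose frame indices to mask, favouring high-intensity frames.

    Hypothetical illustration only: frames are drawn without replacement,
    each with probability proportional to its emotion intensity.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    n = len(intensities)
    num_to_mask = min(n, round(n * mask_ratio))

    # A small floor keeps zero-intensity frames selectable (assumption),
    # so sampling still works when every intensity is zero.
    eps = 1e-8
    remaining = list(range(n))
    weights = [max(w, 0.0) + eps for w in intensities]

    chosen = []
    for _ in range(num_to_mask):
        # Draw one position by weight, then remove it from the pool.
        pos = rng.choices(range(len(remaining)), weights=weights)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return sorted(chosen)


def apply_mask(frames, masked_indices, mask_value=0.0):
    """Return a copy of `frames` with the selected frames overwritten."""
    masked = set(masked_indices)
    return [
        [mask_value] * len(frame) if i in masked else list(frame)
        for i, frame in enumerate(frames)
    ]


if __name__ == "__main__":
    # Toy input: 10 frames of 2-dim features, with frames 2 and 3 "emotional".
    intensities = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    frames = [[1.0, 2.0] for _ in intensities]
    indices = intensity_guided_mask(intensities, mask_ratio=0.2, seed=0)
    print(indices)
    print(apply_mask(frames, indices))
```

In a masked-reconstruction setup such as MockingJay's, a model would then be trained to reconstruct the original frames from the masked sequence. The actual selection rule, masking ratio, and mask value used by the authors may differ from this sketch.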
Related papers
- End-to-end Integration Of Speech Emotion Recognition With Voice Activity Detection Using Self-supervised Learning Features (2024)
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024)
- Non-contrastive Self-supervised Learning For Utterance-level Information Extraction From Speech (2022)
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)
- Leveraging Semantic Information For Efficient Self-supervised Emotion Recognition With Audio-textual Distilled Models (2023)
- Enhancing Speech Emotion Recognition Through Segmental Average Pooling Of Self-supervised Learning Features (2024)
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023)
- Jointly Fine-tuning "BERT-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020)