Frame-level Emotional State Alignment Method For Speech Emotion Recognition
2023 · Qifei Li, Yingming Gao, Cong Wang, et al.
Abstract
Speech emotion recognition (SER) systems aim to recognize human emotional states during human-computer interaction. Most existing SER systems are trained on utterance-level labels. However, not all frames in an utterance have affective states consistent with the utterance-level label, which makes it difficult for the model to identify the true emotion of the audio and degrades its performance. To address this problem, we propose a frame-level emotional state alignment method for SER. First, we fine-tune a HuBERT model with the task-adaptive pretraining (TAPT) method to obtain an SER system, and extract embeddings from its transformer layers to form frame-level pseudo-emotion labels by clustering. Then, the pseudo labels are used to pretrain HuBERT, so that each frame output of HuBERT carries corresponding emotional information. Finally, we fine-tune the above pretrained HuBERT for SER by adding an attention layer on top of it, which can focus only on those frames that are emotionally more consiste
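Two of the steps named in the abstract, clustering frame embeddings into pseudo labels and attention-weighted pooling over frames, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `kmeans_pseudo_labels` and `attention_pool`, the toy 2-D "embeddings", the choice of k-means as the clustering algorithm, and the hand-supplied attention scores are all assumptions made for illustration. In the described method the embeddings would come from HuBERT transformer layers and the scores from a learned attention layer.

```python
import math

def kmeans_pseudo_labels(frames, k, iters=20):
    """Assign each frame embedding a cluster id (a frame-level pseudo label)."""
    # Deterministic init (assumption): pick centroids spread evenly over the sequence.
    step = max(1, len(frames) // k)
    centroids = [list(frames[min(i * step, len(frames) - 1)]) for i in range(k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(f, centroids[c])))
            for f in frames
        ]
        # Update step: move each centroid to the mean of its member frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def attention_pool(frames, scores):
    """Softmax the per-frame scores; return (weights, weighted sum of frames)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * x for w, x in zip(weights, col)) for col in zip(*frames)]
    return weights, pooled

# Toy data (hypothetical): four frames forming two well-separated groups.
toy_frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
toy_labels = kmeans_pseudo_labels(toy_frames, k=2)
print(toy_labels)  # [0, 0, 1, 1]

# Hypothetical scores favouring the last two frames.
toy_weights, toy_pooled = attention_pool(toy_frames, [0.0, 0.0, 2.0, 2.0])
print(toy_weights, toy_pooled)
```

With higher scores on the last two frames, the pooled vector is pulled toward that group, which is the sense in which an attention layer can emphasize some frames over others when forming an utterance-level representation.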
Related papers
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023) 6.34
- Enhanced Speech Emotion Recognition With Efficient Channel Attention Guided Deep CNN-BiLSTM Framework (2024) 0.00
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025) 4.52
- TrustSER: On The Trustworthiness Of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition (2023) 9.07
- A Layer-anchoring Strategy For Enhancing Cross-lingual Speech Emotion Recognition (2024) 0.00
- Speaker Emotion Recognition: Leveraging Self-supervised Models For Feature Extraction Using Wav2vec2 And HuBERT (2024) 0.00
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023) 10.97
- GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning For Speech Emotion Recognition (2024) 0.00