MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge In Speech Emotion Recognition
2023 · Haiyang Sun, Fulin Zhang, Yingying Gao, et al.
Abstract
Speech Emotion Recognition (SER) is an important research topic in human-computer interaction. Many recent works focus on directly extracting emotional cues through pre-trained knowledge, frequently overlooking considerations of appropriateness and comprehensiveness. Therefore, we propose a novel framework for pre-training knowledge in SER, called Multi-perspective Fusion Search Network (MFSN). Considering comprehensiveness, we partition speech knowledge into Textual-related Emotional Content (TEC) and Speech-related Emotional Content (SEC), capturing cues from both semantic and acoustic perspectives, and we design a new architecture search space to fully leverage them. Considering appropriateness, we verify the efficacy of different modeling approaches in capturing SEC, filling a gap in current research. Experimental results on multiple datasets demonstrate the superiority of MFSN.
Related papers
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025) 0.00
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024) 0.00
- MSAC: Multiple Speech Attribute Control Method For Reliable Speech Emotion Recognition (2023) 0.00
- Two-stage Dimensional Emotion Recognition By Fusing Predictions Of Acoustic And Text Networks Using SVM (2022) 12.10
- Hierarchical Network With Decoupled Knowledge Distillation For Speech Emotion Recognition (2023) 6.77
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) 2.26
- Emonet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition (2021) 2.95
- Active Learning Based Fine-tuning Framework For Speech Emotion Recognition (2023) 6.34