CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition
2022 · Xin-Cheng Wen, Jia-Xin Ye, Yan Luo, et al.
Abstract
Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. An essential challenge in SER is to extract common attributes from different speakers or languages, especially when a model trained on a specific source corpus has to recognize unknown data coming from another speech corpus. To address this challenge, this paper proposes a Capsule Network (CapsNet) and Transfer Learning based Mixed Task Net (CTL-MTNet) that handles both the single-corpus and cross-corpus SER tasks simultaneously. For the single-corpus task, a combined Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding the self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules. The high-level features extracted by CPAC provide sufficient discriminative ability. Furthermore, to handle the cross-corpus task, CTL-MTNet employs a Corpus Adaptation Adversarial Module (CAA
Related papers
- Emonet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition (2021) 2.95
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025) 4.52
- MSAC: Multiple Speech Attribute Control Method For Reliable Speech Emotion Recognition (2023) 0.00
- Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning (2017) 12.87
- SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition With Speaker Embedding And Vision Transformers (2022) 2.83
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022) 5.84
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) 15.25
- Mouth Articulation-based Anchoring For Improved Cross-corpus Speech Emotion Recognition (2024) 2.26