EmoNet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition
2021 · Maurice Gerczuk, Shahin Amiriparian, Sandra Ottl, et al.
Abstract
In this manuscript, the topic of multi-corpus Speech Emotion Recognition (SER) is approached from a deep transfer learning perspective. A large corpus of emotional speech data, EmoSet, is assembled from a number of existing SER corpora. In total, EmoSet contains 84181 audio recordings from 26 SER corpora with a total duration of over 65 hours. The corpus is then utilised to create a novel framework for multi-corpus speech emotion recognition, namely EmoNet. A combination of a deep ResNet architecture and residual adapters is transferred from the field of multi-domain visual recognition to multi-corpus SER on EmoSet. Compared against two suitable baselines and more traditional training and transfer settings for the ResNet, the residual adapter approach enables parameter efficient training of a multi-domain SER model on all 26 corpora. A shared model with only \(3.5\) times the number of parameters of a model trained on a single database leads to increased performance for 21 of the 26 co
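The residual adapter idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `AdapterBlock`, the corpus names, the channel count, and the zero initialisation of the adapters are all assumptions made for illustration. It assumes the series form of a residual adapter, in which a shared 3x3 convolution is followed by a small corpus-specific 1x1 convolution added back onto its input.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded, stride-1 3x3 convolution. x: (C, H, W), w: (O, C, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    # Sum the contribution of each of the nine kernel offsets.
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def conv1x1(x, a):
    """1x1 convolution, i.e. channel mixing at every position. a: (O, C)."""
    return np.einsum("oc,chw->ohw", a, x)


class AdapterBlock:
    """Hypothetical block: one shared 3x3 conv plus one 1x1 adapter per corpus."""

    def __init__(self, channels, corpora, seed=0):
        rng = np.random.default_rng(seed)
        # Shared weights, reused for every corpus.
        self.shared = rng.standard_normal((channels, channels, 3, 3)) * 0.1
        # Assumption: adapters start at zero, so each corpus initially
        # sees exactly the shared computation.
        self.adapters = {name: np.zeros((channels, channels)) for name in corpora}

    def forward(self, x, corpus):
        h = conv3x3(x, self.shared)
        # Residual adapter: corpus-specific correction added to the shared output.
        h = h + conv1x1(h, self.adapters[corpus])
        return np.maximum(h, 0.0)  # ReLU


block = AdapterBlock(channels=4, corpora=["corpus_a", "corpus_b"])
spec = np.random.default_rng(1).standard_normal((4, 8, 10))  # toy spectrogram features
out_a = block.forward(spec, "corpus_a")
```

In this sketch the shared convolution holds nine times as many weights as a single adapter, which is the sense in which adding a corpus is cheap compared with training a separate model for it.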
Related papers
- Towards Interpretable And Transferable Speech Emotion Recognition: Latent Representation Based Analysis Of Features, Methods And Corpora (2021) 0.00
- EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit And Benchmark (2024) 11.49
- CTL-MTNet: A Novel CapsNet And Transfer Learning-based Mixed Task Net For The Single-corpus And Cross-corpus Speech Emotion Recognition (2022) 10.21
- Continuous Metric Learning For Transferable Speech Emotion Recognition And Embedding Across Low-resource Languages (2022) 0.00
- Improved Speech Emotion Recognition Using Transfer Learning And Spectrogram Augmentation (2021) 12.74
- Transfer Learning For Improving Speech Emotion Classification Accuracy (2018) 15.10
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025) 4.52
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022) 2.26