EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit And Benchmark
2024 · Ziyang Ma, Mingjie Chen, Hezhao Zhang, et al.
Abstract
Speech emotion recognition (SER) is an important part of human-computer interaction and has received extensive attention from both industry and academia. However, SER research has long suffered from two problems: 1) There are few reasonable and universal splits of the datasets, which makes comparing different models and methods difficult. 2) No commonly used benchmark covers numerous corpora and languages for researchers to refer to, which makes reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully design the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that is fully balanced in speaker and emotion distributions. Based on EmoBox, we present the intra-corpu
Related papers
- SpeechEQ: Speech Emotion Recognition Based On Multi-scale Unified Datasets And Multitask Learning (2022)
- SER Evals: In-domain And Out-of-domain Benchmarking For Speech Emotion Recognition (2024)
- EmoNet: A Transfer Learning Framework For Multi-corpus Speech Emotion Recognition (2021)
- EMOVOME: A Dataset For Emotion Recognition In Spontaneous Real-life Speech (2024)
- CAMEO: Collection Of Multilingual Emotional Speech Corpora (2025)
- Decoding Emotions: A Comprehensive Multilingual Study Of Speech Models For Speech Emotion Recognition (2023)
- What Does It Take To Generalize SER Model Across Datasets? A Comprehensive Benchmark (2024)
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025)