SpeechEQ: Speech Emotion Recognition Based On Multi-scale Unified Datasets And Multitask Learning
2022 · Zuheng Kang, Junqing Peng, Jianzong Wang, et al.
Abstract
Speech emotion recognition (SER) faces many challenges, one of the main ones being that frameworks do not share a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks, Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks, phoneme recognition and gender recognition. For this framework, we build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). We conducted experiments on the public Mandarin datasets CASIA and ESD, which show that our method outperforms baseline methods by a relatively large margin, yielding 8.0% and 6.5% improvement in accuracy respectively. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral) also show the proposed method achieves a state-of-the-art of both weighted accuracy (WA) of 78.16% and
Related papers
- EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit And Benchmark (2024)
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022)
- MSF-SER: Enriching Acoustic Modeling With Multi-granularity Semantics For Speech Emotion Recognition (2025)
- What Does It Take To Generalize SER Model Across Datasets? A Comprehensive Benchmark (2024)
- Leveraging Cross-attention Transformer And Multi-feature Fusion For Cross-linguistic Speech Emotion Recognition (2025)
- SER Evals: In-domain And Out-of-domain Benchmarking For Speech Emotion Recognition (2024)
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021)
- Towards Speech Emotion Recognition "in The Wild" Using Aggregated Corpora And Deep Multi-task Learning (2017)