Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search
2022 · Zihan Wang, Qi Meng, Haifeng Lan, et al.
Abstract
Speech emotion recognition (SER) classifies audio into emotion categories such as Happy, Angry, Fear, Disgust and Neutral. While SER is a common application for popular languages, it remains a problem for low-resourced languages, i.e., languages with no pre-trained speech-to-text recognition models. This paper first proposes a language-specific model that extracts emotional information from multiple pre-trained speech models, and then designs a multi-domain model that simultaneously performs SER for various languages. Our multi-domain model employs a multi-gating mechanism to generate a unique weighted feature combination for each language, and also searches for a specific neural network structure for each language through a neural architecture search module. In addition, we introduce a contrastive auxiliary loss to build more separable representations for audio data. Our experiments show that our model raises the state-of-the-art accuracy by 3% for Germ
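The multi-gating idea described in the abstract (a separate gate per language that weights features drawn from several pre-trained speech models) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy 4-dimensional embeddings, the placeholder language keys, and the fixed gate logits are hypothetical and not taken from the paper, where the gates would be learned jointly with the rest of the network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_combination(features, gate_logits):
    """Weighted sum of feature vectors, with weights softmax(gate_logits)."""
    weights = softmax(gate_logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Hypothetical: three pre-trained encoders each yield a 4-dim utterance embedding.
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Hypothetical per-language gate logits (fixed here, learned in a real model).
language_gates = {
    "lang_a": [2.0, 0.0, 0.0],  # favours the first encoder
    "lang_b": [0.0, 0.0, 0.0],  # weights all encoders equally
}

# Each language gets its own weighted feature combination.
combined = {
    lang: gated_combination(features, logits)
    for lang, logits in language_gates.items()
}
```

Because each language has its own gate, the same set of encoder outputs yields a different combined representation per language, which is the property the abstract attributes to the multi-gating mechanism.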