Effect Of Attention And Self-supervised Speech Embeddings On Non-semantic Speech Tasks
2023 · Payal Mohapatra, Akash Pandey, Yueyuan Sui, et al.
Abstract
Human emotion understanding is pivotal in making conversational technology mainstream. We view speech emotion understanding as a perception task, which is a more realistic setting. Across varying contexts (languages, demographics, etc.), different shares of people perceive the same speech segment as different emotions, so the label is rarely unanimous. As part of the ACM Multimedia 2023 Computational Paralinguistics ChallengE (ComParE), in the EMotion Share track, we leverage the challenge's rich dataset of multilingual speakers and its multi-label regression target of 'emotion share', the share of listeners who perceive a given emotion. We demonstrate that the training scheme of different foundation models dictates their effectiveness for tasks beyond speech recognition, especially for non-semantic speech tasks like emotion understanding. The task is challenging because of multilingual speakers, variability in the target labels, and inherent imbalance in the regression dataset. Our results show that HuBERT-Large with a self-attention-based light-weight se
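To make the 'emotion share' target concrete, the following is a minimal sketch of how such a regression target could be derived from annotator votes. It assumes each annotator picks exactly one label per clip and that the share is the plain fraction of votes; the label names, the votes, and the `emotion_shares` helper are hypothetical and are not taken from the challenge data.

```python
from collections import Counter


def emotion_shares(votes, emotions):
    """Return, for each emotion, the fraction of annotator votes it received.

    Assumption: one vote per annotator, so the shares sum to 1.
    """
    if not votes:
        raise ValueError("need at least one annotator vote")
    counts = Counter(votes)
    # Counter returns 0 for emotions nobody voted for.
    return [counts[e] / len(votes) for e in emotions]


# Hypothetical label set and votes for a single speech segment.
EMOTIONS = ["anger", "calmness", "sadness"]
votes = ["anger", "anger", "sadness", "calmness"]

shares = emotion_shares(votes, EMOTIONS)
print(dict(zip(EMOTIONS, shares)))
```

A model for this kind of target predicts the whole vector of shares for each segment rather than a single class, which is why the abstract describes it as multi-label regression.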
Related papers
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019) · 13.50
- Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-attention Cues In Multitask Learning (2024) · 0.00
- Representation Learning Through Cross-modal Conditional Teacher-student Training For Speech Emotion Recognition (2021) · 11.19
- Speaker Emotion Recognition: Leveraging Self-supervised Models For Feature Extraction Using Wav2vec2 And Hubert (2024) · 0.00
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) · 15.25
- Unsupervised Representations Improve Supervised Learning In Speech Emotion Recognition (2023) · 0.00
- Cross-lingual Speech Emotion Recognition: Humans Vs. Self-supervised Models (2024) · 5.84
- Investigating Salient Representations And Label Variance In Dimensional Speech Emotion Analysis (2023) · 3.58