Comparing Supervised And Self-supervised Embedding For ExVo Multi-task Learning Track
2022 · Tilak Purohit, Imen Ben Mahmoud, Bogdan Vlasenko, et al.
Abstract
The ICML Expressive Vocalizations (ExVo) Multi-task challenge 2022 focuses on understanding the emotional facets of non-linguistic vocalizations, known as vocal bursts (VBs). The objective of the challenge is to predict emotional intensities for VBs; as a multi-task challenge, it also requires predicting the speaker's age and native country. For this challenge, we study and compare two distinct embedding spaces: self-supervised learning (SSL) based embeddings and task-specific supervised learning based embeddings. To that end, we investigate feature representations obtained from several pre-trained SSL neural networks and from task-specific supervised classification neural networks. Our studies show that the best performance is obtained with a hybrid approach that combines predictions derived via both SSL and task-specific supervised learning. On the test set, our best system surpasses the ComParE baseline by a relative margin of \(13\%\) on \(S_{MTL}\), the harmonic mean of all sub-task scores.
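The abstract describes \(S_{MTL}\) only as the harmonic mean of the sub-task scores. The following is a minimal sketch of that aggregation and of a relative margin over a baseline, assuming each sub-task has already been reduced to a single positive, higher-is-better score. The function names and all numeric values are hypothetical illustrations, not the challenge's official scoring code or results from the paper.

```python
from statistics import harmonic_mean


def s_mtl(subtask_scores):
    """Harmonic mean of per-task scores.

    Assumption: every score is positive and higher is better, so any
    error-type metric has already been converted to such a score.
    """
    return harmonic_mean(subtask_scores)


def relative_gain(system_score, baseline_score):
    """Relative improvement of a system over a baseline (0.13 means 13%)."""
    return (system_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Hypothetical per-task scores (emotion, country, age); not from the paper.
    baseline = s_mtl([0.40, 0.40, 0.40])
    system = s_mtl([0.50, 0.45, 0.42])
    print(f"baseline S_MTL = {baseline:.3f}")
    print(f"system   S_MTL = {system:.3f}")
    print(f"relative gain  = {relative_gain(system, baseline):.1%}")
```

Because the harmonic mean is dominated by the weakest score, a system has to do reasonably well on all three sub-tasks to improve \(S_{MTL}\).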
Related papers
- Self-supervision And Learnable STRFs For Age, Emotion, And Country Prediction (2022)
- Non-contrastive Self-supervised Learning For Utterance-level Information Extraction From Speech (2022)
- Exploring The Effectiveness Of Self-supervised Learning And Classifier Chains In Emotion Recognition Of Nonverbal Vocalizations (2022)
- Simultaneous Or Sequential Training? How Speech Representations Cooperate In A Multi-task Self-supervised Learning System (2023)
- Multitask Vocal Burst Modeling With ResNets And Pre-trained Paralinguistic Conformers (2022)
- Multi-class-token Transformer For Multitask Self-supervised Music Information Retrieval (2025)
- Effect Of Attention And Self-supervised Speech Embeddings On Non-semantic Speech Tasks (2023)
- Exploring Self-supervised Multi-view Contrastive Learning For Speech Emotion Recognition With Limited Annotations (2024)