Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction
Abstract
This work presents a multitask approach to the simultaneous estimation of
age, country of origin, and emotion from vocal burst audio for the 2022 ICML
Expressive Vocalizations Challenge ExVo-MultiTask track. Our method combines
spectro-temporal modulation features and self-supervised features, followed by
an encoder-decoder network structured for multitask learning. We assess the
complementarity of the three tasks by comparing independent task-specific models
with joint models, and we explore the relative strengths of different feature
sets. We also introduce a simple score fusion mechanism to leverage the
complementarity of different feature sets for this task.
We find that robust data preprocessing, combined with score fusion over the
spectro-temporal receptive field (STRF) and HuBERT models, achieved our best
ExVo-MultiTask test score of 0.412.
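The abstract does not say how the score fusion is computed. As a minimal sketch, assuming fusion is a weighted average of each model's per-task outputs (emotion intensities, an age estimate, and country class probabilities) from, for example, the STRF and HuBERT models, the idea could look like the following. The function names `fuse_scores` and `argmax`, the task keys, the weights, and all numbers are hypothetical and chosen only for illustration; this is not presented as the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical output format: each model maps a task name to a list of floats
# (e.g. emotion intensities, a single age value, country class probabilities).
Outputs = Dict[str, List[float]]


def fuse_scores(model_outputs: Sequence[Outputs], weights: Sequence[float]) -> Outputs:
    """Weighted average of per-task outputs across models (illustrative only)."""
    if not model_outputs or len(model_outputs) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    fused: Outputs = {}
    for task, first in model_outputs[0].items():
        # Every model must produce the same tasks with the same output sizes.
        if any(len(out[task]) != len(first) for out in model_outputs):
            raise ValueError(f"size mismatch for task {task!r}")
        fused[task] = [
            sum(w * out[task][i] for w, out in zip(norm, model_outputs))
            for i in range(len(first))
        ]
    return fused


def argmax(values: List[float]) -> int:
    """Index of the largest value, used here to pick the fused country class."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Toy predictions for one vocal burst; the numbers are made up.
    strf = {"emotion": [0.2, 0.6], "age": [30.0], "country": [0.1, 0.6, 0.2, 0.1]}
    hubert = {"emotion": [0.4, 0.2], "age": [34.0], "country": [0.5, 0.3, 0.1, 0.1]}
    fused = fuse_scores([strf, hubert], weights=[0.5, 0.5])
    print(fused["age"], argmax(fused["country"]))  # prints [32.0] 1
```

Equal weights are only a starting point in this sketch; under the same assumptions, the weights would more plausibly be tuned on held-out validation data.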
Related papers
- Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction (2022)
- Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding (2022)
- Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts (2022)
- Comparing Supervised and Self-supervised Embedding for ExVo Multi-task Learning Track (2022)
- Multitask Vocal Burst Modeling with ResNets and Pre-trained Paralinguistic Conformers (2022)