Jointly Predicting Emotion, Age, And Country Using Pre-trained Acoustic Embedding
2022 · Bagus Tris Atmaja, Zanjabila, Akira Sasou
Abstract
In this paper, we demonstrate the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting (via multitask learning) three tasks: emotion, age, and native country. The pre-trained model was built on the wav2vec 2.0 large robust model and trained on a speech emotion corpus. Emotion and age prediction were formulated as regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with shared layers and two independent layers, including the output layers. This study explores how different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), random seeds, batch sizes, and normalization methods affect multitask learning for predicting paralinguistic information from speech.
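The abstract scores all three tasks with one harmonic mean but does not name the per-task metrics. The sketch below assumes, as a hypothetical choice, the concordance correlation coefficient (CCC, averaged over emotion dimensions) for emotion regression, mean absolute error (MAE) for age, and unweighted average recall (UAR) for country classification. Because MAE is lower-is-better, it enters the harmonic mean as 1/MAE. All function names are illustrative and are not taken from the paper.

```python
from typing import Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Concordance correlation coefficient for a single output dimension."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def mean_ccc(y_true_2d: Sequence[Sequence[float]],
             y_pred_2d: Sequence[Sequence[float]]) -> float:
    """Average CCC over emotion dimensions (columns); an assumed aggregation."""
    n_dims = len(y_true_2d[0])
    scores = [
        ccc([row[d] for row in y_true_2d], [row[d] for row in y_pred_2d])
        for d in range(n_dims)
    ]
    return sum(scores) / n_dims


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, e.g. for age in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def uar(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted average recall: the mean of per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def harmonic_mean_score(ccc_emotion: float, mae_age: float,
                        uar_country: float) -> float:
    """Harmonic mean of three 'higher is better' terms (hypothetical metric set)."""
    # Invert MAE so that every term increases as the model improves.
    terms = [ccc_emotion, 1.0 / mae_age, uar_country]
    # A harmonic mean is only defined for positive values (CCC can be <= 0).
    if any(x <= 0 for x in terms):
        raise ValueError("all terms must be positive")
    return len(terms) / sum(1.0 / x for x in terms)


if __name__ == "__main__":
    print(harmonic_mean_score(0.5, 4.0, 0.4))
```

For example, `harmonic_mean_score(0.5, 4.0, 0.4)` evaluates to 3 / (2 + 4 + 2.5), roughly 0.353. A harmonic mean is pulled toward its smallest term, so a multitask model cannot score well by excelling on one task while neglecting another.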
Related papers
- Self-supervision And Learnable Strfs For Age, Emotion, And Country Prediction (2022)
- Burst2vec: An Adversarial Multi-task Approach For Predicting Emotion, Age, And Origin From Vocal Bursts (2022)
- Advancing Audio Emotion And Intent Recognition With Large Pre-trained Models And Bayesian Inference (2023)
- Speech Emotion: Investigating Model Representations, Multi-task Learning And Knowledge Distillation (2022)
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019)
- Leveraging Speaker Embeddings With Adversarial Multi-task Learning For Age Group Classification (2023)
- Leveraging Speaker Attribute Information Using Multi Task Learning For Speaker Verification And Diarization (2020)
- SEGAA: A Unified Approach To Predicting Age, Gender, And Emotion In Speech (2024)