Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding
Abstract
In this paper, we demonstrate the benefit of using a pre-trained model to
extract acoustic embeddings for jointly predicting three tasks (multitask
learning): emotion, age, and native country. The pre-trained model was obtained
by training the wav2vec 2.0 large robust model on a speech emotion corpus.
Emotion and age prediction were treated as regression problems, while country
prediction was treated as a classification task. Multitask performance was
evaluated with a single score: the harmonic mean of the three task metrics. The
classifier was a linear network with shared layers and two independent
(task-specific) layers, including the output layers. This study explores
multitask learning across different acoustic features (including the acoustic
embedding extracted from a model trained on an affective speech dataset), random
seeds, batch sizes, and normalization methods for predicting paralinguistic
information from speech.
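The abstract does not name the three metrics that the harmonic mean combines. The following sketch assumes the concordance correlation coefficient (CCC) for emotion, mean absolute error (MAE) for age, and unweighted average recall (UAR) for country. Because MAE is lower-is-better, it is assumed to enter the harmonic mean through its reciprocal 1/MAE, giving score = 3 / (1/CCC + MAE + 1/UAR). The function names are illustrative and are not taken from the authors' code.

```python
from statistics import fmean
from typing import Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Concordance correlation coefficient using population statistics."""
    mx, my = fmean(y_true), fmean(y_pred)
    vx = fmean([(x - mx) ** 2 for x in y_true])
    vy = fmean([(y - my) ** 2 for y in y_pred])
    cov = fmean([(x - mx) * (y - my) for x, y in zip(y_true, y_pred)])
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (e.g. in years for age)."""
    return fmean([abs(x - y) for x, y in zip(y_true, y_pred)])


def uar(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted average recall: the mean of per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return fmean(recalls)


def harmonic_score(ccc_val: float, mae_val: float, uar_val: float) -> float:
    # Assumed combination: harmonic mean of CCC, 1/MAE and UAR.
    # The reciprocal of 1/MAE is MAE itself, so MAE appears directly in the denominator.
    return 3 / (1 / ccc_val + mae_val + 1 / uar_val)


print(harmonic_score(0.5, 4.0, 0.5))  # 0.375
```

A harmonic mean is dominated by the weakest component, so a model cannot obtain a high joint score by excelling at one task while neglecting another.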
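The description of the classifier ("shared layers and two independent layers, including the output layers") admits more than one reading. This sketch takes one of them: a shared trunk of dense ("linear") layers followed, for each task, by two task-specific layers (one hidden layer plus one output layer). Only the 1024-dimensional input (the hidden size of wav2vec 2.0 large models) is a known value. The layer widths, the ReLU activations, the number of emotion outputs, and the number of country classes are hypothetical. The code is a NumPy forward pass only, not the authors' implementation.

```python
import numpy as np

EMB_DIM = 1024     # hidden size of wav2vec 2.0 large models
SHARED_DIM = 256   # hypothetical width of the shared layers
HEAD_DIM = 64      # hypothetical width of each task-specific hidden layer
N_EMOTIONS = 10    # hypothetical number of emotion regression outputs
N_COUNTRIES = 4    # hypothetical number of country classes


def dense(rng, in_dim, out_dim):
    """One fully connected layer: He-initialised weights and a zero bias."""
    w = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
    return w, np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class MultitaskNet:
    """Hard parameter sharing: a shared trunk, then two independent layers per task."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)  # the seed controls weight initialisation
        self.shared = [dense(rng, EMB_DIM, SHARED_DIM), dense(rng, SHARED_DIM, SHARED_DIM)]
        self.heads = {
            "emotion": [dense(rng, SHARED_DIM, HEAD_DIM), dense(rng, HEAD_DIM, N_EMOTIONS)],
            "age": [dense(rng, SHARED_DIM, HEAD_DIM), dense(rng, HEAD_DIM, 1)],
            "country": [dense(rng, SHARED_DIM, HEAD_DIM), dense(rng, HEAD_DIM, N_COUNTRIES)],
        }

    def forward(self, x):
        h = x
        for w, b in self.shared:
            h = relu(h @ w + b)
        raw = {}
        for task, ((w1, b1), (w2, b2)) in self.heads.items():
            raw[task] = relu(h @ w1 + b1) @ w2 + b2  # task-specific hidden + output layer
        return {
            "emotion": raw["emotion"],             # regression, shape (batch, N_EMOTIONS)
            "age": raw["age"][:, 0],               # regression, shape (batch,)
            "country": softmax(raw["country"]),    # class probabilities, shape (batch, N_COUNTRIES)
        }


net = MultitaskNet(seed=0)
embeddings = np.random.default_rng(1).normal(size=(8, EMB_DIM))  # stand-in utterance embeddings
preds = net.forward(embeddings)
```

Training would typically sum a regression loss for the emotion and age outputs and a cross-entropy loss for the country output. The abstract does not state which losses or task weights were used.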
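The abstract lists normalization among the factors studied but does not say which methods were compared. This sketch assumes two common per-dimension choices, z-score standardization and min-max scaling. Both are fitted on the training embeddings only and then applied unchanged to held-out data. The helper names are hypothetical.

```python
import numpy as np


def fit_zscore(train):
    """Standardise each feature dimension using training-set mean and std."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return lambda x: (x - mean) / std


def fit_minmax(train):
    """Rescale each feature dimension to [0, 1] using the training-set range."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero for constant features
    return lambda x: (x - lo) / span


train = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
test = np.array([[1.0, 25.0]])

zscore = fit_zscore(train)
minmax = fit_minmax(train)
print(minmax(test))  # [[0.25 0.75]]
```

Fitting the statistics on the training split alone keeps information from the evaluation data out of the preprocessing step.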
Related papers
- Jointly Predicting Emotion, Age, and Country Using Pre-trained Acoustic Embedding (2022)
- Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction (2022)
- Burst2vec: An Adversarial Multi-task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts (2022)
- Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation (2022)
- Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference (2023)
- A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions (2024)