Few-shot Learning In Emotion Recognition Of Spontaneous Speech Using A Siamese Neural Network With Adaptive Sample Pair Formation
2021 · Kexin Feng, Theodora Chaspari
Abstract
Speech-based machine learning (ML) has been heralded as a promising solution for tracking prosodic and spectrotemporal patterns in real life that are indicative of emotional changes, providing a valuable window into one's cognitive and mental state. Yet, the scarcity of labelled data in ambulatory studies prevents the reliable training of ML models, which usually rely on "data-hungry" distribution-based learning. Leveraging the abundance of labelled speech data from acted emotions, this paper proposes a few-shot learning approach for automatically recognizing emotion in spontaneous speech from a small number of labelled samples. Few-shot learning is implemented via a metric learning approach through a siamese neural network, which models the relative distance between samples rather than relying on learning absolute patterns of the corresponding distributions of each emotion. Results indicate the feasibility of the proposed metric learning in recognizing emotions from spontaneous speech
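The distance-based idea in the abstract can be illustrated with a minimal sketch. It assumes that a shared (siamese) encoder has already mapped each utterance to an embedding vector, and it uses the standard contrastive loss, which is a common choice for siamese networks but is not confirmed here as the loss used in the paper. The function names `contrastive_loss` and `few_shot_predict`, the margin value, and the toy embeddings are all hypothetical, and the paper's adaptive sample pair formation is not modelled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Standard contrastive loss on one pair of embeddings (assumed loss).

    Same-class pairs are pulled together (loss grows with distance);
    different-class pairs are pushed apart until they are at least
    `margin` away from each other.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def few_shot_predict(query_emb, support):
    """Label a query by its mean distance to a few labelled support samples.

    `support` is a list of (embedding, label) pairs, i.e. the small
    labelled set from the target (spontaneous speech) domain.
    """
    distances = {}
    for emb, label in support:
        distances.setdefault(label, []).append(euclidean(query_emb, emb))
    # Pick the label whose support samples are closest on average.
    return min(distances, key=lambda lab: sum(distances[lab]) / len(distances[lab]))


# Toy embeddings standing in for encoder outputs (illustrative values only).
support_set = [
    ([0.0, 0.0], "neutral"),
    ([0.1, 0.0], "neutral"),
    ([1.0, 1.0], "angry"),
]
prediction = few_shot_predict([0.9, 1.1], support_set)
```

Because classification depends only on relative distances to a handful of labelled samples, a sketch like this needs no estimate of each emotion's full distribution in the target domain, which is the property the abstract emphasizes.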
Related papers
- Learning Spontaneity To Improve Emotion Recognition In Speech (2017)
- Speech Emotion Recognition Via Contrastive Loss Under Siamese Networks (2019)
- Continuous Metric Learning For Transferable Speech Emotion Recognition And Embedding Across Low-resource Languages (2022)
- Exploring Speaker Enrolment For Few-shot Personalisation In Emotional Vocalisation Prediction (2022)
- Dsnet: Disentangled Siamese Network With Neutral Calibration For Speech Emotion Recognition (2023)
- Few Shot Speaker Recognition Using Deep Neural Networks (2019)
- Real-time Speech Emotion Recognition Based On Syllable-level Feature Extraction (2022)
- Advancing Multiple Instance Learning With Attention Modeling For Categorical Speech Emotion Recognition (2020)