Are Paralinguistic Representations All That Is Needed For Speech Emotion Recognition?
2024 · Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, et al.
Abstract
The availability of representations from pre-trained models (PTMs) has facilitated substantial progress in speech emotion recognition (SER). In particular, representations from PTMs trained for paralinguistic speech processing have shown state-of-the-art (SOTA) performance for SER. However, such paralinguistic PTM representations have not been evaluated for SER in linguistic environments other than English. Nor have they been investigated for SER in benchmarks such as SUPERB, EMO-SUPERB, and ML-SUPERB. This makes it difficult to assess the efficacy of paralinguistic PTM representations for SER across multiple languages. To fill this gap, we perform a comprehensive comparative study of five SOTA PTM representations. Our results show that the paralinguistic PTM (TRILLsson) representations perform best. We attribute this performance to their ability to capture pitch, tone, and other speech characteristics more effectively than the other PTM representations.
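The abstract does not say which downstream model was used to compare the five PTM representations. A common way to compare frozen representations is a probe. Each utterance's frame-level embeddings are pooled into one vector, a simple classifier is trained on top, and held-out accuracy is compared across extractors. The sketch below is illustrative and is not the authors' protocol. It makes three assumptions: mean pooling, a closed-form ridge-regression linear probe, and synthetic embeddings standing in for real PTM outputs. All function names (`mean_pool`, `fit_linear_probe`, `probe_accuracy`, `synthetic_utterances`) and the two "extractors" are hypothetical.

```python
import numpy as np

def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) embedding matrix into one utterance vector."""
    return frames.mean(axis=0)

def _with_bias(X: np.ndarray) -> np.ndarray:
    # Append a constant column so the probe learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear_probe(X, y, num_classes, l2=1e-2):
    # Assumed probe: ridge regression onto one-hot emotion labels, closed form.
    Xb = _with_bias(X)
    Y = np.eye(num_classes)[y]
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(train_X, train_y, test_X, test_y, num_classes):
    W = fit_linear_probe(train_X, train_y, num_classes)
    preds = (_with_bias(test_X) @ W).argmax(axis=1)
    return float((preds == test_y).mean())

def synthetic_utterances(rng, labels, centroids, frames=20):
    # Hypothetical stand-in for a PTM: each frame is the class centroid plus noise.
    # An all-zero centroid matrix models a representation with no emotion cues.
    dim = centroids.shape[1]
    utts = [centroids[c] + rng.normal(size=(frames, dim)) for c in labels]
    return np.stack([mean_pool(u) for u in utts])

rng = np.random.default_rng(0)
num_classes, dim = 4, 16
train_y = rng.integers(0, num_classes, size=200)
test_y = rng.integers(0, num_classes, size=200)

extractors = {
    "informative": 3.0 * rng.normal(size=(num_classes, dim)),
    "uninformative": np.zeros((num_classes, dim)),
}

results = {}
for name, centroids in extractors.items():
    tr = synthetic_utterances(rng, train_y, centroids)
    te = synthetic_utterances(rng, test_y, centroids)
    results[name] = probe_accuracy(tr, train_y, te, test_y, num_classes)

print(results)
```

Because the probe is deliberately weak, differences in held-out accuracy mostly reflect how linearly accessible the emotion information is in each representation. A comparison such as the one in the abstract would swap in real PTM embeddings (for example, from TRILLsson) in place of `synthetic_utterances`.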
Related papers
- Decoding Emotions: A Comprehensive Multilingual Study Of Speech Models For Speech Emotion Recognition (2023)
- A Comparative Study Of Pre-trained Speech And Audio Embeddings For Speech Emotion Recognition (2023)
- Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition (2023)
- Pre-trained Model Representations And Their Robustness Against Noise For Speech Emotion Analysis (2023)
- Transforming The Embeddings: A Lightweight Technique For Speech Emotion Recognition Tasks (2023)
- Towards Interpretable And Transferable Speech Emotion Recognition: Latent Representation Based Analysis Of Features, Methods And Corpora (2021)
- Multilingual Speech Emotion Recognition With Multi-gating Mechanism And Neural Architecture Search (2022)
- Cross-lingual Speech Emotion Recognition: Humans Vs. Self-supervised Models (2024)