ASR-based Features For Emotion Recognition: A Transfer Learning Approach
2018 · Noé Tits, Kevin El Haddad, Thierry Dutoit
Abstract
During the last decade, applications of signal processing have improved drastically thanks to deep learning. However, areas of affective computing such as emotional speech synthesis or emotion recognition from spoken language remain challenging. In this paper, we investigate the use of a neural Automatic Speech Recognition (ASR) system as a feature extractor for emotion recognition. We show that these features outperform the eGeMAPS feature set in predicting the valence and arousal emotional dimensions, which means that the audio-to-text mapping learned by the ASR system contains information related to the emotional dimensions in spontaneous speech. We also examine the relationship between the first layers (closer to speech) and the last layers (closer to text) of the ASR system and valence/arousal.
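The general idea of reusing a network's intermediate activations as features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "ASR" is a hypothetical stack of random dense layers (`toy_asr_activations`), frames are mean-pooled over time, the regressor is closed-form ridge regression, and the valence/arousal labels are random placeholders. The abstract does not specify the actual architecture, pooling, or regression model, so this should not be read as the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_asr_activations(frames, weights):
    """Pass frames (T, d_in) through dense+tanh layers; return every layer's output."""
    activations = []
    h = frames
    for w in weights:
        h = np.tanh(h @ w)
        activations.append(h)
    return activations

def utterance_features(frames, weights, layer):
    """Mean-pool one layer's activations over time: one vector per utterance (assumed pooling)."""
    return toy_asr_activations(frames, weights)[layer].mean(axis=0)

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; a bias column is appended (and regularized, for brevity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(X, W):
    """Apply the fitted ridge weights to utterance-level features."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

# Hypothetical stand-ins: a 3-layer "ASR", 20 utterances of 40-dim frames,
# and random (valence, arousal) targets. None of this is real data.
weights = [
    rng.normal(scale=0.3, size=(40, 32)),
    rng.normal(scale=0.3, size=(32, 32)),
    rng.normal(scale=0.3, size=(32, 32)),
]
utterances = [rng.normal(size=(int(rng.integers(50, 120)), 40)) for _ in range(20)]
labels = rng.uniform(-1.0, 1.0, size=(20, 2))

# Fit one regressor per layer, mirroring the first-layer vs last-layer comparison.
for layer in range(len(weights)):
    X = np.stack([utterance_features(u, weights, layer) for u in utterances])
    W = fit_ridge(X, labels)
    print(layer, predict(X, W).shape)
```

With a real pretrained ASR model and annotated speech, the per-layer loop is where one would compare how well early and late layers predict each emotional dimension on held-out data.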
Related papers
- Embedded Emotions -- A Data Driven Approach To Learn Transferable Feature Representations From Raw Speech Input For Emotion Recognition (2020)
- Fusing ASR Outputs In Joint Training For Speech Emotion Recognition (2021)
- A Transfer Learning Method For Speech Emotion Recognition From Automatic Speech Recognition (2020)
- CTA-RNN: Channel And Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings For Speech Emotion Recognition (2022)
- ASR And Emotional Speech: A Word-level Investigation Of The Mutual Impact Of Speech And Emotion Recognition (2023)
- Transfer Learning For Improving Speech Emotion Classification Accuracy (2018)
- Multimodal Emotion Recognition Using Transfer Learning From Speaker Recognition And BERT-based Models (2022)
- Emotion Recognition From Speech (2019)