Emotion Invariant Speaker Embeddings For Speaker Identification With Emotional Speech
2020 · Biswajit Dev Sarma, Rohan Kumar Das
Abstract
The emotional state of a speaker has a significant effect on speech production and can cause speech to deviate from that produced in a neutral state. This makes identifying speakers across different emotions a challenging task, since speaker models are generally trained on neutral speech. In this work, we propose to overcome this problem by creating emotion-invariant speaker embeddings. We learn an extractor network that maps test embeddings with different emotions, obtained from an i-vector based system, to an emotion-invariant space. The resulting test embeddings thus become emotion invariant and compensate for the mismatch between emotional states. The studies are conducted using four emotion classes from the IEMOCAP database. For speaker identification with different emotions, the emotion-invariant speaker embeddings yield an absolute improvement of 2.6% in accuracy over an average speaker model based framework.
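The mapping idea in the abstract can be illustrated with a toy sketch. This is not the authors' extractor network: it swaps in a ridge-regularised linear map, uses synthetic vectors in place of i-vectors, and assumes the training targets are the neutral speaker models. All names (`make_toy_data`, `fit_linear_extractor`, `identify`, `run_demo`) and all parameter values are hypothetical.

```python
import numpy as np


def make_toy_data(n_speakers=5, dim=16, per_speaker=40, seed=0):
    """Synthetic stand-in for i-vectors; this is NOT IEMOCAP data."""
    rng = np.random.default_rng(seed)
    speaker_models = rng.normal(size=(n_speakers, dim))
    # Assumption: the "emotion" effect is modelled as a fixed random
    # linear distortion applied to neutral embeddings.
    distortion = np.eye(dim) + 0.8 * rng.normal(size=(dim, dim))
    labels = np.repeat(np.arange(n_speakers), per_speaker)
    noise = 0.1 * rng.normal(size=(labels.size, dim))
    neutral = speaker_models[labels] + noise
    emotional = neutral @ distortion
    return speaker_models, labels, emotional


def fit_linear_extractor(emotional, targets, ridge=1e-3):
    """Ridge least squares: find W such that emotional @ W ~ targets."""
    dim = emotional.shape[1]
    gram = emotional.T @ emotional + ridge * np.eye(dim)
    return np.linalg.solve(gram, emotional.T @ targets)


def identify(embeddings, speaker_models):
    """Pick, for each embedding, the speaker model with highest cosine similarity."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    m = speaker_models / np.linalg.norm(speaker_models, axis=1, keepdims=True)
    return np.argmax(e @ m.T, axis=1)


def run_demo(seed=0):
    """Return (accuracy on raw embeddings, accuracy on mapped embeddings)."""
    speaker_models, labels, emotional = make_toy_data(seed=seed)
    # Even-indexed utterances train the map; odd-indexed ones are held out.
    train = np.arange(labels.size) % 2 == 0
    test = ~train
    weights = fit_linear_extractor(emotional[train], speaker_models[labels[train]])
    raw_pred = identify(emotional[test], speaker_models)
    mapped_pred = identify(emotional[test] @ weights, speaker_models)
    raw_acc = float(np.mean(raw_pred == labels[test]))
    mapped_acc = float(np.mean(mapped_pred == labels[test]))
    return raw_acc, mapped_acc


if __name__ == "__main__":
    raw_acc, mapped_acc = run_demo()
    print(f"raw: {raw_acc:.2f}  mapped: {mapped_acc:.2f}")
```

The sketch only shows the structure of the approach: the speaker models stay fixed, and only the test embeddings are transformed before scoring. The numbers it prints describe the synthetic data and say nothing about the reported 2.6% improvement.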
Related papers
- X-vectors Meet Emotions: A Study On Dependencies Between Emotion And Speaker Recognition (2020)
- Identifying Speakers Using Their Emotion Cues (2018)
- Revealing Emotional Clusters In Speaker Embeddings: A Contrastive Learning Strategy For Speech Emotion Recognition (2024)
- Three-stage Speaker Verification Architecture In Emotional Talking Environments (2018)
- Is Style All You Need? Dependencies Between Emotion And GST-based Speaker Recognition (2022)
- Vocal Style Factorization For Effective Speaker Recognition In Affective Scenarios (2023)
- Emodiarize: Speaker Diarization And Emotion Identification From Speech Signals Using Convolutional Neural Networks (2023)
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017)