Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning
2021 · Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, et al.
Abstract
We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences. We leverage the Mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective cues. We design an affective encoder using multi-scale spatial-temporal graph convolutions to transform 3D pose sequences into latent, pose-based affective features. We use our affective encoder in both our generator, where it learns affective features from the seed poses to guide the gesture synthesis, and our discriminator, where it enforces the synthesized gestures to con
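The affective encoder described in the abstract is built on graph convolutions over the pose skeleton. The following is a minimal sketch of that general idea only, not the authors' implementation: it applies a single-scale spatial graph convolution to each frame and averages over time, whereas the paper describes multi-scale spatial-temporal convolutions. The five-joint skeleton, the edge list, the weight matrix, and all function names are hypothetical and chosen purely for illustration.

```python
# Hypothetical 5-joint upper-body skeleton (an assumption, not the paper's):
# 0 = spine, 1 = neck, 2 = head, 3 = left shoulder, 4 = right shoulder.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as nested lists."""
    # Start from the identity so every joint keeps its own features.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5)
             for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(pose, adj, weight):
    """One spatial graph convolution on one frame: ReLU(adj @ pose @ weight)."""
    out = matmul(matmul(adj, pose), weight)
    return [[max(0.0, v) for v in row] for row in out]


def encode_sequence(poses, adj, weight):
    """Convolve each frame, then mean-pool over frames and joints.

    Mean pooling over time is a simplification standing in for the
    temporal convolutions mentioned in the abstract.
    """
    frames = [graph_conv(p, adj, weight) for p in poses]
    dim = len(weight[0])
    total = len(frames) * len(adj)
    return [sum(f[j][d] for f in frames for j in range(len(adj))) / total
            for d in range(dim)]
```

Here `poses` is a list of frames, each frame a list of per-joint `[x, y, z]` coordinates, and `weight` is a 3-by-`d` matrix that would be learned in a real model. The returned vector plays the role of a pose-based latent feature; in the described architecture such features would feed both the generator and the discriminator.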
Related papers
- Emotiongesture: Audio-driven Diverse Emotional Co-speech 3D Gesture Generation (2023)
- Diffusion-based Co-speech Gesture Generation Using Joint Text And Audio Representation (2023)
- A Conversational Gesture Synthesis System Based On Emotions And Semantics (2025)
- Modeling Feature Representations For Affective Speech Using Generative Adversarial Networks (2019)
- Generative Adversarial Networks In Human Emotion Synthesis: A Review (2020)
- Dim-gesture: Co-speech Gesture Generation With Adaptive Layer Normalization Mamba-2 Framework (2024)
- On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks (2018)
- Expgest: Expressive Speaker Generation Using Diffusion Model And Hybrid Audio-text Guidance (2024)