Emotiongesture: Audio-driven Diverse Emotional Co-speech 3D Gesture Generation
2023 Β· Xingqun Qi, Chen Liu, Lincheng Li, et al.
Abstract
Generating vivid and diverse 3D co-speech gestures is crucial for various applications in animating virtual avatars. While most existing methods can generate gestures from audio directly, they usually overlook that emotion is one of the key factors of authentic co-speech gesture generation. In this work, we propose EmotionGesture, a novel framework for synthesizing vivid and diverse emotional co-speech 3D gestures from audio. Considering emotion is often entangled with the rhythmic beat in speech audio, we first develop an Emotion-Beat Mining module (EBM) to extract the emotion and audio beat features as well as model their correlation via a transcript-based visual-rhythm alignment. Then, we propose an initial pose based Spatial-Temporal Prompter (STP) to generate future gestures from the given initial poses. STP effectively models the spatial-temporal correlations between the initial poses and the future gestures, thus producing the spatial-temporal coherent pose prompt. Once we obtai
Authors
(none)
Tags
Stats
Related papers
- A Conversational Gesture Synthesis System Based On Emotions And Semantics (2025)0.00
- Emogene: Audio-driven Emotional 3D Talking-head Generation (2024)2.26
- Expgest: Expressive Speaker Generation Using Diffusion Model And Hybrid Audio-text Guidance (2024)4.52
- Audio Is All In One: Speech-driven Gesture Synthetics Using Wavlm Pre-trained Model (2023)0.00
- Probtalk3d: Non-deterministic Emotion Controllable Speech-driven 3D Facial Animation Synthesis Using VQ-VAE (2024)11.53
- Speech2affectivegestures: Synthesizing Co-speech Gestures With Generative Adversarial Affective Expression Learning (2021)14.35
- Diffusion-based Co-speech Gesture Generation Using Joint Text And Audio Representation (2023)10.07
- Diffsheg: A Diffusion-based Approach For Real-time Speech-driven Holistic 3D Expression And Gesture Generation (2024)0.00