CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation
2024 · Xiangyu Liang, Wenlin Zhuang, Tianyong Wang, et al.
Abstract
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has been studied extensively, existing methods struggle to synthesize natural and realistic expressions, which gives facial animations a mechanical, stiff appearance. Even where prior work extracts emotional features from speech, the randomness of the generated facial movements limits how effectively emotions are expressed. To address this issue, this paper proposes CSTalk (Correlation Supervised), a method that models the correlations among the movements of different facial regions and uses them to supervise the training of the generative model, so that it produces realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the MetaHuman character model and
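The abstract does not say how the inter-region correlations are computed or turned into a training signal. As a minimal sketch under stated assumptions (not the authors' formulation), one could treat each facial control parameter as a channel, compute the Pearson correlation between every pair of channels over the frames of a clip, and penalize the mean squared difference between the correlation matrices of generated and ground-truth animation. The function names `correlation_matrix` and `correlation_loss` and the `frames[t][c]` layout are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: frames[t][c] is the value of control channel c at frame t.
Matrix = List[List[float]]


def correlation_matrix(frames: Matrix) -> Matrix:
    """Pearson correlation between every pair of control channels across frames."""
    n_frames = len(frames)
    n_ch = len(frames[0])
    means = [sum(f[c] for f in frames) / n_frames for c in range(n_ch)]
    centered = [[f[c] - means[c] for c in range(n_ch)] for f in frames]
    # Root of the sum of squared deviations; the 1/n factors cancel in Pearson's r.
    norms = [math.sqrt(sum(row[c] ** 2 for row in centered)) for c in range(n_ch)]
    corr = [[0.0] * n_ch for _ in range(n_ch)]
    for i in range(n_ch):
        for j in range(n_ch):
            cov = sum(row[i] * row[j] for row in centered)
            denom = norms[i] * norms[j]
            # A constant channel has no defined correlation; treat it as 0 (assumption).
            corr[i][j] = cov / denom if denom > 1e-12 else 0.0
    return corr


def correlation_loss(pred: Matrix, target: Matrix) -> float:
    """Mean squared difference between predicted and ground-truth correlation matrices."""
    cp = correlation_matrix(pred)
    ct = correlation_matrix(target)
    n = len(cp)
    return sum((cp[i][j] - ct[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    # Two channels that move together in the target but oppositely in the prediction.
    target = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    pred = [[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
    print(correlation_loss(pred, target))  # approximately 2.0
```

In this toy case the target channels are perfectly correlated (r = 1) while the prediction makes them anti-correlated (r = -1), so the two off-diagonal entries each contribute (1 - (-1))^2 = 4 and the loss averages to 2.0. Such a term would only constrain how regions co-move, not lip sync, so in practice it would presumably be combined with a per-frame reconstruction loss.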
Related papers
- ESARM: 3D Emotional Speech-to-animation Via Reward Model From Automatically-ranked Demonstrations (2024) · 0.00
- ProbTalk3D: Non-deterministic Emotion Controllable Speech-driven 3D Facial Animation Synthesis Using VQ-VAE (2024) · 11.53
- Probabilistic Speech-driven 3D Facial Motion Synthesis: New Benchmarks, Methods, And Applications (2023) · 9.23
- PMMTalk: Speech-driven 3D Facial Animation From Complementary Pseudo Multi-modal Features (2023) · 3.58
- Controllable Expressive 3D Facial Animation Via Diffusion In A Unified Multimodal Space (2025) · 0.00
- EmoGene: Audio-driven Emotional 3D Talking-head Generation (2024) · 2.26
- EmotionGesture: Audio-driven Diverse Emotional Co-speech 3D Gesture Generation (2023) · 10.97
- DF-3DFace: One-to-many Speech Synchronized 3D Face Animation With Diffusion (2023) · 0.00