PMMTalk: Speech-driven 3D Facial Animation From Complementary Pseudo Multi-modal Features
2023 · Tianshun Han, Shengnan Gui, Yiqing Huang, et al.
Abstract
Speech-driven 3D facial animation has improved considerably in recent years, yet most related work uses only the acoustic modality and neglects visual and textual cues, leading to unsatisfactory precision and coherence. We argue that visual and textual cues carry non-trivial information. We therefore present a novel framework, PMMTalk, that uses complementary Pseudo Multi-Modal features to improve the accuracy of facial animation. The framework comprises three modules: the PMMTalk encoder, the cross-modal alignment module, and the PMMTalk decoder. Specifically, the PMMTalk encoder employs an off-the-shelf talking head generation architecture and speech recognition technology to extract visual and textual information from speech, respectively. The cross-modal alignment module then aligns the audio-image-text features at the temporal and semantic levels. Finally, the PMMTalk decoder predicts lip-synced facial blendshape coefficients. Contrary to prior methods,
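The temporal side of the alignment step described in the abstract can be illustrated with a small sketch. This is not the paper's alignment module: it assumes, purely for illustration, that the three feature streams arrive at different frame rates and are linearly resampled to a shared frame count before being concatenated per frame. The names `resample` and `align_and_fuse` are hypothetical, and semantic-level alignment is not covered.

```python
from typing import List

Vector = List[float]


def resample(seq: List[Vector], target_len: int) -> List[Vector]:
    """Linearly interpolate a feature sequence to target_len frames."""
    if not seq or target_len <= 0:
        raise ValueError("need a non-empty sequence and a positive target length")
    if len(seq) == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(seq[0]) for _ in range(target_len)]
    scale = (len(seq) - 1) / (target_len - 1)
    out: List[Vector] = []
    for i in range(target_len):
        pos = i * scale                  # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)   # clamp at the last source frame
        w = pos - lo                     # interpolation weight toward hi
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


def align_and_fuse(audio: List[Vector], visual: List[Vector],
                   text: List[Vector], num_frames: int) -> List[Vector]:
    """Hypothetical temporal alignment: resample each stream, then concatenate per frame."""
    streams = [resample(s, num_frames) for s in (audio, visual, text)]
    return [a + v + t for a, v, t in zip(*streams)]


# Toy 1-D features: 4 audio frames, 2 visual frames, 1 text token.
fused = align_and_fuse(
    audio=[[0.0], [1.0], [2.0], [3.0]],
    visual=[[10.0], [20.0]],
    text=[[5.0]],
    num_frames=3,
)
print(fused)
```

Each fused frame holds the audio, visual, and text features for the same time step, which is the form a downstream decoder predicting per-frame blendshape coefficients would need.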
Related papers
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation (2024) 3.58
- DF-3DFace: One-to-many Speech Synchronized 3D Face Animation With Diffusion (2023) 0.00
- Probabilistic Speech-driven 3D Facial Motion Synthesis: New Benchmarks, Methods, And Applications (2023) 9.23
- SAiD: Speech-driven Blendshape Facial Animation With Diffusion (2023) 0.00
- ProbTalk3D: Non-deterministic Emotion Controllable Speech-driven 3D Facial Animation Synthesis Using VQ-VAE (2024) 11.53
- See The Speaker: Crafting High-resolution Talking Faces From Speech With Prior Guidance And Region Refinement (2025) 0.00
- Speech-driven Facial Animation Using Polynomial Fusion Of Features (2019) 6.34
- LPIPS-AttnWav2Lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026) 12.65