FADA: Fast Diffusion Avatar Synthesis With Mixed-Supervised Multi-CFG Distillation
2024 · Tianyun Zhong, Chao Liang, Jianwen Jiang, et al.
Abstract
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results. Distilled models exhibit reduced robustness with open-set input images and a decreased correlation between audio and video compared to teacher models, undermining the advantages of diffusion models. To address this, we propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation). We first designed a mixed-supervised loss to leverage data of varying quality and enhance the overall model capability as well as robustness. Additionally, we propose a multi-CFG distillation with learnable tokens to utilize the correlation between audio and reference image conditions, reducing the threefold infer
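The abstract is cut off, so the following is only a hedged illustration of the cost it appears to target. Assuming "threefold" refers to the three denoiser evaluations that classifier-free guidance (CFG) over two conditions typically needs per sampling step (unconditional, reference image only, reference image plus audio), a minimal sketch of that baseline looks like this. The function names, the denoiser signature, and the guidance scales are hypothetical and not taken from the paper; the combination rule is the commonly used two-condition CFG form, not necessarily the one FADA's teacher uses.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature: (noisy latent, reference or None, audio or None) -> predicted noise.
Denoiser = Callable[[Vector, Optional[object], Optional[object]], Vector]


def multi_cfg_noise(
    denoise: Denoiser,
    x_t: Vector,
    ref: object,
    audio: object,
    ref_scale: float,
    audio_scale: float,
) -> Vector:
    """Combine three denoiser passes with two guidance scales (generic two-condition CFG)."""
    eps_uncond = denoise(x_t, None, None)   # pass 1: no conditions
    eps_ref = denoise(x_t, ref, None)       # pass 2: reference image only
    eps_full = denoise(x_t, ref, audio)     # pass 3: reference image and audio
    return [
        u + ref_scale * (r - u) + audio_scale * (f - r)
        for u, r, f in zip(eps_uncond, eps_ref, eps_full)
    ]


# Toy stand-in for a real network, used only to make the sketch runnable.
calls: List[int] = [0]


def toy_denoise(x_t: Vector, ref: Optional[object], audio: Optional[object]) -> Vector:
    calls[0] += 1
    shift = (1.0 if ref is not None else 0.0) + (2.0 if audio is not None else 0.0)
    return [v + shift for v in x_t]


guided = multi_cfg_noise(toy_denoise, [0.0, 1.0], "ref", "audio", ref_scale=1.5, audio_scale=2.0)
```

Each sampling step in this baseline calls the denoiser three times, which is the per-step overhead a distilled student would aim to avoid by producing the guided prediction in a single pass.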
Related papers
- DiffusionTalker: Efficient And Compact Speech-driven 3D Talking Head Via Personalizer-guided Distillation (2025) 5.05
- SoulX-FlashTalk: Real-time Infinite Streaming Of Audio-driven Avatars Via Self-correcting Bidirectional Distillation (2025) 0.00
- FaceDiffuser: Speech-driven 3D Facial Animation Synthesis Using Diffusion (2023) 13.79
- DiffSpeaker: Speech-driven 3D Facial Animation With Diffusion Transformer (2024) 5.24
- FastVoiceGrad: One-step Diffusion-based Voice Conversion With Adversarial Conditional Diffusion Distillation (2024) 4.52
- High-fidelity Speech Synthesis With Minimal Supervision: All Using Diffusion Models (2023) 5.24
- REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation Via ID-context Caching And Asynchronous Streaming Distillation (2025) 0.00
- Diff-Foley: Synchronized Video-to-audio Synthesis With Latent Diffusion Models (2023) 0.00