A Unified Compression Framework For Efficient Speech-driven Talking-face Generation
2023 · Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, et al.
Abstract
Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limit their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks and reducing the channel width from Wav2Lip, a popular talking-face generator. We also present a knowledge distillation scheme to stably yet effectively train the small-capacity generator without adversarial learning. We reduce the number of parameters and MACs by 28\(\times\) while retaining the performance of the original model. Moreover, to alleviate a severe performance drop when converting the whole generator to
Related papers
- See The Speaker: Crafting High-resolution Talking Faces From Speech With Prior Guidance And Region Refinement (2025)
- DiffusionTalker: Efficient And Compact Speech-driven 3D Talking Head Via Personalizer-guided Distillation (2025)
- EmoGene: Audio-driven Emotional 3D Talking-head Generation (2024)
- Large Generative Model-assisted Talking-face Semantic Communication System (2024)
- From Inference To Generation: End-to-end Fully Self-supervised Generation Of Human Face From Speech (2020)
- LPIPS-AttnWav2Lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026)
- Audio Input Generates Continuous Frames To Synthesize Facial Video Using Generative Adiversarial Networks (2022)
- Text-driven Talking Face Synthesis By Reprogramming Audio-driven Models (2023)