LPIPS-AttnWav2Lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild
2026 · Zhipeng Chen, Xinheng Wang, Lun Xie, et al.
Abstract
Researchers have shown growing interest in audio-driven talking head generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and the audio, known as lip synchronization. This paper proposes a generic method, LPIPS-AttnWav2Lip, for reconstructing face images of any speaker based on audio. We use a U-Net architecture based on residual CBAM to better encode and fuse audio and visual modal information. Additionally, the semantic alignment module extends the receptive field of the generator network to obtain the spatial and channel information of the visual features efficiently, and matches the statistical information of the visual features with the audio latent vector to adjust and inject the audio content information into the visual information. To achieve exact lip synchronization and to generate realistic high-quality images, our approach adopts LPIPS Loss, which simulates human judgment of image quality and reduces in
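The abstract's description of matching the statistics of visual features to an audio latent vector can be illustrated with a small sketch. This is only one plausible reading, assuming an adaptive-instance-normalization style of modulation; the abstract does not confirm this formulation. The function names (`channel_stats`, `project`, `modulate`), the linear projections, and all weights below are hypothetical and are not taken from the paper.

```python
import math

def channel_stats(channel):
    """Return (mean, std) of one flattened feature channel."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return mean, math.sqrt(var)

def project(latent, weights, bias):
    """Hypothetical linear layer: one output value per row of `weights`."""
    return [sum(w * z for w, z in zip(row, latent)) + b
            for row, b in zip(weights, bias)]

def modulate(visual, audio_latent, scale_w, scale_b, shift_w, shift_b, eps=1e-5):
    """Normalize each visual channel, then rescale and shift it with
    statistics predicted from the audio latent (an assumed AdaIN-like step)."""
    scales = project(audio_latent, scale_w, scale_b)
    shifts = project(audio_latent, shift_w, shift_b)
    out = []
    for channel, s, t in zip(visual, scales, shifts):
        mean, std = channel_stats(channel)
        out.append([s * (x - mean) / (std + eps) + t for x in channel])
    return out

# Toy data: 2 visual channels with 4 spatial positions each, 2-dim audio latent.
visual = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 20.0, 20.0]]
audio_latent = [0.5, -1.0]
scale_w, scale_b = [[1.0, 0.0], [0.0, -2.0]], [1.0, 0.0]
shift_w, shift_b = [[0.0, 1.0], [2.0, 0.0]], [0.0, 0.0]

modulated = modulate(visual, audio_latent, scale_w, scale_b, shift_w, shift_b)
for i, channel in enumerate(modulated):
    mean, std = channel_stats(channel)
    print(f"channel {i}: mean={mean:.3f}, std={std:.3f}")
```

After modulation, each channel's mean and standard deviation follow the values projected from the audio latent rather than the original visual statistics, which is the sense in which audio content is "injected" under this assumption.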
Related papers
- FluentLip: A Phonemes-based Two-stage Approach For Audio-driven Lip Synthesis With Optical Flow Consistency (2025) 0.00
- Audio2Face: Generating Speech/face Animation From Single Audio With Attention-based Bidirectional LSTM Networks (2019) 12.10
- VisualTTS: TTS With Accurate Lip-speech Synchronization For Automatic Voice Over (2021) 9.41
- SyncDiff: Diffusion-based Talking Head Synthesis With Bottlenecked Temporal Visual Prior For Improved Synchronization (2025) 4.52
- Let There Be Sound: Reconstructing High Quality Speech From Silent Videos (2023) 6.34
- See The Speaker: Crafting High-resolution Talking Faces From Speech With Prior Guidance And Region Refinement (2025) 0.00
- A Unified Compression Framework For Efficient Speech-driven Talking-face Generation (2023) 0.00
- LipVoicer: Generating Speech From Silent Videos Guided By Lip Reading (2023) 3.89