Style-preserving Lip Sync Via Audio-aware Style Reference
2024 · Weizhi Zhong, Jichang Li, Yinqi Cai, et al.
Abstract
Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance because each has a unique speaking style, which poses a notable challenge for audio-driven lip sync. Earlier methods for this task often bypassed the modeling of personalized speaking styles, resulting in sub-optimal lip sync that conforms to a generic style. Recent lip sync techniques attempt to guide the lip sync for arbitrary audio by aggregating information from a style reference video, yet they cannot preserve speaking styles well because their style aggregation is inaccurate. This work proposes an audio-aware style reference scheme that leverages the relationships between the input audio and the reference audio from the style reference video to address style-preserving audio-driven lip sync. Specifically, we first develop an advanced Transformer-based model
Related papers
- Self-supervised Context-aware Style Representation For Expressive Speech Synthesis (2022) 6.34
- LPIPS-AttnWav2Lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026) 12.65
- FluentLip: A Phonemes-based Two-stage Approach For Audio-driven Lip Synthesis With Optical Flow Consistency (2025) 0.00
- Improving Lip-synchrony In Direct Audio-visual Speech-to-speech Translation (2024) 0.00
- Fine-grained Style Control In Transformer-based Text-to-speech Synthesis (2021) 11.19
- Using Multiple Reference Audios And Style Embedding Constraints For Speech Synthesis (2021) 5.24
- StyleBook: Content-dependent Speaking Style Modeling For Any-to-any Voice Conversion Using Only Speech Data (2023) 0.00
- SAiD: Speech-driven Blendshape Facial Animation With Diffusion (2023) 0.00