A Comprehensive Review And Taxonomy Of Audio-visual Synchronization Techniques For Realistic Speech Animation
2024 · Jose Geraldo Fernandes, Sinval Nascimento, Daniel Dominguete, et al.
Abstract
Synchronizing audio with visuals is crucial in many applications, such as creating graphic animations for films or games, translating movie audio into different languages, and developing metaverse applications. This review explores methodologies for generating realistic facial animations from audio input, with emphasis on generative and adaptive models. It addresses challenges such as model training cost, dataset availability, and the distribution of silent moments in audio data, and presents solutions intended to improve performance and realism. It also introduces a new taxonomy that categorizes audio-visual synchronization methods by logistical aspects, with the aim of advancing virtual assistants, gaming, and interactive digital media.
Related papers
- Audio-visual Speech Codecs: Rethinking Audio-visual Speech Enhancement By Re-synthesis (2022)
- VisualTTS: TTS With Accurate Lip-speech Synchronization For Automatic Voice Over (2021)
- Audio-sync Video Generation With Multi-stream Temporal Control (2025)
- SyncVSR: Data-efficient Visual Speech Recognition With End-to-end Crossmodal Audio Token Synchronization (2024)
- LPIPS-AttnWav2Lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026)
- Improving Lip-synchrony In Direct Audio-visual Speech-to-speech Translation (2024)
- SyncDiff: Diffusion-based Talking Head Synthesis With Bottlenecked Temporal Visual Prior For Improved Synchronization (2025)
- DreamFoley: Scalable VLMs For High-fidelity Video-to-audio Generation (2025)