EditYourself: Audio-driven Generation And Manipulation Of Talking Head Videos With Diffusion Transformers
2026 · John Flynn, Wolfgang Paier, Dimitar Dinev, et al.
Abstract
Current generative video models excel at producing novel content from text and image prompts, but leave a critical gap in editing existing pre-recorded videos, where minor alterations to the spoken script require preserving motion, temporal coherence, speaker identity, and accurate lip synchronization. We introduce EditYourself, a DiT-based framework for audio-driven video-to-video (V2V) editing that enables transcript-based modification of talking head videos, including the seamless addition, removal, and retiming of visually spoken content. Building on a general-purpose video diffusion model, EditYourself augments its V2V capabilities with audio conditioning and region-aware, edit-focused training extensions. This enables precise lip synchronization and temporally coherent restructuring of existing performances via spatiotemporal inpainting, including the synthesis of realistic human motion in newly added segments, while maintaining visual fidelity and identity consistency over long
Related papers
- Emotivetalk: Expressive Talking Head Generation Through Audio Information Decoupling And Emotional Video Diffusion (2024) · 0.00
- Aadiff: Audio-aligned Video Synthesis With Text-to-image Diffusion (2023) · 0.00
- 3mdit: Unified Tri-modal Diffusion Transformer For Text-driven Synchronized Audio-video Generation (2025) · 0.00
- REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation Via Id-context Caching And Asynchronous Streaming Distillation (2025) · 0.00
- Emogene: Audio-driven Emotional 3D Talking-head Generation (2024) · 2.26
- Syncdiff: Diffusion-based Talking Head Synthesis With Bottlenecked Temporal Visual Prior For Improved Synchronization (2025) · 4.52
- Voicedit: Dual-condition Diffusion Transformer For Environment-aware Speech Synthesis (2024) · 5.84
- Diffusiontalker: Efficient And Compact Speech-driven 3D Talking Head Via Personalizer-guided Distillation (2025) · 5.05