Awesome Papers

Papers

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation (2026)
Shuhong Zheng et al.
7.39
Puppet Dubbing (2021)
Ohad Fried et al.
—
Scene-Aware Audio Rendering via Deep Acoustic Analysis (2021)
Zhenyu Tang et al.
—
Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation (2021)
Taras Kucherenko et al.
—
Learning Acoustic Scattering Fields for Dynamic Interactive Sound Propagation (2021)
Zhenyu Tang et al.
—
Sound Synthesis, Propagation, and Rendering: A Survey (2021)
Shiguang Liu et al.
—
Generating coherent spontaneous speech and gesture from text (2021)
Simon Alexanderson et al.
—
Transflower: probabilistic autoregressive dance generation with multimodal attention (2022)
Guillermo Valle-Pérez et al.
—
NeuralSound: Learning-based Modal Sound Synthesis With Acoustic Transfer (2022)
Xutong Jin et al.
—
Music2Video: Automatic Generation of Music Video with fusion of audio and text (2022)
Yoonjeon Kim et al.
—
A Psychoacoustic Quality Criterion for Path-Traced Sound Propagation (2022)
Chunxiao Cao et al.
—
On the role of Lip Articulation in Visual Speech Perception (2022)
Zakaria Aldeneh et al.
—
A Novel Speech-Driven Lip-Sync Model with CNN and LSTM (2022)
Xiaohong Li et al.
—
MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes (2022)
Anton Ratnarajah et al.
—
Audio-guided Album Cover Art Generation with Genetic Algorithms (2022)
James Marien et al.
—
Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adiversarial Networks (2022)
Hanhaodi Zhang
—
Pure Data and INScore: Animated notation for new music (2022)
Patricio F. Calatayud
—
The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation (2022)
Youngwoo Yoon et al.
—
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech (2022)
Saeed Ghorbani et al.
—
Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings (2023)
Tenglong Ao et al.
—
HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields (2023)
You Zhang et al.
—
Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models (2023)
Simon Alexanderson et al.
—
EDGE: Editable Dance Generation From Music (2022)
Jonathan Tseng et al.
—
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors (2022)
Zhentao Yu et al.
—
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers (2022)
Yasheng Sun et al.
—
A Comprehensive Review of Data-Driven Co-Speech Gesture Generation (2023)
Simbarashe Nyatsanga et al.
—
DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model (2023)
Fan Zhang et al.
—
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis (2023)
Susan Liang et al.
—
MusicFace: Music-driven Expressive Singing Face Synthesis (2023)
Pengfei Liu et al.
—
A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation (2023)
Bo-Kyeong Kim et al.
—
AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis (2023)
Hendric Voß and Stefan Kopp
—
RealImpact: A Dataset of Impact Sound Fields for Real Objects (2023)
Samuel Clarke et al.
—
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation (2024)
Louis Airale et al.
—
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation (2023)
Qiaosong Qi et al.
—
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model (2024)
Fan Zhang et al.
—
The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings (2023)
Taras Kucherenko et al.
—
MAGMA: Music Aligned Generative Motion Autodecoder (2023)
Sohan Anisetty et al.
—
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (2023)
Yujin Jeong et al.
—
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation (2023)
Yuan Gan et al.
—
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion (2023)
Stefan Stan, Kazi Injamamul Haque, and Zerrin Yumak
—
Music- and Lyrics-driven Dance Synthesis (2023)
Wenjie Yin et al.
—
BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer (2023)
Kunkun Pang et al.
—
Towards Streaming Speech-to-Avatar Synthesis (2023)
Tejas S. Prabhune et al.
—
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control (2023)
Elif Bozkurt
—
3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing (2023)
Balamurugan Thambiraja et al.
—
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models (2024)
Shivangi Aneja et al.
—
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (2024)
Junming Chen et al.
—
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives (2024)
Ronghui Li et al.
—
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference (2024)
Fan Zhang et al.
—
MIDGET: Music Conditioned 3D Dance Generation (2024)
Jinwu Wang et al.
—