Awesome 3D & NeRF Generation
3D & NeRF Generation is one of the most active areas in Awesome Generative Models: this collection lists 676 papers, evaluated on datasets such as COCO, ImageNet, and FFHQ. A strong starting point is "InterleaveThinker: Reinforcing Agentic Interleaved Generation".
Datasets & benchmarks
Key papers
- InterleaveThinker: Reinforcing Agentic Interleaved Generation (2026) · Dian Zheng et al. · 14.38
- Avatar V: Scaling Video-Reference Avatar Video Generation (2026) · Benjamin Liang et al. · 7.85
- Bolt3D: Generating 3D Scenes in Seconds (2025) · Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler · 6.92
- Super-Resolution of 3D Micro-CT Images Using Generative Adversarial Networks: Enhancing Resolution and Segmentation Accuracy (2025) · Evgeny Ugolkov et al. · 6.58
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation (2023) · David Junhao Zhang et al. · 6.55
- Latte: Latent Diffusion Transformer for Video Generation (2024) · Xin Ma et al. · 6.11
- Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation (2024) · Hongxu Jiang et al. · 5.40
- Projected Coupled Diffusion for Test-Time Constrained Joint Generation (2025) · Hao Luan et al. · 5.03
- Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control (2025) · Zekai Gu et al. · 4.76
- CUPID: Generative 3D Reconstruction via Joint Object and Pose Modeling (2025) · Binbin Huang et al. · 4.75
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (2024) · Sihyun Yu et al. · 4.60
- MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation (2025) · Mingcheng Li et al. · 4.47
- Smoothing Dark Areas in Molecular Latent Diffusion (2026) · Xi Wang et al. · 4.39
- Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation (2026) · Xiaomeng Yang et al. · 4.39
- VideoWeave: Unlocking Geometric Consistency in Video Generation via Joint Geometry-Video Modeling (2026) · Xunzhi Xiang et al. · 4.39
- MUSE: Agentic 3D Scene Authoring via Memory-Grounded Incremental Requirement Satisfaction (2026) · Ruijie Xu et al. · 4.39
- VeriGeo: Controllable Geometry Question Generation with Numerical and Analytical Verification (2026) · Xiaoxian Duan et al. · 4.39
- LapidaryEngine: Fully Conversational Crystal Generation (2026) · Yusei Ito et al. · 4.39
- Pano3D: Unified 3D Reconstruction and Panoptic Segmentation (2026) · Victor Barberteguy et al. · 4.39
- CausalMotion: Structured Physical Reasoning as Keyframe and Trajectory Guidance for Training-Free Video Generation (2026) · Sihan Zhuang et al. · 4.39
- Memento: Reconstruct to Remember for Consistent Long Video Generation (2026) · Xuan Wei et al. · 4.39
- Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit (2026) · Xiaoyu Li et al. · 4.39
- Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control (2026) · Ruining Li et al. · 4.39
- Latent Process Generator Matching (2026) · Lukas Billera et al. · 4.33
- DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models (2025) · Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Zhi-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang · 4.25
- DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation (2025) · Yunhan Yang et al. · 3.92
- Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation (2025) · Anshuk Uppal et al. · 3.92
- Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation (2025) · Jincheng Zhang et al. · 3.81
- Memory-Efficient 3D High-Resolution Medical Image Synthesis Using CRF-Guided GANs (2025) · Mahshid Shiri et al. · 3.70
- Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation (2025) · Zahra TehraniNasab et al. · 3.70
- Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation (2025) · Dawei Dai et al. · 3.59
- Structural Energy Guidance for View-Consistent Text-to-3D Generation (2026) · Qing Zhang et al. · 3.45
- Paris 2.0: A Decentralized Diffusion Model for Video Generation (2026) · Ali Rouzbayani et al. · 3.45
- DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction (2024) · Bowen Song et al. · 3.20
- TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes (2024) · Minghao Guo et al. · 3.14
- Diffusion Models are Efficient Data Generators for Human Mesh Recovery (2024) · Yongtao Ge et al. · 3.03
- ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models (2025) · Ozgur Kara et al. · 2.87
- Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving (2025) · Yunshen Wang et al. · 2.87
- Wavelet-based Variational Autoencoders for High-Resolution Image Generation (2025) · Andrew Kiruluta · 2.82
- VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors (2025) · Juil Koo et al. · 2.76
- DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation (2025) · Chen Chen et al. · 2.76
- Compressed Image Generation with Denoising Diffusion Codebook Models (2025) · Guy Ohayon et al. · 2.71
- UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation (2025) · Lei Zhao et al. · 2.71
- HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation (2025) · Qijun Gan et al. · 2.71
- Ultrasound Image Generation using Latent Diffusion Models (2025) · Benoit Freiche et al. · 2.71
- DreamDrive: Generative 4D Scene Modeling from Street View Images (2025) · Jiageng Mao et al. · 2.65
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation (2025) · Nikolai Kalischek et al. · 2.65
- Blenderrag: High-fidelity 3D Object Generation Via Retrieval-augmented Code Synthesis (2026) · Massimo Rondelli, Francesco Pivi, Maurizio Gabbrielli · 2.60
- ZoomLDM: Latent Diffusion Model for multi-scale image generation (2024) · Srikar Yellapragada et al. · 2.54
- OctFusion: Octree-based Diffusion Models for 3D Shape Generation (2024) · Bojun Xiong et al. · 2.37
- Tora: Trajectory-oriented Diffusion Transformer for Video Generation (2024) · Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang · 2.32
- StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models (2024) · Lezhong Wang et al. · 2.10
- Variational Learning for Insertion-based Generation (2026) · Yangtian Zhang et al. · 2.00
- ZipSplat: Fewer Gaussians, Better Splats (2026) · Alexander Veicht et al. · 2.00
- DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation (2026) · Soyoung Yoo et al. · 2.00
- Colorful-noise: Training-free Low-frequency Noise Manipulation For Color-based Conditional Image Generation (2026) · Nadav Z. Cohen, Ofir Abramovich, Ariel Shamir · 2.00
- A Graph Generation Pipeline For Critical Infrastructures Based On Heuristics, Images And Depth Data (2026) · Mike Diessner, Yannick E. Tarant · 2.00
- What Drives Compositional Generalization? The Importance Of Continuous Training Objectives In Visual Generative Models (2026) · Karim Farid, Rajat Sahay, Yumna Ali Alnaggar, et al. · 2.00
- Golden RPG: Confidence-adaptive Region-aware Noise For Compositional Text-to-image Generation (2026) · Hao Li · 2.00
- VERTIGO: Visual Preference Optimization For Cinematic Camera Trajectory Generation (2026) · Mengtian Li, Yuwei Lu, Feifei Li, et al. · 2.00