Awesome Generative Models

© 2026 Awesome Papers.

Awesome Audio Generation — curated papers, datasets & benchmarks · Awesome Generative Models

Awesome Audio Generation

Audio Generation is one of the most active areas in Awesome Generative Models, with 1,010 papers in this collection. Papers here are evaluated on datasets such as ImageNet, COCO, and CIFAR-10. A strong starting point is "FilmBench: A Film-Grade Benchmark for Cinematic Video Generation".

Datasets & benchmarks

ImageNet · 15 papers · 🤗

COCO · 13 papers · 🤗

CIFAR-10 · 8 papers · 🤗

MNIST · 6 papers · 🤗

CelebA-HQ · 6 papers · 🤗

CelebA · 6 papers · 🤗

Objaverse · 4 papers · 🤗

FFHQ · 4 papers · 🤗

Market-1501 · 4 papers

ImageNet 256×256 · 4 papers

slakh-2100 · 4 papers

Key papers

60 papers · trending (default) · numbers = 🔥 heat

FilmBench: A Film-Grade Benchmark for Cinematic Video Generation (2026)
Shengyi Wang et al.
8.24
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models (2025)
Jinho Jeong et al.
7.77
A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images (2025)
Zineb Sordo and Eric Chagnon and Daniela Ushizima
7.58
IDEAgent: Agentic Quality-Diversity Search for Research Idea Generation (2026)
Varun Gumma et al.
7.37
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation (2025)
Tiange Xiang et al.
5.59
Gemma 4 Technical Report (2026)
Gemma Team et al.
5.49
PE-Field 4D: Video Generation Models as Canvas (2026)
Yunpeng Bai et al.
5.49
SCALE: Self-Supervised Constraint-Aware Layout GEneration for Local P&R DRV Fixing at Advanced Nodes (2026)
Chia-Tung Ho et al.
5.01
TextSLIP: Text Self-Supervised CLIP for Medical Report Generation (2026)
Haoyu Jiang et al.
5.01
InnoText: A Unified Model for Visual Text Generation and Editing (2026)
Haowei Liu et al.
5.01
AgentHOI: Multi-Agent Reasoning for Human-Object-Interaction Video Generation via Implicit Representation Alignment (2026)
Ziyao Huang et al.
5.01
Projected Coupled Diffusion for Test-Time Constrained Joint Generation (2025)
Hao Luan et al.
4.97
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation (2025)
Mingcheng Li et al.
4.42
Healthier LLMs: Retrieval-Augmented Generation for Public Health Question Answering (2026)
Felix Feldman et al.
4.39
SpiS-GAN: Spiral-Modulated Handwriting Synthesis with Star Operation (2026)
Nguyen Duy Hieu et al.
4.39
An Hybrid Quantum-Classical Diffusion Model for Image Generation (2026)
Qipeng Qian et al.
4.39
ReGen: Hierarchical Multi-Prompt Representation Generation for Efficient Waveform Diffusion Models (2026)
Sang-Hoon Lee et al.
4.39
Autoregressive latent diffusion for 3D molecule generation (2026)
Federico Ottomano et al.
4.39
Reflecting Process Expertise in Procedural Material Generation (2026)
Kunal Gupta et al.
4.39
S1-Omni: A Unified Multimodal Reasoning Model for Scientific Understanding, Prediction, and Generation (2026)
Jiahao Zhao et al.
4.39
Music-JEPA: Learning a World Model of Sound from Action (2026)
Ziyu Wang et al.
4.39
Phylogenetic signal in marine mammal and bird vocalizations captured by audio foundation models: the limited benefit of domain-specific pretraining (2026)
Víctor Rincón Yepes
4.39
MineValiCoder: Reliable Code Generation with Test Case Quality Mining and Bipartite Graph-Based Mutual Validation (2026)
Zhen Zhao et al.
4.39
Beyond "What to Retrieve": Uncertainty in Retrieval-Augmented Code Generation (2026)
Chandan Kumar Sah et al.
4.39
Catastrophic Compositional Generation: Why Vanilla Diffusion Models Fail to Extrapolate (2026)
Duncan Soiffer et al.
4.33
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing (2025)
Seokhyeon Hong et al.
4.30
Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models (2026)
Egor Lifar et al.
4.27
Towards Controllable Image Generation through Representation-Conditioned Diffusion Models (2026)
Nithesh Chandher Karthikeyan et al.
4.27
Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models (2025)
Fan Yang et al.
3.92
StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization (2025)
Gopalji Gaur et al.
3.92
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction (2026)
Jiaxin Huang et al.
3.92
One-step Latent-free Image Generation with Pixel Mean Flows (2026)
Yiyang Lu et al.
3.92
DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation (2025)
Yunhan Yang et al.
3.86
Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation (2025)
Anshuk Uppal et al.
3.86
Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates (2025)
Alejandro Ascarate et al.
3.86
Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation (2025)
Jincheng Zhang et al.
3.75
STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation (2025)
Ruyu Wang et al.
3.64
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation (2025)
Zahra TehraniNasab et al.
3.64
Gen4U: Unifying Video Generation and Understanding via Diffusion (2026)
Michael King et al.
3.51
Nexus: Native Mesh Generation with Diffusion (2026)
Hanxiao Wang et al.
3.51
GuidedRAG: Semantic Steering of Retrieval-Augmented Generation (2026)
Matthijs Jansen op de Haar et al.
3.51
Improving Item Discoverability in e-Commerce Search via Related Intent Generation (2026)
Ji Xin et al.
3.51
Structural Energy Guidance for View-Consistent Text-to-3D Generation (2026)
Qing Zhang et al.
3.39
Paris 2.0: A Decentralized Diffusion Model for Video Generation (2026)
Ali Rouzbayani et al.
3.39
Generation of non-stationary stochastic fields using Generative Adversarial Networks (2022)
Alhasan Abdellatif et al.
3.19
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation (2024)
Changgu Chen et al.
2.92
TerraFusion: Joint Generation of Terrain Geometry and Texture Using Latent Diffusion Models (2025)
Kazuki Higo et al.
2.82
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models (2025)
Ozgur Kara et al.
2.82
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations (2025)
Ruijie Wang et al.
2.76
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation (2025)
Xiaoliang Ju et al.
2.71
Compressed Image Generation with Denoising Diffusion Codebook Models (2025)
Guy Ohayon et al.
2.65
A Mixture-Based Framework for Guiding Diffusion Models (2025)
Yazid Janati et al.
2.65
Ultrasound Image Generation using Latent Diffusion Models (2025)
Benoit Freiche et al.
2.65
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens (2024)
Jaehyeon Kim et al.
2.60
3D MedDiffusion: A 3D Medical Latent Diffusion Model for Controllable and High-quality Medical Image Generation (2024)
Haoshen Wang et al.
2.60
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation (2025)
Nikolai Kalischek et al.
2.60
Multi-Source Music Generation with Latent Diffusion (2024)
Zhongweiyang Xu et al.
2.43
OctFusion: Octree-based Diffusion Models for 3D Shape Generation (2024)
Bojun Xiong et al.
2.37
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models (2024)
Lezhong Wang et al.
2.10
LumaGuide: Distribution Shaping for Training-Free HDR Generation in Diffusion Models (2026)
Bowen Chen et al.
2.00