Awesome Text-to-Image
Text-to-Image is one of the most active areas in Awesome Generative Models, with 948 papers in this collection evaluated on datasets such as COCO, ImageNet, and CUB. A strong starting point is "InterleaveThinker: Reinforcing Agentic Interleaved Generation".
Datasets & benchmarks
Key papers
- InterleaveThinker: Reinforcing Agentic Interleaved Generation (2026) | Dian Zheng et al. | 14.38
- Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis (2025) | Bingxin Ke et al. | 10.47
- Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models (2025) | Jinjin Zhang et al. | 10.03
- Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection (2025) | Shufan Li et al. | 9.29
- RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space (2026) | Xichen Pan et al. | 8.86
- Qwen-Image-Flash: Beyond Objective Design (2026) | Tianhe Wu et al. | 8.85
- Learning Few-Step Diffusion Models by Trajectory Distribution Matching (2025) | Yihong Luo et al. | 8.84
- DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models (2023) | Ximing Xing et al. | 8.82
- One-step Diffusion Models with $f$-Divergence Distribution Matching (2025) | Yilun Xu et al. | 8.70
- A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images (2025) | Zineb Sordo, Eric Chagnon, and Daniela Ushizima | 7.64
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (2024) | Zhuoyi Yang et al. | 7.30
- Deeply Supervised Flow-Based Generative Models (2025) | Inkyu Shin et al. | 7.24
- On the Challenges and Opportunities in Generative AI (2024) | Laura Manduchi et al. | 6.89
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation (2023) | David Junhao Zhang et al. | 6.55
- Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation (2024) | Hongxu Jiang et al. | 5.40
- A Wavelet Diffusion GAN for Image Super-Resolution (2024) | Lorenzo Aloisi, Luigi Sigillo, Aurelio Uncini, and Danilo Comminiello | 5.37
- GenDR: Lighten Generative Detail Restoration (2025) | Yan Wang et al. | 4.87
- Generating Multimodal Images with GAN: Integrating Text, Image, and Style (2025) | Chaoyi Tan et al. | 4.76
- SINE: SINgle Image Editing with Text-to-Image Diffusion Models (2022) | Zhixing Zhang et al. | 4.48
- PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach (2025) | Nitin Rai et al. | 4.47
- MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation (2025) | Mingcheng Li et al. | 4.47
- Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning (2025) | Qiang Zhu et al. | 4.47
- Nepotistically Trained Generative-AI Models Collapse (2023) | Matyas Bohacek and Hany Farid | 4.40
- Compressing Image Style Training into a Single Model Forward (2026) | Zhongjie Duan et al. | 4.39
- Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech (2026) | Alef Iury Siqueira Ferreira et al. | 4.39
- GarmentSketch: Large-scale Sketch-to-Fashion Benchmark (2026) | Duong-Duy-Khang Bui et al. | 4.39
- Toward 360-Degree Indoor Panorama Editing via Tuning-Free Diffusion Model with Refocusing Cross-Attention (2026) | Dinh-Khoi Vo et al. | 4.39
- Rethinking One-Step Image Editing through ChordEdit: Reproduction, Simplification, and New Insights (2026) | Minghan Li et al. | 4.39
- Context-aware Modality-Topology Co-Alignment for Multimodal Attributed Graphs (2026) | Sirui Zhang et al. | 4.39
- ForceForget: Reinforcement Concept Removal for Enhancing Safety in Text-to-Image Models (2026) | Dong Han et al. | 4.39
- HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities (2026) | Yijun Liu et al. | 4.39
- Diffusion-Based Ukrainian Handwritten Text Generation with Cross-Domain Style Transfer (2026) | Andrii Ahitoliev et al. | 4.33
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text (2024) | Roberto Henschel et al. | 4.21
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects (2024) | Zhao Wang et al. | 4.10
- ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion (2025) | Sungho Koh et al. | 4.09
- Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models (2025) | Fan Yang et al. | 3.97
- StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization (2025) | Gopalji Gaur et al. | 3.97
- TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting (2025) | Liangbin Xie et al. | 3.75
- DP-LDMs: Differentially Private Latent Diffusion Models (2023) | Michael F. Liu et al. | 3.71
- Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation (2025) | Zahra TehraniNasab et al. | 3.70
- Tutorial on Diffusion Models for Imaging and Vision (2024) | Stanley H. Chan | 3.69
- Diffusion Models Through a Global Lens: Are They Culturally Inclusive? (2025) | Zahra Bayramli et al. | 3.64
- Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation (2025) | Dawei Dai et al. | 3.59
- Structural Energy Guidance for View-Consistent Text-to-3D Generation (2026) | Qing Zhang et al. | 3.45
- Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models (2024) | Sooyeon Go, Kyungmook Choi, Minjung Shin, and Youngjung Uh | 3.20
- Home-made Diffusion Model from Scratch to Hatch (2025) | Shih-Ying Yeh | 3.17
- DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval (2025) | Ruohong Yang et al. | 3.10
- Faster Diffusion via Temporal Attention Decomposition (2024) | Haozhe Liu et al. | 3.09
- Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model (2024) | Yuxuan Zhang et al. | 3.03
- PartComposer: Learning and Composing Part-Level Concepts from Single-Image Examples (2025) | Junyu Liu et al. | 2.93
- ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models (2025) | Ozgur Kara et al. | 2.87
- Stable Diffusion for Data Augmentation in COCO and Weed Datasets (2023) | Boyang Deng | 2.86
- DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation (2025) | Chen Chen et al. | 2.76
- Contrastive Learning Guided Latent Diffusion Model for Image-to-Image Translation (2025) | Qi Si et al. | 2.76
- Enhancing Creative Generation on Stable Diffusion-based Models (2025) | Jiyeon Han et al. | 2.76
- A Diffusion Model Translator for Efficient Image-to-Image Translation (2025) | Mengfei Xia, Yu Zhou, Ran Yi, Yong-Jin Liu, and Wenping Wang | 2.71
- CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models (2025) | Xinle Cheng et al. | 2.71
- StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models (2025) | Zichong Chen et al. | 2.71
- DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models (2025) | Daewon Chae et al. | 2.71
- PQD: Post-training Quantization for Efficient Diffusion Models (2025) | Jiaojiao Ye et al. | 2.65