Awesome Image Generation
Image generation is one of the most active areas in Awesome Computer Vision, with 1,389 papers in this collection evaluated on datasets such as ImageNet, COCO, and Cityscapes. A strong starting point is "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows".
Key papers
- Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows (2021). Ze Liu, Yutong Lin, Yue Cao, et al. | 38.40
- PointRend: Image Segmentation As Rendering (2019). Alexander Kirillov, Yuxin Wu, Kaiming He, et al. | 31.23
- Multi-stage Progressive Image Restoration (2021). Syed Waqas Zamir, Aditya Arora, Salman Khan, et al. | 30.61
- Meshed-memory Transformer For Image Captioning (2019). Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, et al. | 27.73
- Feedback Network For Image Super-resolution (2019). Zhen Li, Jinglei Yang, Zheng Liu, et al. | 26.83
- A Survey On Visual Transformer (2020). Kai Han, Yunhe Wang, Hanting Chen, et al. | 26.80
- Unified Vision-language Pre-training For Image Captioning And VQA (2019). Luowei Zhou, Hamid Palangi, Lei Zhang, et al. | 25.70
- Masked-attention Mask Transformer For Universal Image Segmentation (2021). Bowen Cheng, Ishan Misra, Alexander G. Schwing, et al. | 25.69
- Multimodal Unsupervised Image-to-image Translation (2018). Xun Huang, Ming-Yu Liu, Serge Belongie, et al. | 24.00
- Aggregated Contextual Transformations For High-resolution Image Inpainting (2021). Yanhong Zeng, Jianlong Fu, Hongyang Chao, et al. | 23.46
- pixelNeRF: Neural Radiance Fields From One Or Few Images (2020). Alex Yu, Vickie Ye, Matthew Tancik, et al. | 23.45
- Dual-level Collaborative Transformer For Image Captioning (2021). Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, et al. | 22.95
- Going Deeper With Image Transformers (2021). Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, et al. | 22.35
- Resolution-robust Large Mask Inpainting With Fourier Convolutions (2021). Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, et al. | 22.23
- Vision Transformer With Deformable Attention (2022). Zhuofan Xia, Xuran Pan, Shiji Song, et al. | 21.78
- MVSNeRF: Fast Generalizable Radiance Field Reconstruction From Multi-view Stereo (2021). Anpei Chen, Zexiang Xu, Fuqiang Zhao, et al. | 21.41
- StableVITON: Learning Semantic Correspondence With Latent Diffusion Model For Virtual Try-on (2023). Jeongho Kim, Gyojung Gu, Minho Park, et al. | 21.17
- UNISURF: Unifying Neural Implicit Surfaces And Radiance Fields For Multi-view Reconstruction (2021). Michael Oechsle, Songyou Peng, Andreas Geiger | 20.65
- Bilateral Reference For High-resolution Dichotomous Image Segmentation (2024). Peng Zheng, Dehong Gao, Deng-Ping Fan, et al. | 20.61
- Relaxed Transformer Decoders For Direct Action Proposal Generation (2021). Jing Tan, Jiaqi Tang, Limin Wang, et al. | 20.49
- RegionCLIP: Region-based Language-image Pretraining (2021). Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, et al. | 20.18
- Image Segmentation Using Text And Image Prompts (2021). Timo Lüddecke, Alexander S. Ecker | 19.96
- RGB-Infrared Cross-modality Person Re-identification Via Joint Pixel And Feature Alignment (2019). Guan'An Wang, Tianzhu Zhang, Jian Cheng, et al. | 19.89
- Pluralistic Image Completion (2019). Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai | 19.71
- MaskGIT: Masked Generative Image Transformer (2022). Huiwen Chang, Han Zhang, Lu Jiang, et al. | 19.47
- Borrow From Anywhere: Pseudo Multi-modal Object Detection In Thermal Imagery (2019). Chaitanya Devaguptapu, Ninad Akolekar, Manuj M Sharma, et al. | 19.25
- Bidirectional Multi-scale Implicit Neural Representations For Image Deraining (2024). Xiang Chen, Jinshan Pan, Jiangxin Dong | 19.19
- What Do Single-view 3D Reconstruction Networks Learn? (2019). Maxim Tatarchenko, Stephan R. Richter, René Ranftl, et al. | 19.13
- Semantically Multi-modal Image Synthesis (2020). Zhen Zhu, Zhiliang Xu, Ansheng You, et al. | 19.03
- Learning To Discover Multi-class Attentional Regions For Multi-label Image Recognition (2020). Bin-Bin Gao, Hong-Yu Zhou | 18.95
- LAVT: Language-aware Vision Transformer For Referring Image Segmentation (2021). Zhao Yang, Jiaqi Wang, Yansong Tang, et al. | 18.95
- Toward Characteristic-preserving Image-based Virtual Try-on Network (2018). Bochao Wang, Huabin Zheng, Xiaodan Liang, et al. | 18.81
- Contextual Residual Aggregation For Ultra High-resolution Image Inpainting (2020). Zili Yi, Qiang Tang, Shekoofeh Azizi, et al. | 18.71
- Mixed Transformer U-Net For Medical Image Segmentation (2021). Hongyi Wang, Shiao Xie, Lanfen Lin, et al. | 18.70
- M2TR: Multi-modal Multi-scale Transformers For Deepfake Detection (2021). Junke Wang, Zuxuan Wu, Wenhao Ouyang, et al. | 18.46
- Single-view View Synthesis With Multiplane Images (2020). Richard Tucker, Noah Snavely | 18.12
- CricaVPR: Cross-image Correlation-aware Representation Learning For Visual Place Recognition (2024). Feng Lu, Xiangyuan Lan, Lijun Zhang, et al. | 18.09
- COLA-Net: Collaborative Attention Network For Image Restoration (2021). Chong Mou, Jian Zhang, Xiaopeng Fan, et al. | 17.87
- Guided Curriculum Model Adaptation And Uncertainty-aware Evaluation For Semantic Nighttime Image Segmentation (2019). Christos Sakaridis, Dengxin Dai, Luc van Gool | 17.82
- CLIP-ReID: Exploiting Vision-language Model For Image Re-identification Without Concrete Text Labels (2022). Siyuan Li, Li Sun, Qingli Li | 17.74
- Uni-paint: A Unified Framework For Multimodal Image Inpainting With Pretrained Diffusion Model (2023). Shiyuan Yang, Xiaodong Chen, Jing Liao | 17.73
- Cross-domain Correspondence Learning For Exemplar-based Image Translation (2020). Pan Zhang, Bo Zhang, Dong Chen, et al. | 17.55
- Learning Spatial Attention For Face Super-resolution (2020). Chaofeng Chen, Dihong Gong, Hao Wang, et al. | 17.40
- ViTAEv2: Vision Transformer Advanced By Exploring Inductive Bias For Image Recognition And Beyond (2022). Qiming Zhang, Yufei Xu, Jing Zhang, et al. | 17.31
- EfficientSAM: Leveraged Masked Image Pretraining For Efficient Segment Anything (2023). Yunyang Xiong, Bala Varadarajan, Lemeng Wu, et al. | 17.18
- Dual Contrastive Learning For Unsupervised Image-to-image Translation (2021). Junlin Han, Mehrdad Shoeiby, Lars Petersson, et al. | 17.16
- Dark Model Adaptation: Semantic Image Segmentation From Daytime To Nighttime (2018). Dengxin Dai, Luc van Gool | 17.14
- Parser-free Virtual Try-on Via Distilling Appearance Flows (2021). Yuying Ge, Yibing Song, Ruimao Zhang, et al. | 17.12
- Exploring Smoothness And Class-separation For Semi-supervised Medical Image Segmentation (2022). Yicheng Wu, Zhonghua Wu, Qianyi Wu, et al. | 17.00
- Iterative Prompt Learning For Unsupervised Backlit Image Enhancement (2023). Zhexin Liang, Chongyi Li, Shangchen Zhou, et al. | 16.88
- Free View Synthesis (2020). Gernot Riegler, Vladlen Koltun | 16.80
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder For Natural And Medical Image Segmentation (2024). Xinyu Xiong, et al. | 16.76
- Cross Language Image Matching For Weakly Supervised Semantic Segmentation (2022). Jinheng Xie, Xianxu Hou, Kai Ye, et al. | 16.71
- Towards Multi-pose Guided Virtual Try-on Network (2019). Haoye Dong, Xiaodan Liang, Bochao Wang, et al. | 16.53
- Scaling Up Vision-language Pre-training For Image Captioning (2021). Xiaowei Hu, Zhe Gan, Jianfeng Wang, et al. | 16.49
- Adaptive Token Sampling For Efficient Vision Transformers (2021). Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, et al. | 16.45
- Inpainting Transformer For Anomaly Detection (2021). Jonathan Pirnay, Keng Chai | 16.45
- Detecting The Unexpected Via Image Resynthesis (2019). Krzysztof Lis, Krishna Nakka, Pascal Fua, et al. | 16.39
- Coarse-to-fine Latent Diffusion For Pose-guided Person Image Synthesis (2024). Yanzuo Lu, Manlin Zhang, Andy J Ma, et al. | 16.34
- Extreme View Synthesis (2018). Inchang Choi, Orazio Gallo, Alejandro Troccoli, et al. | 16.30