📊 Datasets — Awesome Generative Models

811 datasets & benchmarks — 16 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

CIFAR-10 (Canonical)

60,000 32×32 color images in 10 classes — a small, standard image-classification benchmark.

📄 166 papers · ⬇ 2.6k · 🤗 HF
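The counters on each entry's stats line are abbreviated (for example 2.6k or 110.0k). A minimal sketch for turning them into integers, assuming that "k" means thousands and that the ⬇ counter is a download count; neither is stated on this page, and the function names are illustrative:

```python
import re
from decimal import Decimal


def parse_count(token: str) -> int:
    """Convert an abbreviated counter such as '2.6k' or '564' to an int.

    Assumption: a trailing 'k' means thousands.
    """
    token = token.strip().lower()
    if token.endswith("k"):
        # Decimal avoids float rounding surprises (e.g. 2.6 * 1000).
        return int(Decimal(token[:-1]) * 1000)
    return int(token)


def downloads(stats_line: str):
    """Return the counter that follows the ⬇ marker, or None if it is absent."""
    match = re.search(r"⬇\s*([\d.]+k?)", stats_line)
    return parse_count(match.group(1)) if match else None
```

Entries that list only a paper count have no ⬇ counter, so `downloads` returns `None` for them.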

ImageNet (Canonical)

~1.28M labeled images across 1,000 categories (ILSVRC) — the standard large-scale image-classification benchmark.

📄 122 papers · ⬇ 9.1k · 💛 2 · 🤗 HF

MNIST (Emerging)

The MNIST dataset consists of 70,000 28×28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training set and 10,000 images in the test set, with one class per digit for a total of 10 classes and roughly 7,000 images (about 6,000 train and 1,000 test) per class. Half of the images were drawn by Census Bureau employees and the other half by high school students… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

📄 94 papers · ⬇ 110.0k · 💛 261 · 🤗 HF · mit
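Entries marked 🤗 HF point at a Hugging Face dataset page, and the part of the URL after /datasets/ is the repository id. A minimal sketch of loading such a dataset, assuming the `datasets` library is installed and the repository exposes a split named "train"; the helper names are illustrative, not part of this page:

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract '<owner>/<name>' from a Hugging Face dataset page URL."""
    # Descriptions on this page end the URL with a sentence period; drop it.
    parts = urlparse(url.rstrip(".")).path.strip("/").split("/")
    # Expected path shape: datasets/<owner>/<name>
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset page URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_from_page(url: str, split: str = "train"):
    """Load one split with the `datasets` library (requires network access)."""
    from datasets import load_dataset  # lazy import: pip install datasets

    return load_dataset(repo_id_from_url(url), split=split)
```

For example, `load_from_page("https://huggingface.co/datasets/ylecun/mnist")` would request the training split of the MNIST repository linked above.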

CelebA (Emerging)

CelebA (CelebFaces Attributes) contains more than 200,000 celebrity face images, each annotated with 40 binary attributes. In the papers listed here it is used, among other things, to evaluate how well generative models and vision encoders capture meaningful semantic information.

📄 78 papers · ⬇ 564 · 🤗 HF

COCO (Canonical)

Common Objects in Context — 330k images with object-detection, segmentation, keypoint, and captioning annotations.

📄 37 papers · ⬇ 28.1k · 💛 82 · 🤗 HF

FFHQ (Canonical)

Flickr-Faces-HQ — 70,000 high-quality 1024×1024 face images, widely used for generative modeling.

📄 34 papers · ⬇ 15.2k · 💛 17 · 🤗 HF · cc

CelebA-HQ (Canonical)

CelebA-HQ is the high-quality version of CelebA introduced in the paper cited on its dataset card: Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, "Progressive Growing of GANs for Improved Quality, Stability, and Variation," CoRR abs/1710.10196, 2017, http://arxiv.org/abs/1710.10196… See the full description on the dataset page: https://huggingface.co/datasets/Chris1/celebA-HQ.

📄 30 papers · ⬇ 1.4k · 💛 1 · 🤗 HF

MNIST Fashion (Emerging)

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image associated with a label from 10 classes. It is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing… See the full description on the dataset page: https://huggingface.co/datasets/zalando-datasets/fashion_mnist.

📄 27 papers · ⬇ 36.2k · 💛 67 · 🤗 HF · mit

ImageNet 256×256 (Emerging)

📄 26 papers

CIFAR-100 (Emerging)

CIFAR-100 contains 60,000 32×32 color images in 100 classes, with 600 images per class. In the papers listed here it is used, for example, to evaluate self-supervised representation learning methods.

📄 18 papers

ImageNet 64x-64 (Emerging)

'ImageNet 64x-64' is a benchmark dataset that contains images from the ImageNet collection resized to 64x64 pixels, used to evaluate the performance of generative models in image generation tasks.

📄 15 papers · ⬇ 18 · 🤗 HF
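Several entries on this page are ImageNet resized to a fixed resolution. As an illustration of what such resizing involves, here is a minimal sketch of box (area-average) downsampling on a grayscale image stored as nested lists. The box filter is an assumption chosen for simplicity; this page does not say which resizing method any of the listed variants used.

```python
def box_downsample(image, factor):
    """Shrink a grayscale image by averaging non-overlapping factor x factor blocks.

    `image` is a list of rows of numbers; height and width must be
    divisible by `factor`.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result
```

With `factor=4`, a 256×256 input yields a 64×64 output.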

LSUN bedroom (Emerging)

This is a 20% sample of the bedrooms category in LSUN, uploaded as a dataset for convenience. The license for this compilation only is MIT; the data retains the same license as the original dataset. The dataset card shows (roughly) the code that was used to upload it, flattened and truncated here: import os; import shutil; from miniai.imports import *; from miniai.diffusion import *; from datasets import load_dataset; path_data = Path('data'); path_data.mkdir(exist_ok=True); path =… See the full description on the dataset page: https://huggingface.co/datasets/pcuenq/lsun-bedrooms.

📄 14 papers · ⬇ 936 · 💛 12 · 🤗 HF · mit

ImageNet 256 (Emerging)

ImageNet-256 is a dataset used to evaluate image generation models, containing a subset of images from the larger ImageNet dataset, specifically resized to 256x256 pixels.

📄 14 papers

LSUN Church (Emerging)

The 'LSUN-Church' dataset contains images of church exteriors (the church_outdoor category of LSUN) and is used to evaluate the performance of diffusion models in high-quality image generation.

📄 14 papers

STL-10 (Emerging)

STL-10 contains 96×96 color images in 10 classes: 5,000 labeled training images, 8,000 test images, and 100,000 unlabeled images. It is used to evaluate unsupervised and semi-supervised learning methods, specifically for image classification.

📄 14 papers

SVHN (Emerging)

SVHN (Street View House Numbers) is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem… See the full description on the dataset page: https://huggingface.co/datasets/ufldl-stanford/svhn.

📄 11 papers · ⬇ 39.9k · 💛 15 · 🤗 HF · other

ImageNet 512×512 (Emerging)

'ImageNet 512×512' is a benchmark dataset used to evaluate the quality of image synthesis models, specifically measuring their performance in generating high-resolution images.

📄 11 papers

LSUN (Canonical)

The 'LSUN' dataset is a large-scale collection of labeled images spanning 10 scene categories and 20 object categories, and is used to evaluate the performance of image generation models.

📄 11 papers

ImageNet-64 (Emerging)

ImageNet-64 is a benchmark dataset containing 64x64 pixel images used to evaluate the performance of generative models, particularly in the context of Generative Adversarial Networks (GANs).

📄 10 papers · ⬇ 87 · 🤗 HF

Stable Diffusion (Emerging)

'Stable Diffusion' is a widely used text-to-image diffusion model (not a dataset) that serves as a reference point for evaluating the performance of generative models in image generation.

📄 10 papers · ⬇ 14 · 💛 1 · 🤗 HF

UCF101 (Emerging)

UCF-101 is a dataset of 13,320 video clips in 101 action categories, used to evaluate video understanding and action recognition tasks.

📄 10 papers

ImageNet-1k (Emerging)

ILSVRC 2012, commonly known as 'ImageNet', is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of them nouns (80,000+). ImageNet aims to provide on average 1,000 images to illustrate each synset. Images of each concept are… See the full description on the dataset page: https://huggingface.co/datasets/ILSVRC/imagenet-1k.

📄 9 papers · ⬇ 126.1k · 💛 872 · 🤗 HF · other

Kinetics-600 (Emerging)

Kinetics-600 is a dataset used to evaluate video understanding models, containing a diverse set of 600 human action categories.

📄 9 papers · ⬇ 128 · 🤗 HF

DSprites (Emerging)

'dSprites' is a dataset that contains 2D shapes varying in factors such as shape, scale, orientation, and position, and it is used to evaluate the performance of models in disentangled representation learning.

📄 9 papers
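dSprites is generated as the full Cartesian product of its factors of variation. The sketch below counts that grid and decodes a flat index into per-factor indices. It assumes the factor sizes documented in the dSprites repository (3 shapes, 6 scales, 40 orientations, 32 x positions, 32 y positions) and a row-major ordering with the last factor varying fastest; neither detail appears on this page.

```python
from math import prod

# Assumed factor sizes (from the dSprites repository, not from this listing).
FACTOR_SIZES = {"shape": 3, "scale": 6, "orientation": 40, "pos_x": 32, "pos_y": 32}


def grid_size(sizes=FACTOR_SIZES) -> int:
    """Number of images when every factor combination appears exactly once."""
    return prod(sizes.values())


def index_to_factors(index: int, sizes=FACTOR_SIZES) -> dict:
    """Decode a flat index into per-factor indices (last factor varies fastest)."""
    if not 0 <= index < grid_size(sizes):
        raise IndexError("index outside the factor grid")
    factors = {}
    for name in reversed(list(sizes)):
        index, factors[name] = divmod(index, sizes[name])
    return factors
```

Under these assumptions the grid holds 3 × 6 × 40 × 32 × 32 = 737,280 images.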

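VITON-HD (Emerging)

VITON-HD is a high-resolution (1024×768) virtual try-on dataset of paired person and garment images. The Hugging Face dataset card ("viton_hd") provides no further description.

📄 8 papers · ⬇ 628 · 💛 11 · 🤗 HF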

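Omniglot (Emerging)

Omniglot contains 1,623 handwritten characters from 50 alphabets, each drawn by 20 different people. The Hugging Face dataset card ("omniglot") provides no further description.

📄 8 papers · ⬇ 299 · 💛 1 · 🤗 HF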

ShapeNet (Emerging)

ShapeNet is a dataset that contains a diverse collection of 3D shapes used to evaluate the performance of generative models in producing visually plausible and geometrically symmetric objects.

📄 8 papers

Objaverse (Emerging)

Objaverse is a massive dataset with 800K+ annotated 3D objects. More documentation is coming soon; in the meantime, see the paper and website for additional details. License: the dataset as a whole is licensed under the ODC-By v1.0 license. Individual objects in Objaverse are all licensed as Creative Commons distributable objects and may be under the following licenses: CC-BY 4.0 (721K objects), CC-BY-NC 4.0 (25K objects), CC-BY-NC-SA… See the full description on the dataset page: https://huggingface.co/datasets/allenai/objaverse.

📄 7 papers · ⬇ 280.8k · 💛 454 · 🤗 HF · odc-by

CIFAR (Emerging)

CIFAR is a dataset used to evaluate the performance of machine learning models, particularly in the context of computer vision tasks.

📄 7 papers

CUB (Emerging)

The CUB dataset, or Caltech-UCSD Birds-200-2011, contains images and textual descriptions of 200 bird species and is used to evaluate text-to-image synthesis methods.

📄 7 papers

HumanML-3D (Emerging)

HumanML-3D is a dataset used to evaluate human motion synthesis by providing 3D motion data conditioned on textual input.

📄 7 papers

Tiny ImageNet (Emerging)

Tiny ImageNet contains 100,000 images of 200 classes (500 for each class), downsized to 64×64 color images. Each class has 500 training images, 50 validation images, and 50 test images. The class labels are in English. A data instance looks like: { 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=64x64 at 0x1A800E8E190>, 'label': 15 }… See the full description on the dataset page: https://huggingface.co/datasets/zh-plus/tiny-imagenet.

📄 6 papers · ⬇ 29.9k · 💛 102 · 🤗 HF
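The Tiny ImageNet description gives both a headline total and per-class split counts. A quick sketch of the arithmetic, taking the per-class figures at face value, shows that the 100,000 figure matches the training split alone; the split names used as dictionary keys are illustrative.

```python
NUM_CLASSES = 200
IMAGES_PER_CLASS = {"train": 500, "validation": 50, "test": 50}

# Images per split = number of classes x images per class in that split.
split_sizes = {name: NUM_CLASSES * count for name, count in IMAGES_PER_CLASS.items()}
total_images = sum(split_sizes.values())

print(split_sizes)
print(total_images)
```

This gives 100,000 training, 10,000 validation, and 10,000 test images, or 120,000 in total if all three splits are counted.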

AFHQ (Canonical)

The 'AFHQ' dataset contains images of animal faces and is used to evaluate the performance of generative models in producing high-quality image synthesis.

📄 6 papers · ⬇ 1.1k · 💛 3 · 🤗 HF

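DeepFashion (Emerging)

DeepFashion is a large-scale clothing image dataset annotated with categories, attributes, and landmarks. The Hugging Face dataset card ("deepfashion") provides no further description.

📄 6 papers · ⬇ 192 · 💛 10 · 🤗 HF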

GenEval (Canonical)

GenEval is a benchmark used to evaluate the performance of text-to-image synthesis models.

📄 6 papers

ImageNet-512 (Emerging)

ImageNet 512 is a benchmark dataset used to evaluate image generation models, containing images resized to 512x512 pixels.

📄 6 papers

Cityscapes (Emerging)

This dataset is part of the CycleGAN datasets, originally hosted at https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/. Citation: Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," CoRR abs/1703.10593… See the full description on the dataset page: https://huggingface.co/datasets/huggan/cityscapes.

📄 5 papers · ⬇ 404 · 💛 4 · 🤗 HF

AFHQv-2 (Emerging)

AFHQv-2 is a dataset that contains high-quality images of animals, specifically dogs, cats, and wild animals, and is used to evaluate the performance of generative models in image synthesis tasks.

📄 5 papers

Market-1501 (Emerging)

The Market-1501 dataset is a benchmark that contains images of pedestrians for evaluating person re-identification algorithms.

📄 5 papers

OpenWebText (Emerging)

An open-source replication of the WebText dataset from OpenAI, which was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University… See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.

📄 4 papers · ⬇ 76.1k · 💛 526 · 🤗 HF · cc0-1.0

CLEVR (Emerging)

CLEVR is a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.

📄 4 papers · ⬇ 425 · 🤗 HF

BraTS 2020 (Emerging)

The 'BraTS-2020' dataset contains high-resolution three-dimensional medical images used to evaluate tumor segmentation methods.

📄 4 papers · ⬇ 293 · 🤗 HF · apache-2.0

CelebA-HQ-256 (Emerging)

The 'CelebA-HQ-256' dataset contains high-quality images of celebrity faces and is used to evaluate the sampling quality and diversity of generative models.

📄 4 papers · ⬇ 143 · 💛 3 · 🤗 HF

COCO-30K (Emerging)

COCO-30K is a dataset that contains images and their corresponding annotations, used to evaluate the performance of image generation models, particularly in the context of diffusion models.

📄 4 papers · ⬇ 93 · 💛 1 · 🤗 HF

Oxford 102 Flowers (Emerging)

The 'Oxford-102 Flowers' dataset contains 102 categories of flower species and is used to evaluate the performance of image generation models, particularly in producing high-resolution and photo-realistic images.

📄 4 papers · ⬇ 50 · 💛 1 · 🤗 HF

ImageNet-128 (Emerging)

ImageNet-128 is a benchmark dataset containing a subset of images from the larger ImageNet dataset, used to evaluate methods for solving linear inverse problems without the need for training.

📄 4 papers · ⬇ 23 · 🤗 HF · apache-2.0

CT (Emerging)

The 'CT' dataset/benchmark contains medical images used to evaluate the performance of segmentation models, specifically in terms of Dice scores and calibration metrics.

📄 4 papers

CUB-200 (Emerging)

The CUB-200 dataset is a benchmark that contains images of 200 bird species and is used to evaluate the performance of image generation models in producing realistic bird images.

📄 4 papers

Dress Code (Emerging)

The 'Dress Code' dataset is used to evaluate the fidelity of generative models in reconstructing standardized garment images from single photos of clothed individuals.

📄 4 papers

MRI (Emerging)

The 'MRI' dataset/benchmark contains noisy medical image data used to evaluate the performance of image-to-image translation methods, specifically in translating MRI images to PET images.

📄 4 papers

Oxford-102 (Emerging)

The 'Oxford-102' dataset contains 102 categories of flowers and is used to evaluate the performance of image synthesis models in generating realistic images from textual descriptions.

📄 4 papers

QM9 (Emerging)

QM9 is a benchmark dataset of about 134,000 small organic molecules and their computed quantum-chemical properties, used to evaluate the performance of machine learning models in generating chemically valid molecules.

📄 4 papers

slakh-2100 (Emerging)

Slakh-2100 is a multi-track dataset used to evaluate music generation, source imputation, and source separation in a joint latent diffusion framework.

📄 4 papers

Oxford Flowers (Emerging)

The Hugging Face dataset card ("oxford-flowers") provides no description.

📄 3 papers · ⬇ 14.8k · 💛 20 · 🤗 HF · unknown

WikiArt (Emerging)

Dataset containing 81,444 pieces of visual art from various artists, taken from WikiArt.org, along with class labels for each image: "artist" (129 artist classes, including an "Unknown Artist" class), "genre" (11 genre classes, including an "Unknown Genre" class), and "style" (27 style classes). On WikiArt.org, the description for the "Artworks by Genre" page reads: A genre system divides artworks according to depicted themes and objects. A classical hierarchy of genres… See the full description on the dataset page: https://huggingface.co/datasets/huggan/wikiart.

📄 3 papers · ⬇ 8.1k · 💛 225 · 🤗 HF · unknown

KITTI (Emerging)

The KITTI object detection and object orientation estimation benchmark consists of 7,481 training images and 7,518 test images, comprising a total of 80,256 labeled objects.

📄 3 papers · ⬇ 2.6k · 💛 5 · 🤗 HF · unknown

T2I-CompBench (Canonical)

Hub version of the T2I-CompBench dataset. All credits and licensing belong to the creators of the dataset. This version was obtained as described below. First, the ".txt" files were obtained from this directory. The dataset card shows the code used, flattened and truncated here: it imports requests and os, sets the parameters owner = "Karine-Huang", repo = "T2I-CompBench", branch = "main", directory = "examples/dataset", and local_directory = ".", and then builds the GitHub API URL for the directory contents (url =… See the full description on the dataset page: https://huggingface.co/datasets/NinaKarine/t2i-compbench.

📄 3 papers · ⬇ 369 · 💛 6 · 🤗 HF · mit

AudioCaps (Emerging)

Hugging Face mirror of the official AudioCaps data repository.

📄 3 papers · ⬇ 239 · 💛 14 · 🤗 HF · mit

COCO-Stuff (Emerging)

COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.

📄 3 papers · ⬇ 63 · 💛 1 · 🤗 HF · cc-by-4.0

CheXpert (Emerging)

CheXpert is a dataset containing a large collection of chest radiographs used to evaluate the performance of models in generating and analyzing medical images, particularly for conditions like cardiomegaly.

📄 3 papers · ⬇ 30 · 🤗 HF · other
