📊 Datasets: Awesome Generative Models

358 datasets & benchmarks: 16 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

CIFAR-100 (Emerging)
📄 5 papers
SWE-Bench-Verified (Emerging)

SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. See this post for more details on the human-validation process. The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The original… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified.

📄 3 papers · ⬇ 69.6k · 💛 95 · 🤗 HF
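
The unit-test verification mentioned in the SWE-bench Verified summary can be sketched roughly as follows. This is a minimal illustration, not the official harness: it assumes each instance lists the tests expected to flip from failing to passing (`FAIL_TO_PASS`) and the tests that must keep passing (`PASS_TO_PASS`), and that a hypothetical `test_status` mapping records the outcome of each test after the candidate patch is applied. The `"PASSED"` status string is likewise an assumption made for this example.

```python
from typing import Dict, List


def is_resolved(
    fail_to_pass: List[str],
    pass_to_pass: List[str],
    test_status: Dict[str, str],
) -> bool:
    """Return True when every required test passed after patching.

    `test_status` is a hypothetical mapping from test name to an
    outcome string; a test missing from it is treated as not passed.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(test_status.get(name) == "PASSED" for name in required)


if __name__ == "__main__":
    # Illustrative test names only; they do not come from the dataset.
    status = {"test_fix": "PASSED", "test_existing": "PASSED"}
    print(is_resolved(["test_fix"], ["test_existing"], status))
```

Treating a missing test as a failure is a conservative choice for this sketch: a patch that prevents a test from running does not count as solving the issue.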
ScienceQA (Emerging)

Dataset Summary: Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. Supported Tasks and Leaderboards: Multi-modal Multiple Choice. Languages: English. Data Instances: {'image': Image, 'question': 'Which of these states is farthest north?', 'choices': ['West Virginia', 'Louisiana', 'Arizona', 'Oklahoma'], 'answer': 0… See the full description on the dataset page: https://huggingface.co/datasets/derek-thomas/ScienceQA.

📄 3 papers · ⬇ 21.3k · 💛 234 · 🤗 HF · cc-by-sa-4.0
HumanEval (Emerging)

HumanEval-X is a benchmark for the evaluation of the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks.

📄 3 papers · ⬇ 1.8k · 💛 95 · 🤗 HF · apache-2.0
CIFAR-10 (Canonical)
📄 3 papers · ⬇ 1.7k · 🤗 HF
MATH500 (Emerging)

https://github.com/openai/prm800k/blob/main/prm800k/math_splits/test.jsonl

📄 3 papers · ⬇ 188 · 💛 9 · 🤗 HF
GSM8K (Emerging)

GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning. These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.

📄 2 papers · ⬇ 895.3k · 💛 1.4k · 🤗 HF · mit
MBPP (Emerging)

Mostly Basic Python Problems (mbpp): The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, code solution and 3 automated test cases. As described in the paper, a subset of the data has been hand-verified by us. Released here as part of… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/mbpp.

📄 2 papers · ⬇ 183.8k · 💛 230 · 🤗 HF · cc-by-4.0
GAIA (Emerging)

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format. Data and leaderboard: GAIA is made of more than 450 non-trivial questions with an unambiguous answer, requiring different levels of tooling and autonomy to… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.

📄 2 papers · ⬇ 42.2k · 💛 692 · 🤗 HF
KITTI (Emerging)

The KITTI object detection and object orientation estimation benchmark consists of 7481 training images and 7518 test images, comprising a total of 80,256 labeled objects.

📄 2 papers · ⬇ 1.3k · 💛 5 · 🤗 HF · unknown
ALFWorld (Emerging)
📄 2 papers · ⬇ 19 · 🤗 HF
Tiny-ImageNet (Emerging)
📄 2 papers · ⬇ 9 · 🤗 HF
LongMemEval_S (Emerging)
📄 2 papers · ⬇ 5 · 🤗 HF
Llama-3.2-1B (Emerging)
📄 2 papers
Qwen2.5-0.5B (Emerging)
📄 2 papers
Qwen3-0.6B (Emerging)
📄 2 papers
Qwen3-4B (Emerging)
📄 2 papers
WebShop (Emerging)
📄 2 papers
T2I-CompBench (Canonical)

Hub version of the T2I-CompBench dataset. All credits and licensing belong to the creators of the dataset. This version was obtained as described below. First, the ".txt" files were obtained from this directory. Code: import requests import os # Set the necessary parameters owner = "Karine-Huang" repo = "T2I-CompBench" branch = "main" directory = "examples/dataset" local_directory = "." # GitHub API URL to get contents of the directory url =… See the full description on the dataset page: https://huggingface.co/datasets/NinaKarine/t2i-compbench.

📄 1 paper · ⬇ 1.1k · 💛 6 · 🤗 HF · mit
100-site flux ladder (Emerging)
📄 1 paper
236 cleaned cases (Emerging)
📄 1 paper
2D airfoil (Emerging)
📄 1 paper
3D car (Emerging)
📄 1 paper
444 LiveCodeBench (Emerging)
📄 1 paper
50-task hotel expense benchmark (Emerging)
📄 1 paper
540-image benchmark (Emerging)
📄 1 paper
635 benchmarks (Emerging)
📄 1 paper
8,100 force-closure grasps (Emerging)
📄 1 paper
81 objects (Emerging)
📄 1 paper
88 eGeMAPS (Emerging)
📄 1 paper
AACR Project GENIE Biopharma Collaborative dataset (Emerging)
📄 1 paper
ACS Census (Emerging)
📄 1 paper
adversarial dataset of 103 clinical MCQs (Emerging)
📄 1 paper
Affordance20Q (Emerging)
📄 1 paper
AIME 2024/2025 (Emerging)
📄 1 paper
AIME 2025 (Emerging)
📄 1 paper
AIMO (Emerging)
📄 1 paper
Alpamayo R1 (Emerging)
📄 1 paper
Amara Spatial (Emerging)
📄 1 paper
AMC (Emerging)
📄 1 paper
Android World (Emerging)
📄 1 paper
AObench (Emerging)
📄 1 paper
A-OKVQA (Emerging)
📄 1 paper
AppWorld (Emerging)
📄 1 paper
ASVspoof 2021 (Emerging)
📄 1 paper
AuctionNet (Emerging)
📄 1 paper
AudioDER (Emerging)
📄 1 paper
AudioProcessBench (Emerging)
📄 1 paper
AutoPET-III (Emerging)
📄 1 paper
BAVED (Emerging)
📄 1 paper
Bayesmark (Emerging)
📄 1 paper
BCI-IV-2a (Emerging)
📄 1 paper
BsB Aerial (Emerging)
📄 1 paper
CA (Emerging)
📄 1 paper
Cage (Emerging)
📄 1 paper
CAGE Challenge 4 (Emerging)
📄 1 paper
CASAS (Emerging)
📄 1 paper
Case Western Reserve University bearings (Emerging)
📄 1 paper
C. elegans (Emerging)
📄 1 paper
ChemLex (Emerging)
📄 1 paper