📊 Datasets — Awesome Federated Learning

1,065 datasets & benchmarks — 17 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

CIFAR-10 (Canonical)

60,000 32×32 color images in 10 classes — a small, standard image-classification benchmark.

📄 401 papers

MNIST (Canonical)

70,000 28×28 grayscale images of handwritten digits (0–9) — the classic image-classification benchmark.

📄 291 papers

CIFAR-100 (Canonical)

Like CIFAR-10 but with 100 fine classes (grouped into 20 superclasses), 600 images each.

📄 168 papers
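
The CIFAR-100 counts above imply a total dataset size and a per-superclass size. A quick arithmetic sketch, assuming the 20 superclasses are equally sized; the constant names are illustrative and not part of any library:

```python
# Counts taken from the CIFAR-100 description above.
NUM_FINE_CLASSES = 100
NUM_SUPERCLASSES = 20
IMAGES_PER_FINE_CLASS = 600

# Total number of images across all fine classes.
total_images = NUM_FINE_CLASSES * IMAGES_PER_FINE_CLASS

# Assumption: fine classes are spread evenly over the superclasses.
fine_per_superclass = NUM_FINE_CLASSES // NUM_SUPERCLASSES
images_per_superclass = fine_per_superclass * IMAGES_PER_FINE_CLASS

print(total_images, fine_per_superclass, images_per_superclass)
```

The total matches CIFAR-10, so the two datasets differ in label granularity rather than in size.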

Fashion-MNIST (Canonical)

A drop-in MNIST replacement with 70,000 grayscale images across 10 clothing categories.

📄 93 papers

FEMNIST (Canonical)

FEMNIST is part of the LEAF benchmark. It is an image-classification dataset of handwritten digits and lowercase and uppercase letters, giving 62 unique labels. Each sample comprises a 28×28 grayscale image, writer_id, hsf_id, and character. Curated by: LEAF. License: BSD 2-Clause. FEMNIST is a preprocessed (in a way that resembles preprocessing for… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/femnist.

📄 59 papers
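
Because each FEMNIST sample carries a writer_id, one natural way to form federated clients is to group samples by writer. The sketch below is a minimal illustration of that idea on hypothetical toy records; the `partition_by_writer` helper, the ID strings, and the integer labels are assumptions for illustration, and real rows also include the image and hsf_id fields.

```python
from collections import defaultdict


def partition_by_writer(samples):
    """Group sample indices by writer_id so that each writer becomes one client."""
    clients = defaultdict(list)
    for idx, sample in enumerate(samples):
        clients[sample["writer_id"]].append(idx)
    return dict(clients)


# Hypothetical toy records standing in for FEMNIST rows.
toy_samples = [
    {"writer_id": "w001", "character": 3},
    {"writer_id": "w002", "character": 17},
    {"writer_id": "w001", "character": 42},
    {"writer_id": "w003", "character": 61},
    {"writer_id": "w002", "character": 0},
]

clients = partition_by_writer(toy_samples)
for writer, indices in sorted(clients.items()):
    print(writer, indices)
```

Grouping by writer keeps each person's handwriting on a single client, which gives a non-IID split without any synthetic label skew.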

EMNIST (Canonical)

EMNIST contains handwritten character samples and is used to evaluate how well machine learning models recognize and classify them.

📄 26 papers

Tiny ImageNet (Emerging)

Tiny ImageNet contains 200 image classes, each with 500 training images, and is used to evaluate image-classification models.

📄 26 papers

F-MNIST (Emerging)

F-MNIST (Fashion-MNIST) contains grayscale images of clothing items and is used to evaluate machine learning models, particularly under backdoor attacks in federated learning.

📄 23 papers

SVHN (Emerging)

SVHN (Street View House Numbers) is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It is similar in flavor to MNIST (the images are of small cropped digits) but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem… See the full description on the dataset page: https://huggingface.co/datasets/ufldl-stanford/svhn.

📄 23 papers

ImageNet (Emerging)

ImageNet is a large-scale dataset containing millions of labeled images across thousands of categories, commonly used to evaluate the performance of image classification algorithms.

📄 20 papers

CelebA (Canonical)

200k celebrity face images annotated with 40 binary attributes and facial landmarks.

📄 18 papers

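Shakespeare (Canonical)

Shakespeare is a text dataset built from the plays of William Shakespeare and distributed with the LEAF benchmark, typically framed as next-character prediction with each speaking role treated as a client. It is used to evaluate federated learning algorithms on accuracy, convergence time, communication overhead, energy consumption, and robustness to non-IID data.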

📄 15 papers

ToN-IoT (Emerging)

ToN-IoT contains network traffic data designed for evaluating intrusion detection systems in Internet of Things (IoT) environments.

📄 13 papers

UNSW-NB15 (Emerging)

UNSW-NB15 contains network traffic data used to evaluate intrusion detection systems, particularly their ability to identify various types of cyber attacks.

📄 13 papers

CINIC-10 (Emerging)

CINIC-10 is an image dataset designed for benchmarking image-classification models.

📄 11 papers

PACS (Emerging)

PACS is an image dataset for domain generalization. It consists of four domains: Photo (1,670 images), Art Painting (2,048 images), Cartoon (2,344 images), and Sketch (3,929 images). Each domain contains seven categories (labels): Dog, Elephant, Giraffe, Guitar, Horse, House, and Person. The total number of samples is 9,991. The PACS DG dataset was created by intersecting the classes found in Caltech256 (Photo), Sketchy (Photo, Sketch)… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/pacs.

📄 11 papers
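
The per-domain counts in the PACS description can be checked against the stated total, and they also fix the split sizes for a leave-one-domain-out evaluation. The sketch below assumes that protocol (common in domain generalization, but not stated in the description); the `leave_one_domain_out` helper and the dictionary keys are illustrative names.

```python
# Per-domain image counts as listed in the PACS description.
DOMAIN_SIZES = {
    "photo": 1670,
    "art_painting": 2048,
    "cartoon": 2344,
    "sketch": 3929,
}


def leave_one_domain_out(sizes):
    """Map each held-out domain to (training images, test images)."""
    total = sum(sizes.values())
    return {held_out: (total - n, n) for held_out, n in sizes.items()}


total_images = sum(DOMAIN_SIZES.values())
splits = leave_one_domain_out(DOMAIN_SIZES)

print("total:", total_images)
for held_out, (n_train, n_test) in splits.items():
    print(f"hold out {held_out}: train={n_train}, test={n_test}")
```

In a federated setting, the three training domains could also serve as three clients, with the held-out domain reserved for testing.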

DomainNet (Emerging)

Data downloaded from WILDS. This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. We believe this constitutes a fair use of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed… See the full description on the dataset page: https://huggingface.co/datasets/wltjr1007/DomainNet.

📄 10 papers

GLUE (Emerging)

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems. It comprises the following tasks: ax, a manually curated evaluation dataset for fine-grained analysis of system… See the full description on the dataset page: https://huggingface.co/datasets/nyu-mll/glue.

MIMIC-IV

MIMIC-IV is a publicly available critical care database that contains de-identified health data from ICU patients, used to evaluate predictive models for early sepsis detection.

📄 9 papers

Reddit (Canonical)

Reddit is a collection of user-generated content used to evaluate personalized federated learning approaches in a collaborative learning environment.

📄 9 papers

Chest X-ray (Emerging)

The Chest X-ray dataset contains 112,120 chest X-ray images used to evaluate models that diagnose various diseases from medical images.

📄 8 papers

LEAF (Canonical)

LEAF is a benchmark for federated learning algorithms. Its tasks are designed to simulate the challenges of training machine learning models across distributed networks of mobile devices.

📄 8 papers

MovieLens (Emerging)

MovieLens is a dataset used to evaluate recommendation systems, containing user preferences for movies.

📄 8 papers

AG News (Emerging)

AG News is a benchmark dataset that contains news articles categorized into four classes, used to evaluate text classification models in the context of federated learning.

📄 7 papers

CIC-IDS 2017 (Emerging)

Raw network data was collected over a period of 5 days, Monday through Friday, and stored in PCAP files. Monday was used to create most of the Benign data, while the Attack-Network implemented various types of attacks over the next 4 days, such as Brute Force connections (FTP and SSH), several types of DoS attacks, as well as a Botnet attack, Infiltration attacks and subsequent Port-Scanning activity. The PCAP data was processed using a tool developed by one of the authors of [1], called… See the full description on the dataset page: https://huggingface.co/datasets/bvk/CICIDS-2017.

📄 7 papers

Stack Overflow (Canonical)

Stack Overflow is a dataset of user-generated content and interactions used to evaluate machine learning models, particularly in personalized federated learning.

📄 7 papers

CARLA (Emerging)

CARLA is a benchmark and simulator used to evaluate autonomous vehicles' interactions with human drivers in diverse geographic areas, focusing on trajectory forecasting and human-robot interactions.

📄 6 papers

CIFAR (Emerging)

CIFAR is a collection of benchmark images used to evaluate machine learning models, particularly on image-classification tasks.

📄 6 papers

Edge-IIoTset (Emerging)

Edge-IIoTset is a dataset used to evaluate intrusion detection performance in heterogeneous Internet of Things (IoT) networks.

IMDB

Large Movie Review Dataset: a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides 25,000 highly polar movie reviews for training and 25,000 for testing, plus additional unlabeled data. See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.

📄 6 papers

N-BaIoT (Emerging)

The N-BaIoT dataset is a benchmark for evaluating intrusion detection systems in heterogeneous Internet of Things (IoT) networks, containing diverse real-world scenarios related to IoT security.

UCI-HAR

UCI-HAR contains human activity recognition data collected from smartphones and is used to evaluate machine learning models in federated learning.

📄 6 papers

ADNI (Emerging)

The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset contains clinical, imaging, genetic, and biospecimen data aimed at evaluating the progression of Alzheimer's disease and related disorders.