📊 Datasets — Awesome Cybersecurity

820 datasets & benchmarks — 13 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

CIFAR-10Emerging

CIFAR-10 is a dataset containing 60,000 32x32 color images across 10 different classes, commonly used to evaluate the performance of machine learning models, particularly in image classification tasks.

📄 59 papers⬇ 2.6k🤗 HF

CICIDS2017Canonical

We have developed a Python package as a wrapper around Hugging Face Hub and Hugging Face Datasets library to access this dataset easily. NIDS Datasets The nids-datasets package provides functionality to download and utilize specially curated and extracted datasets from the original UNSW-NB15 and CIC-IDS2017 datasets. These datasets, which initially were only flow datasets, have been enhanced to include packet-level information from the raw PCAP files. The dataset contains both… See the full description on the dataset page: https://huggingface.co/datasets/rdpahalavan/CIC-IDS2017.

📄 46 papers⬇ 1.8k💛 4🤗 HFapache-2.0

UNSW-NB15Canonical

Source https://www.kaggle.com/datasets/dhoogla/unswnb15?resource=download Dataset This is an academic intrusion detection dataset. All the credit goes to the original authors: dr. Nour Moustafa and dr. Jill Slay. Please cite their original paper and all other appropriate articles listed on the UNSW-NB15 page. The full dataset also offers the pcap, BRO and Argus files along with additional documentation. The modifications to the predesignated train-test sets are minimal… See the full description on the dataset page: https://huggingface.co/datasets/wwydmanski/UNSW-NB15.

📄 43 papers⬇ 339💛 2🤗 HF

NSL-KDDCanonical

NSL-KDD The data set is a data set that converts the arff File provided by the link into CSV and results. The data set is personally stored by converting data to float64. If you want to obtain additional original files, they are organized in the Original Directory in the repo. Labels The label of the data set is as follows. # Column Non-Null Count Dtype 0 duration 151165 non-null int64 1 protocol_type 151165 non-null object 2 service 151165 non-null… See the full description on the dataset page: https://huggingface.co/datasets/Mireu-Lab/NSL-KDD.

📄 36 papers⬇ 2.9k💛 6🤗 HFgpl-3.0

MNISTEmerging

Dataset Card for MNIST Dataset Summary The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, one class per digit so a total of 10 classes, with 7,000 images (6,000 train images and 1,000 test images) per class. Half of the image were drawn by Census Bureau employees and the other half by high school students… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

📄 27 papers⬇ 110.0k💛 261🤗 HFmit

ImageNetEmerging

Dataset Card for ImageNet-D This is a FiftyOne dataset with 4838 samples. Installation If you haven't already, install FiftyOne: pip install -U fiftyone Usage import fiftyone as fo import fiftyone.utils.huggingface as fouh # Load the dataset # Note: other available arguments include 'max_samples', etc dataset = fouh.load_from_hub("Voxel51/ImageNet-D") # Launch the App session = fo.launch_app(dataset) Dataset Description ImageNet-D is a new… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/ImageNet-D.

📄 18 papers⬇ 9.1k💛 2🤗 HF

CIFAR-100Emerging

CIFAR-100 is a dataset that contains 100 classes of images, each with 600 images, used to evaluate the performance of machine learning models, particularly in the context of image classification tasks.

📄 16 papers

CSE-CIC-IDS2018Canonical

The 'CSE-CIC-IDS2018' dataset contains network traffic data used to evaluate machine learning techniques for the identification and classification of various cyber attacks.

📄 15 papers⬇ 17🤗 HFecl-2.0

MITRE ATT&CKEmerging

MITRE ATT&CK is a comprehensive framework that contains a knowledge base of adversary tactics and techniques used in cyber attacks, and it is utilized to evaluate and enhance threat detection and response capabilities.

📄 13 papers

AdvBenchEmerging

Dataset Card for AdvBench Paper: Universal and Transferable Adversarial Attacks on Aligned Language Models Data: AdvBench Dataset About AdvBench is a set of 500 harmful behaviors formulated as instructions. These behaviors range over the same themes as the harmful strings setting, but the adversary’s goal is instead to find a single attack string that will cause the model to generate any response that attempts to comply with the instruction, and to do so over as many… See the full description on the dataset page: https://huggingface.co/datasets/walledai/AdvBench.

📄 12 papers⬇ 14.1k💛 108🤗 HFmit

CIC-dDoS-2019Emerging

The 'CIC-DDoS-2019' dataset is a benchmark that contains data on Distributed Denial of Service (DDoS) attacks, used to evaluate machine learning-based threat detection systems.

📄 10 papers

BODMASEmerging

The BODMAS dataset contains 134,435 samples and is used to evaluate the performance of machine learning classifiers and generative AI in cybersecurity threat detection.

📄 9 papers⬇ 7🤗 HF

HarmBenchEmerging

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper: HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Data: Dataset About In this dataset card, we only use the behavior prompts proposed in HarmBench. License MIT Citation If you find HarmBench useful in your research, please consider citing the paper: @article{mazeika2024harmbench, title={HarmBench: A… See the full description on the dataset page: https://huggingface.co/datasets/walledai/HarmBench.

📄 8 papers⬇ 8.2k💛 51🤗 HFmit

AgentDojoEmerging

AgentDojo is a benchmark that evaluates the effectiveness of monitoring protocols against indirect prompt injection attacks on AI agents.

📄 8 papers

Tiny ImageNetEmerging

Dataset Card for tiny-imagenet Dataset Summary Tiny ImageNet contains 100000 images of 200 classes (500 for each class) downsized to 64×64 colored images. Each class has 500 training images, 50 validation images, and 50 test images. Languages The class labels in the dataset are in English. Dataset Structure Data Instances { 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=64x64 at 0x1A800E8E190, 'label': 15 }… See the full description on the dataset page: https://huggingface.co/datasets/zh-plus/tiny-imagenet.

📄 7 papers⬇ 29.9k💛 102🤗 HF

Bot-IoTCanonical

The 'Bot-IoT' dataset contains network traffic data specifically related to Internet of Things (IoT) devices and is used to evaluate the effectiveness of intrusion detection systems against modern IoT-based attacks.

📄 7 papers⬇ 17🤗 HF

EMBERCanonical

Recipe and flavor pairings dataset(s) to be used for LLM training.

📄 7 papers

ToN-IoTCanonical

The 'ToN-IoT' dataset is a benchmark that contains network traffic data specifically for Internet of Things (IoT) devices and is used to evaluate cybersecurity threat detection and categorization methods.

📄 7 papers

GTSRBEmerging

Dataset Card for GTSRB Dataset Summary The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers from relevant fields to participate: The competition is designed to allow for participation without special domain knowledge. Our benchmark has the following properties: Single-image, multi-class classification problem More than 40… See the full description on the dataset page: https://huggingface.co/datasets/bazyl/GTSRB.

📄 6 papers⬇ 6.9k🤗 HFgpl-3.0

CICIoT-2023Emerging

The CICIoT-2023 dataset is a benchmark that contains data for evaluating IoT threat detection methods, specifically focusing on the performance of machine learning models in identifying various IoT attack scenarios.

📄 6 papers⬇ 1.9k🤗 HFcc

IoT-23Emerging

The 'IoT-23' dataset is a benchmark that contains a collection of IoT device traffic data used to evaluate the effectiveness of machine learning models in detecting cyber attacks, particularly zero-day vulnerabilities.

📄 6 papers

SVHNEmerging

Dataset Card for Street View House Numbers Dataset Summary SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem… See the full description on the dataset page: https://huggingface.co/datasets/ufldl-stanford/svhn.

📄 5 papers⬇ 39.9k💛 15🤗 HFother

MalimgEmerging

The MalImg dataset contains grayscale images of malware binaries and is used to evaluate the performance of malware detection frameworks in few-shot learning scenarios.

📄 5 papers⬇ 155🤗 HF

Android malware datasetsEmerging

The 'Android malware datasets' contain collections of labeled and unlabeled samples of Android applications used to evaluate the performance of malware detection methods, particularly in the context of machine learning approaches.

📄 5 papers

CIC-IoMT-2024Emerging

The 'CIC-IoMT-2024' dataset is a benchmark that contains multiple attack types, including DDoS, brute-force, and command-injection attacks, and is used to evaluate network security situation analysis and forecasting systems.

📄 5 papers

Edge-IIoTsetEmerging

The 'Edge-IIoTset' dataset is a benchmark used to evaluate the performance of intrusion detection systems in dynamic Industrial Internet of Things (IIoT) environments.

📄 5 papers

National Vulnerability DatabaseEmerging

The National Vulnerability Database is a repository of information on known cybersecurity vulnerabilities that is used to enhance threat intelligence and improve malware detection systems.

📄 5 papers

JailbreakV-28KEmerging

⛓‍💥 JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks 🌐 GitHub | 🛎 Project Page ｜ 👉 Download full datasets If you like our project, please give us a star ⭐ on Hugging Face for the latest update. 📰 News Date Event 2024/07/09 🎉 Our paper is accepted by COLM 2024. 2024/06/22 🛠️ We have updated our version to V0.2, which supports users to customize their attack models… See the full description on the dataset page: https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k.

📄 4 papers⬇ 2.9k💛 68🤗 HFmit

JailbreakBenchEmerging

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models Paper: JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Data: JailbreaBench-HFLink About Jailbreakbench is an open-source robustness benchmark for jailbreaking large language models (LLMs). The goal of this benchmark is to comprehensively track progress toward (1) generating successful jailbreaks and (2) defending against these jailbreaks. To this end, we… See the full description on the dataset page: https://huggingface.co/datasets/walledai/JailbreakBench.

📄 4 papers⬇ 660💛 6🤗 HF

CICIDSEmerging

CIC-IDS This dataset is a dataset that sorts multiple tracks that are attacked by the network. The data on that dataset are as follows. 자료 The types of Attacks are as follows. DDoS Web_Attack_�_Brute_Force Infiltration DoS_GoldenEye DoS_Hulk Heartbleed Bot DoS_Slowhttptest Web_Attack_�_XSS DoS_slowloris FTP-Patator SSH-Patator Web_Attack_�_Sql_Injection PortScan The percentage of attack attempts is as follows. Detailed Attack Rate Chart A dataset made up of . In… See the full description on the dataset page: https://huggingface.co/datasets/Mireu-Lab/CIC-IDS.

📄 4 papers⬇ 145💛 1🤗 HF

CVEEmerging

The 'CVE' dataset contains descriptions of cybersecurity vulnerabilities, which are used to evaluate the ability of LLM agents to autonomously exploit these vulnerabilities.

📄 4 papers

CybenchEmerging

Cybench is a dataset that contains a collection of cybersecurity challenges used to evaluate the robustness and generalization of agentic large language models through semantics-preserving program transformations.

📄 4 papers

DARPA Transparent Computing (TC)Emerging

The DARPA Transparent Computing (TC) dataset/benchmark contains real-world traces used to evaluate the effectiveness of cybersecurity detection methods against Advanced Persistent Threats (APT) by addressing challenges such as class imbalance and feature dimensionality.

📄 4 papers

DrebinCanonical

Drebin is a dataset used for evaluating Android malware detection systems, containing various features extracted from Android applications to facilitate the analysis of malware behavior.

📄 4 papers

InjecAgentEmerging

InjecAgent is a benchmark containing 1,054 test cases that evaluates the vulnerability of tool-integrated large language model agents to indirect prompt injection attacks, focusing on two primary types of attack intentions: direct harm to users and exfiltration of private data.

📄 4 papers

VirusTotalEmerging

VirusTotal is a dataset that contains structured reports on malware, used to evaluate the effectiveness of Retrieval-Augmented Generation in enhancing the quality of malware explanations.

📄 4 papers

IMDBEmerging

Dataset Card for "imdb" Dataset Summary Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Supported Tasks and Leaderboards More Information Needed Languages More Information Needed Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.

📄 3 papers⬇ 171.0k💛 394🤗 HFother

ImageNet-1kEmerging

Dataset Card for ImageNet Dataset Summary ILSVRC 2012, commonly known as 'ImageNet' is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). ImageNet aims to provide on average 1000 images to illustrate each synset. Images of each concept are… See the full description on the dataset page: https://huggingface.co/datasets/ILSVRC/imagenet-1k.

📄 3 papers⬇ 126.1k💛 872🤗 HFother

FEMNISTEmerging

Dataset Card for FEMNIST The FEMNIST dataset is a part of the LEAF benchmark. It represents image classification of handwritten digits, lower and uppercase letters, giving 62 unique labels. Dataset Details Dataset Description Each sample is comprised of a (28x28) grayscale image, writer_id, hsf_id, and character. Curated by: LEAF License: BSD 2-Clause License Dataset Sources The FEMNIST is a preprocessed (in a way that resembles preprocessing for… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/femnist.

📄 3 papers⬇ 6.5k💛 8🤗 HFbsd-2-clause

MM-SafetyBenchEmerging

Warning: This dataset may contain sensitive or harmful content. Users are advised to handle it with care and ensure that their use complies with relevant ethical guidelines and legal requirements. Usage and License Notices: The dataset is intended and licensed for research use only. They are also restricted to uses that follow the license agreement GPT-4 and Stable Diffusion. The dataset is CC BY NC 4.0 (allowing only non-commercial use). Data Source: For more information about the dataset… See the full description on the dataset page: https://huggingface.co/datasets/PKU-Alignment/MM-SafetyBench.

📄 3 papers⬇ 2.9k💛 8🤗 HFcc-by-nc-4.0

NetFlowEmerging

NetFlow V3 NetFlow V3 datasets for machine learning-based network intrusion detection research. This repository republishes the NetFlow V3 datasets introduced in Temporal Analysis of NetFlow Datasets for Network Intrusion Detection Systems by Majed Luay, Siamak Layeghy, Seyedehfaezeh Hosseininoorbin, Mohanad Sarhan, Nour Moustafa, and Marius Portmann. arXiv Description NetFlow V3 extends earlier NetFlow datasets with temporal features for time-based analysis in NIDS… See the full description on the dataset page: https://huggingface.co/datasets/keys-i/netFlow.

📄 3 papers⬇ 266🤗 HFcc-by-nc-sa-4.0

Reddit datasetEmerging

A meta dataset of Reddit's own /r/datasets community.

📄 3 papers⬇ 38💛 4🤗 HFcc-by-4.0

AgentDynEmerging

AgentDyn is a manually designed benchmark containing 60 dynamic open-ended tasks used to evaluate the vulnerability of real-world AI agent security systems to prompt injection attacks.

📄 3 papers⬇ 21🤗 HF

InSDNEmerging

The InSDN dataset contains SDN traffic data, which is used to evaluate advanced threat detection methods in Software Defined Networking environments.

📄 3 papers⬇ 17💛 1🤗 HF

ImageNet-100Emerging

ImageNet-100 is a subset of the ImageNet dataset that contains 100 classes and is used to evaluate the performance of image classifiers, particularly in the context of adversarial robustness against gradient-based attacks.

📄 3 papers⬇ 15🤗 HFmit

CAPECEmerging

CAPEC (Common Attack Pattern Enumeration and Classification) is a dataset that contains a structured catalog of attack patterns used to evaluate and enhance the understanding of adversary behaviors in cybersecurity.

📄 3 papers

CICMalDroid-2020Emerging

The CICMalDroid-2020 dataset contains dynamically obtained Android malware behavior samples and is used to evaluate the performance of machine learning algorithms in detecting malicious code.

📄 3 papers

Common Vulnerabilities and Exposures (CVE)Emerging

Common Vulnerabilities and Exposures (CVE) is a publicly available database that contains standardized identifiers for known cybersecurity vulnerabilities, which is used to evaluate and enhance threat prediction and risk assessment in cybersecurity.

📄 3 papers

FaceForensics++Emerging

FaceForensics++ is a dataset used to evaluate the performance of deepfake image detection models under adversarial manipulations and cross-dataset conditions.

📄 3 papers

Hugging FaceEmerging

Hugging Face is a platform that provides pre-trained machine learning models in standard formats to facilitate accessibility and reuse, and it is used to evaluate the detection of malicious models in the context of machine learning supply chain security.

📄 3 papers

HumanEvalEmerging

The 'HumanEval' dataset is a benchmark that contains programming problems used to evaluate the performance of large reasoning models in generating code solutions.

📄 3 papers

ImageNet-CEmerging

ImageNet-C is a benchmark dataset that contains corrupted versions of images from the ImageNet dataset and is used to evaluate the robustness of models against various types of image distortions.

📄 3 papers

IoTID-20Emerging

The 'IoTID-20' dataset is a benchmark that contains various IoT network traffic data used to evaluate the performance of intrusion detection systems in identifying cyber threats in IoT environments.

📄 3 papers

KDD Cup 99Canonical

The KDD Cup 99 dataset is a benchmark dataset for network security that contains network traffic data used to evaluate the performance of intrusion detection systems.

📄 3 papers

LlamaEmerging

The 'Llama' dataset/benchmark is used to evaluate the effectiveness of hybrid approaches in exploiting vulnerabilities of Large Language Models (LLMs) through token-level and prompt-level jailbreak strategies.

📄 3 papers

MITRE ATLASEmerging

MITRE ATLAS is a dataset that contains a structured representation of adversarial tactics, techniques, and procedures specifically for evaluating the security of artificial intelligence systems.

📄 3 papers

NazarioEmerging

The 'Nazario' dataset is a benchmark used to evaluate phishing detection systems, containing a collection of phishing emails that helps assess the effectiveness of detection algorithms.

📄 3 papers

PhishTankEmerging

PhishTank is a dataset that contains links to reported phishing websites, used to evaluate the effectiveness of machine-learning models in detecting phishing attacks.

📄 3 papers

MATH-500Emerging

Dataset Card for MATH-500 This dataset contains a subset of 500 problems from the MATH benchmark that OpenAI created in their Let's Verify Step by Step paper. See their GitHub repo for the source file: https://github.com/openai/prm800k/tree/main?tab=readme-ov-file#math-splits

📄 2 papers⬇ 168.2k💛 318🤗 HF

Fashion-MNISTEmerging

Dataset Card for FashionMNIST Dataset Summary Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing… See the full description on the dataset page: https://huggingface.co/datasets/zalando-datasets/fashion_mnist.

📄 2 papers⬇ 36.2k💛 67🤗 HFmit

Loading datasets…