📊 Datasets - Awesome Computer Vision

2,147 datasets & benchmarks: 34 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

COCO (Canonical)
📄 256 papers
ImageNet (Canonical)
📄 185 papers · ⬇ 32 · 🤗 HF
Cityscapes (Canonical)

This dataset is part of the CycleGAN datasets, originally hosted here: https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/
Citation: @article{DBLP:journals/corr/ZhuPIE17, author = {Jun{-}Yan Zhu and Taesung Park and Phillip Isola and Alexei A. Efros}, title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks}, journal = {CoRR}, volume = {abs/1703.10593}, year…
See the full description on the dataset page: https://huggingface.co/datasets/huggan/cityscapes.

📄 84 papers · ⬇ 454 · 💛 4 · 🤗 HF
Pascal VOC (Canonical)

PASCAL_VOC

📄 62 papers · ⬇ 200 · 🤗 HF
ADE20K (Canonical)
📄 51 papers
KITTI (Canonical)

Dataset Card for KITTI. The KITTI object detection and object orientation estimation benchmark consists of 7481 training images and 7518 test images, comprising a total of 80,256 labeled objects.

📄 43 papers · ⬇ 1.3k · 💛 5 · 🤗 HF · license: unknown
CIFAR-10 (Canonical)
📄 31 papers · ⬇ 1.7k · 🤗 HF
YouTube-VOS (Emerging)
📄 28 papers · ⬇ 21 · 🤗 HF
LVIS (Canonical)

Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced "el-vis"): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge.

📄 27 papers · ⬇ 249 · 💛 2 · 🤗 HF · license: cc-by-4.0
HICO-DET (Emerging)

Dataset Card for HICO-DET Dataset. Dataset Summary: HICO-DET is a dataset for detecting human-object interactions (HOI) in images. It contains 47,776 images (38,118 in the train set and 9,658 in the test set) and 600 HOI categories constructed from 80 object categories and 117 verb classes. HICO-DET provides more than 150k annotated human-object pairs. V-COCO provides 10,346 images (2,533 for training, 2,867 for validation and 4,946 for testing) and 16,199 person instances. Each person… See the full description on the dataset page: https://huggingface.co/datasets/zhimeng/hico_det.

📄 25 papers · ⬇ 882 · 💛 9 · 🤗 HF · license: mit
PASCAL VOC 2012 (Emerging)
📄 23 papers · ⬇ 900 · 💛 2 · 🤗 HF
CIFAR-100 (Canonical)
📄 23 papers
MOT17 (Emerging)
📄 21 papers
nuScenes (Canonical)
📄 20 papers · ⬇ 116 · 🤗 HF · license: apache-2.0
DAVIS 2017 (Emerging)
📄 20 papers
DAVIS (Canonical)
📄 19 papers
YouTubeVOS (Emerging)
📄 19 papers
Kinetics (Canonical)
📄 18 papers · ⬇ 139 · 💛 1 · 🤗 HF · license: cc-by-4.0
MPII (Emerging)
📄 17 papers
V-COCO (Emerging)
📄 17 papers
ScanNet (Canonical)
📄 16 papers · ⬇ 1.7k · 💛 1 · 🤗 HF
THUMOS'14 (Emerging)
📄 16 papers
NYU Depth V2 (Canonical)

The NYU-Depth V2 data set comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.

📄 15 papers · ⬇ 52.8k · 💛 39 · 🤗 HF · license: apache-2.0
Human3.6M (Emerging)
📄 14 papers · ⬇ 52 · 🤗 HF
DAVIS2016 (Emerging)
📄 14 papers
DAVIS 2016 (Emerging)
📄 14 papers
Human 3.6M (Emerging)
📄 14 papers
PASCAL VOC 2007 (Emerging)
📄 14 papers
CUB-200-2011 (Canonical)
📄 13 papers · ⬇ 212 · 🤗 HF · license: openrail
Pascal Context (Emerging)
📄 13 papers · ⬇ 59 · 🤗 HF
DTU (Emerging)
📄 13 papers
BDD100K (Emerging)

Dataset Card for bdd100k-validation. The BDD100K images dataset comes from BDD100K, one of the largest open source driving datasets. The dataset consists of every 10th second in the videos and contains train, validation and test splits. It contains labels for object detection, weather, time of day, and driving scene. This is a FiftyOne dataset with 10000 samples.
Installation: if you haven't already, install FiftyOne:
pip install -U fiftyone
Usage… See the full description on the dataset page: https://huggingface.co/datasets/dgural/bdd100k.

📄 12 papers · ⬇ 3.5k · 💛 5 · 🤗 HF · license: bsd
WIDER FACE (Canonical)

The WIDER FACE dataset is a face detection benchmark whose images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. The WIDER FACE dataset is organized into 61 event classes. For each event class, we randomly select 40%/10%/50% of the data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to the MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.

📄 12 papers · ⬇ 2.4k · 💛 51 · 🤗 HF · license: cc-by-nc-nd-4.0
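
The per-event 40%/10%/50% split described for WIDER FACE can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_by_event`, the `(image_id, event_class)` input format, the fixed seed, and the rounding rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_event(images, seed=0):
    """Split (image_id, event_class) pairs 40/10/50 within each event class.

    Hypothetical helper: the rounding rule and the seed are assumptions,
    not details taken from the WIDER FACE description.
    """
    rng = random.Random(seed)

    # Group image ids by their event class.
    by_event = defaultdict(list)
    for image_id, event in images:
        by_event[event].append(image_id)

    splits = {"train": [], "val": [], "test": []}
    for event in sorted(by_event):
        ids = by_event[event]
        rng.shuffle(ids)  # random selection within the event class
        n_train = round(0.4 * len(ids))
        n_val = round(0.1 * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder, roughly 50%
    return splits
```

Splitting inside each event class, rather than over the pooled images, keeps every event represented in all three sets in roughly the stated proportions.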
DanceTrack (Emerging)

Dataset Card for DanceTrack. DanceTrack is a multi-human tracking dataset with two emphasized properties: (1) uniform appearance: the humans look highly similar and are almost indistinguishable; (2) diverse motion: the humans move in complicated patterns and their relative positions change frequently. We expect the combination of uniform appearance and complicated motion pattern makes DanceTrack a platform to encourage more comprehensive and intelligent multi-object tracking… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/DanceTrack.

📄 12 papers · ⬇ 2.1k · 💛 3 · 🤗 HF · license: cc-by-4.0
PASCAL-5i (Emerging)
📄 12 papers
YouTube-VIS 2019 (Emerging)
📄 12 papers
UCF101 (Canonical)
📄 11 papers · ⬇ 194 · 🤗 HF
ImageNet VID (Emerging)

ImageNetVID dataset. Usage: reassemble the split archives with the following commands:
cat ILSVRC2015_VID.tar.gz.a* > ILSVRC2015_VID.tar.gz
cat ILSVRC2017_DET.tar.gz.a* > ILSVRC2017_DET.tar.gz

📄 11 papers · ⬇ 82 · 💛 4 · 🤗 HF
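
Where `cat` is not available, the same reassembly of the ImageNet VID split archives can be approximated in Python. This is a sketch under assumptions: `join_parts` is a hypothetical helper, and it assumes the part suffixes sort lexicographically in the intended order, as the shell glob `a*` would expand them.

```python
import glob
import shutil


def join_parts(prefix, output):
    """Concatenate every file whose name starts with `prefix` into `output`.

    Hypothetical helper mirroring `cat prefix* > output`; parts are joined
    in sorted (lexicographic) order. Returns the number of parts joined.
    """
    # Collect the parts before creating the output file.
    parts = sorted(glob.glob(glob.escape(prefix) + "*"))
    if not parts:
        raise FileNotFoundError("no parts match " + prefix + "*")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streamed copy, no full read
    return len(parts)


# Example call, using the file names from the commands above:
# join_parts("ILSVRC2015_VID.tar.gz.a", "ILSVRC2015_VID.tar.gz")
```

The copy is streamed part by part, so large archives do not need to fit in memory.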
Linemod (Emerging)
📄 11 papers · ⬇ 77 · 🤗 HF · license: unknown
COCO-Stuff (Emerging)

COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.

📄 11 papers · ⬇ 51 · 💛 1 · 🤗 HF · license: cc-by-4.0
YouTubeVIS-2019 (Emerging)
📄 11 papers
LIBERO (Emerging)

This dataset was created using LeRobot. Dataset structure (meta/info.json): { "codebase_version": "v3.0", "robot_type": "panda", "total_episodes": 1693, "total_frames": 273465, "total_tasks": 40, "chunks_size": 1000, "fps": 10.0, "splits": { "train": "0:1693" }, "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet", "video_path": "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceVLA/libero.

📄 10 papers · ⬇ 23.2k · 💛 60 · 🤗 HF · license: apache-2.0
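
The `data_path` and `video_path` values in the LIBERO excerpt use Python-style format fields, so they can be expanded with `str.format`. The sketch below only illustrates how the zero-padded indices resolve; the helper names and the example `video_key` value are hypothetical, and how LeRobot itself assigns chunk and file indices is not covered here.

```python
# Path templates copied from the meta/info.json excerpt.
DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
VIDEO_PATH = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def resolve_data_path(chunk_index, file_index):
    """Fill the data template; :03d zero-pads each index to three digits."""
    return DATA_PATH.format(chunk_index=chunk_index, file_index=file_index)


def resolve_video_path(video_key, chunk_index, file_index):
    """Fill the video template for one camera/video key."""
    return VIDEO_PATH.format(
        video_key=video_key, chunk_index=chunk_index, file_index=file_index
    )


print(resolve_data_path(0, 3))
# data/chunk-000/file-003.parquet

# "camera_0" is a made-up key used only for illustration.
print(resolve_video_path("camera_0", 1, 12))
# videos/camera_0/chunk-001/file-012.mp4
```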
Something-Something (Canonical)

The Something-Something dataset (version 2) is a collection of 220,847 labeled video clips of humans performing pre-defined, basic actions with everyday objects. It is designed to train machine learning models in fine-grained understanding of human hand gestures like putting something into something, turning something upside down and covering something with something.

📄 10 papers · ⬇ 242 · 💛 21 · 🤗 HF · license: other
COCO test-dev (Emerging)
📄 10 papers
YouTube-VIS (Emerging)
📄 10 papers
MOT20 (Emerging)

MOT20 is a benchmark dataset for single-camera multi-object tracking (MOT) and pedestrian detection in very crowded real-world scenes. This Hugging Face repository provides MOT20 in the original MOTChallenge-style structure for research, benchmarking, training, and evaluation of multi-object tracking systems. MOT20 was introduced to stress-test MOT methods in high-density pedestrian scenes, including crowded squares, indoor train stations, stadium exits, and pedestrian… See the full description on the dataset page: https://huggingface.co/datasets/Lekim89/MOT20.

📄 9 papers · ⬇ 5.3k · 🤗 HF · license: cc-by-nc-sa-3.0
ActivityNet (Emerging)
📄 9 papers · ⬇ 1.8k · 💛 6 · 🤗 HF
SUN RGB-D (Canonical)
📄 9 papers
YouTube-VIS-2021 (Emerging)
📄 9 papers
YouTubeVIS-2021 (Emerging)
📄 9 papers
RefCOCO (Emerging)

Dataset Card for "refcoco". More information needed.

📄 8 papers · ⬇ 5.6k · 💛 11 · 🤗 HF
LaSOT (Emerging)

Dataset Card for LaSOT. Dataset Summary: Large-scale Single Object Tracking (LaSOT) aims to provide a dedicated platform for training data-hungry deep trackers as well as assessing long-term tracking performance. This repository contains the conference version of LaSOT, published in CVPR-19 (LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking). LaSOT features: Large-scale: 1,400 sequences with more than 3.5 million frames. High-quality: Manual… See the full description on the dataset page: https://huggingface.co/datasets/l-lt/LaSOT.

📄 8 papers · ⬇ 2.2k · 💛 10 · 🤗 HF
MOSE (Emerging)

Molecular Sets (MOSES): A benchmarking platform for molecular generation models. Deep generative models are rapidly becoming popular for the discovery of new molecules and materials. Such models learn on a large collection of molecular structures and produce novel compounds. In this work, we introduce Molecular Sets (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and provides a… See the full description on the dataset page: https://huggingface.co/datasets/katielink/moses.

📄 8 papers · ⬇ 166 · 💛 4 · 🤗 HF · license: mit
GOT-10k (Emerging)
📄 8 papers · ⬇ 104 · 💛 1 · 🤗 HF
ICDAR2015 (Emerging)
📄 8 papers · ⬇ 72 · 🤗 HF
EPIC-KITCHEN (Emerging)
📄 8 papers · ⬇ 67 · 🤗 HF
EPIC-KITCHENS (Emerging)
📄 8 papers · ⬇ 67 · 🤗 HF
COCO-20i (Emerging)
📄 8 papers
COCO-20^i (Emerging)
📄 8 papers
DAVIS17 (Emerging)
📄 8 papers