📊 Datasets — Awesome Computer Vision

2,147 datasets & benchmarks — 34 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

This dataset is part of the CycleGAN datasets, originally hosted at https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/

Citation: @article{DBLP:journals/corr/ZhuPIE17, author = {Jun{-}Yan Zhu and Taesung Park and Phillip Isola and Alexei A. Efros}, title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks}, journal = {CoRR}, volume = {abs/1703.10593}, year…

See the full description on the dataset page: https://huggingface.co/datasets/huggan/cityscapes.

📄 84 papers · ⬇ 454 · 💛 4 · 🤗 HF

Pascal VOC · Canonical

PASCAL_VOC

Dataset Card for Kitti. The Kitti object detection and object orientation estimation benchmark consists of 7,481 training images and 7,518 test images, comprising a total of 80,256 labeled objects.

📄 43 papers · ⬇ 1.3k · 💛 5 · 🤗 HF · unknown

CIFAR-10 · Canonical

📄 31 papers · ⬇ 1.7k · 🤗 HF

YouTube-VOS · Emerging

📄 28 papers · ⬇ 21 · 🤗 HF

LVIS · Canonical

Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced 'el-vis'): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge.

📄 27 papers · ⬇ 249 · 💛 2 · 🤗 HF · cc-by-4.0

HICO-DET · Emerging

Dataset Card for HICO-DET Dataset. Dataset Summary: HICO-DET is a dataset for detecting human-object interactions (HOI) in images. It contains 47,776 images (38,118 in the train set and 9,658 in the test set) and 600 HOI categories constructed from 80 object categories and 117 verb classes. HICO-DET provides more than 150k annotated human-object pairs. V-COCO provides 10,346 images (2,533 for training, 2,867 for validation, and 4,946 for testing) and 16,199 person instances. Each person… See the full description on the dataset page: https://huggingface.co/datasets/zhimeng/hico_det.

📄 25 papers · ⬇ 882 · 💛 9 · 🤗 HF · mit

PASCAL VOC 2012 · Emerging

📄 23 papers · ⬇ 900 · 💛 2 · 🤗 HF

📄 20 papers · ⬇ 116 · 🤗 HF · apache-2.0

📄 18 papers · ⬇ 139 · 💛 1 · 🤗 HF · cc-by-4.0

📄 16 papers · ⬇ 1.7k · 💛 1 · 🤗 HF

THUMOS'14 · Emerging

📄 16 papers

NYU Depth V2 · Canonical

The NYU-Depth V2 dataset comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.

📄 15 papers · ⬇ 52.8k · 💛 39 · 🤗 HF · apache-2.0

PASCAL VOC 2007 · Emerging

📄 14 papers

CUB-200-2011 · Canonical

📄 13 papers · ⬇ 212 · 🤗 HF · openrail

Pascal Context · Emerging

Dataset Card for bdd100k-validation. The BDD100K images dataset comes from BDD100K, one of the largest open-source driving datasets. The dataset consists of every 10th second in the videos and contains train, validation, and test splits. It contains labels for object detection, weather, time of day, and driving scene. This is a FiftyOne dataset with 10,000 samples. Installation: if you haven't already, install FiftyOne with pip install -U fiftyone. Usage… See the full description on the dataset page: https://huggingface.co/datasets/dgural/bdd100k.

📄 12 papers · ⬇ 3.5k · 💛 5 · 🤗 HF · bsd

WIDER FACE · Canonical

The WIDER FACE dataset is a face detection benchmark whose images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose, and occlusion, as depicted in the sample images. The WIDER FACE dataset is organized into 61 event classes. For each event class, we randomly select 40%/10%/50% of the data as training, validation, and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. As with the MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we then evaluate.
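The per-event 40%/10%/50% random split described above can be sketched as follows. This is a minimal illustration, not the WIDER FACE authors' actual procedure: the function name `split_by_event`, the seed, and the rounding rule are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_event(images, fractions=(0.4, 0.1, 0.5), seed=0):
    """Randomly split (image_id, event_class) pairs into train/val/test per event class.

    The fractions default to the 40%/10%/50% ratio quoted in the description.
    The rounding rule and the seed are assumptions for this sketch.
    """
    rng = random.Random(seed)

    # Group image ids by their event class.
    by_event = defaultdict(list)
    for image_id, event in images:
        by_event[event].append(image_id)

    splits = {"train": [], "val": [], "test": []}
    for event in sorted(by_event):
        ids = sorted(by_event[event])
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        # Whatever remains goes to the test split.
        splits["test"] += ids[n_train + n_val:]
    return splits
```

Splitting inside each event class, rather than over the pooled images, keeps every event represented in all three sets at roughly the stated ratio.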

📄 12 papers · ⬇ 2.4k · 💛 51 · 🤗 HF · cc-by-nc-nd-4.0

DanceTrack · Emerging

Dataset Card for DanceTrack. DanceTrack is a multi-human tracking dataset with two emphasized properties: (1) uniform appearance: the humans look highly similar and are almost indistinguishable; (2) diverse motion: the humans move in complicated patterns and their relative positions change frequently. We expect the combination of uniform appearance and complicated motion patterns to make DanceTrack a platform that encourages more comprehensive and intelligent multi-object tracking… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/DanceTrack.

📄 12 papers · ⬇ 2.1k · 💛 3 · 🤗 HF · cc-by-4.0

PASCAL-5i · Emerging

📄 12 papers

YouTube-VIS 2019 · Emerging

ImageNetVID dataset. Usage: run the following commands to reassemble the split archives:
cat ILSVRC2015_VID.tar.gz.a* > ILSVRC2015_VID.tar.gz
cat ILSVRC2017_DET.tar.gz.a* > ILSVRC2017_DET.tar.gz
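Where `cat` is not available (for example on Windows), the same reassembly can be sketched in Python. The helper name `join_parts` is hypothetical; it assumes the part suffixes sort lexicographically in the right order, which is the same ordering the shell glob in the `cat` commands relies on.

```python
import glob
import shutil


def join_parts(prefix, output):
    """Concatenate every file whose name starts with `prefix` into `output`.

    Parts are joined in sorted name order. Returns the number of parts joined.
    """
    # glob.escape protects any special characters in the prefix itself.
    parts = sorted(glob.glob(glob.escape(prefix) + "*"))
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large archives are never fully loaded in memory.
                shutil.copyfileobj(src, out)
    return len(parts)
```

For the first archive above, the call would be `join_parts("ILSVRC2015_VID.tar.gz.a", "ILSVRC2015_VID.tar.gz")`.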

📄 11 papers · ⬇ 82 · 💛 4 · 🤗 HF

Linemod · Emerging

📄 11 papers · ⬇ 77 · 🤗 HF · unknown

COCO-Stuff · Emerging

COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.

📄 11 papers · ⬇ 51 · 💛 1 · 🤗 HF · cc-by-4.0

YouTubeVIS-2019 · Emerging

📄 11 papers

LIBERO · Emerging

This dataset was created using LeRobot. Dataset Structure meta/info.json: { "codebase_version": "v3.0", "robot_type": "panda", "total_episodes": 1693, "total_frames": 273465, "total_tasks": 40, "chunks_size": 1000, "fps": 10.0, "splits": { "train": "0:1693" }, "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet", "video_path": "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceVLA/libero.
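The `data_path` and `video_path` values in the `meta/info.json` excerpt above are written as Python-style format strings, so they can be expanded with `str.format`. The sketch below shows this under stated assumptions: the two templates are copied from the excerpt, while the helper names and the video key used in the usage note are hypothetical and not taken from the dataset.

```python
# Templates copied from the meta/info.json excerpt.
DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
VIDEO_PATH = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def resolve_data_path(chunk_index, file_index):
    """Expand the parquet path template; indices are zero-padded to three digits."""
    return DATA_PATH.format(chunk_index=chunk_index, file_index=file_index)


def resolve_video_path(video_key, chunk_index, file_index):
    """Expand the video path template for one camera/video key."""
    return VIDEO_PATH.format(
        video_key=video_key, chunk_index=chunk_index, file_index=file_index
    )
```

For example, `resolve_data_path(0, 12)` yields `data/chunk-000/file-012.parquet`, and `resolve_video_path("cam", 1, 2)` yields `videos/cam/chunk-001/file-002.mp4` (here `"cam"` is a made-up key).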

📄 10 papers · ⬇ 23.2k · 💛 60 · 🤗 HF · apache-2.0

Something-Something · Canonical

The Something-Something dataset (version 2) is a collection of 220,847 labeled video clips of humans performing pre-defined, basic actions with everyday objects. It is designed to train machine learning models in fine-grained understanding of human hand gestures like putting something into something, turning something upside down and covering something with something.

📄 10 papers · ⬇ 242 · 💛 21 · 🤗 HF · other

COCO test-dev · Emerging

MOT20 is a benchmark dataset for single-camera multi-object tracking (MOT) and pedestrian detection in very crowded real-world scenes. This Hugging Face repository provides MOT20 in the original MOTChallenge-style structure for research, benchmarking, training, and evaluation of multi-object tracking systems. MOT20 was introduced to stress-test MOT methods in high-density pedestrian scenes, including crowded squares, indoor train stations, stadium exits, and pedestrian… See the full description on the dataset page: https://huggingface.co/datasets/Lekim89/MOT20.

📄 9 papers · ⬇ 5.3k · 🤗 HF · cc-by-nc-sa-3.0

ActivityNet · Emerging

📄 9 papers · ⬇ 1.8k · 💛 6 · 🤗 HF

SUN RGB-D · Canonical

📄 9 papers

YouTube-VIS-2021 · Emerging

📄 9 papers

YouTubeVIS-2021 · Emerging

📄 9 papers

RefCOCO · Emerging

📄 8 papers · ⬇ 5.6k · 💛 11 · 🤗 HF

LaSOT · Emerging

Dataset Card for LaSOT. Dataset Summary: Large-scale Single Object Tracking (LaSOT) aims to provide a dedicated platform for training data-hungry deep trackers as well as assessing long-term tracking performance. This repository contains the conference version of LaSOT, published at CVPR 2019 (LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking). LaSOT features: Large-scale: 1,400 sequences with more than 3.5 million frames. High-quality: Manual… See the full description on the dataset page: https://huggingface.co/datasets/l-lt/LaSOT.

📄 8 papers · ⬇ 2.2k · 💛 10 · 🤗 HF

MOSE · Emerging

Molecular Sets (MOSES): A benchmarking platform for molecular generation models Deep generative models are rapidly becoming popular for the discovery of new molecules and materials. Such models learn on a large collection of molecular structures and produce novel compounds. In this work, we introduce Molecular Sets (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and provides a… See the full description on the dataset page: https://huggingface.co/datasets/katielink/moses.

📄 8 papers · ⬇ 166 · 💛 4 · 🤗 HF · mit

GOT-10k · Emerging

📄 8 papers · ⬇ 104 · 💛 1 · 🤗 HF

EPIC-KITCHENS · Emerging