Unimodal and Cross-Modal Hashing Datasets
Unimodal Datasets:
For unimodal experiments, in which the query and the database share the same feature space (e.g., images), six popular and freely available image datasets are commonly used: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M and ImageNet. These datasets vary widely in size (22,019 to 1.3 million images), are represented by an array of feature descriptors (GIST, SIFT, raw RGB pixels and bags of visual words) and cover a diverse range of image topics, from natural scenes to personal photos, logos and drawings.
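The standard unimodal protocol holds out a small query set and retrieves against the remainder as the database. A minimal sketch of such a split, using randomly generated stand-ins for the 512-dimensional GIST features of CIFAR-10 (the sizes and the choice of 1,000 queries are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((60000, 512))  # stand-in for 512-D GIST descriptors
labels = rng.integers(0, 10, size=60000)      # stand-in for the 10 class labels

n_query = 1000                                # hypothetical query-set size
perm = rng.permutation(len(features))
query_idx, db_idx = perm[:n_query], perm[n_query:]

query_x, query_y = features[query_idx], labels[query_idx]
db_x, db_y = features[db_idx], labels[db_idx]

print(query_x.shape, db_x.shape)  # (1000, 512) (59000, 512)
```

The disjoint split matters: ground-truth relevance for a query is usually defined by shared class labels with database items, so a query must never appear in its own database.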
Cross-modal Datasets:
Cross-modal retrieval experiments, in which the query and the database may lie in different feature spaces (e.g., image and text), are typically conducted on the Wiki, Microsoft COCO and NUS-WIDE datasets. All three provide images with associated paired textual descriptors, a key requirement for training and evaluating a cross-modal retrieval model.
| Dataset | Modality | Size | Features |
| --- | --- | --- | --- |
| CIFAR-10 | Image | 60,000 | 512-dimensional GIST |
| MS-COCO | Image/Text | 87,783 | RGB pixels (image); 5 sentences per image (text) |
| ImageNet | Image | 1,331,167 | 4096-dimensional CNN |
| LabelMe | Image | 22,019 | 512-dimensional GIST |
| MIR-FLICKR25K | Image/Text | 25,000 | RGB pixels (image); 38 categories, 1,386 tags (text) |
| MNIST | Image | 70,000 | Grayscale pixels |
| NUS-WIDE | Image/Text | 269,648 | 500-dimensional BoW (image); 5,018-dimensional tags (text) |
| SIFT1M | Image | 1,000,000 | SIFT |
| TINY100K | Image | 100,000 | 384-dimensional GIST |
| Wiki | Image/Text | 2,669 | 128-dimensional SIFT (image); 10-dimensional LDA topics (text) |
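The mechanics of cross-modal hashing can be sketched as follows: features of different dimensionality (e.g., a 500-D image BoW vector and a high-dimensional tag vector, as in NUS-WIDE) are projected into a shared k-bit Hamming space so that a text query can be compared against image codes. The sketch below uses untrained random projections and synthetic data purely to show the shapes involved; a real cross-modal hashing method would learn the two projection matrices so that paired image/text samples receive similar codes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 32
img = rng.standard_normal((n, 500))    # stand-in for 500-D image BoW features
txt = rng.standard_normal((n, 1000))   # stand-in for high-dimensional tag features

# Independent projections map each modality into the same k-bit code space.
# Here they are random; a learned model would align them across modalities.
W_img = rng.standard_normal((500, k))
W_txt = rng.standard_normal((1000, k))
img_codes = (img @ W_img > 0).astype(np.uint8)
txt_codes = (txt @ W_txt > 0).astype(np.uint8)

# Hamming distance from one text query to every image code in the database
q = txt_codes[0]
dists = np.count_nonzero(img_codes != q, axis=1)
print(dists.shape)  # (500,)
```

Once both modalities live in the same binary space, retrieval reduces to ranking database codes by Hamming distance, regardless of which modality the query came from.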