Datasets | Awesome Similarity Search Papers

Unimodal and Cross-Modal Hashing Datasets

Unimodal Datasets: For unimodal experiments (query and database are in the same feature space, e.g. images), there are six popular and freely available image datasets: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M and ImageNet. These datasets vary widely in size (from 22,019 to 1.3 million images), are represented by an array of different feature descriptors (from GIST, SIFT and raw RGB pixels to bags of visual words), and cover a diverse range of image topics, from natural scenes to personal photos, logos and drawings.
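
As a minimal sketch of the unimodal setting, the snippet below ranks a database of feature vectors against a query living in the same space. The data here is synthetic (random stand-ins for, say, 512-dimensional GIST descriptors); it only illustrates the query/database setup, not any particular hashing method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic database of 1,000 items with 512-d features (e.g. GIST vectors).
database = rng.standard_normal((1000, 512))

# The query is a slightly perturbed copy of database item 42, so we know
# which item a correct retrieval should return first.
query = database[42] + 0.01 * rng.standard_normal(512)

# Rank all database items by Euclidean distance to the query.
dists = np.linalg.norm(database - query, axis=1)
ranking = np.argsort(dists)
print(ranking[0])  # -> 42, the perturbed source item ranks first
```

Hashing-based methods replace the exhaustive distance scan above with comparisons over short binary codes, trading a little accuracy for large gains in speed and memory.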

Cross-modal Datasets: Cross-modal retrieval experiments (query and database can be in different feature spaces, e.g. image and text) are typically conducted on the Wiki, Microsoft COCO and NUS-WIDE datasets. All three come with images and paired textual descriptors, a key requirement for training and evaluating a cross-modal retrieval model.
| Name | Modality | Size | Features |
| --- | --- | --- | --- |
| CIFAR-10 | Image | 60,000 | 512-dimensional GIST |
| MS-COCO | Image/Text | 87,783 | RGB pixels (image); 5 sentences per image (text) |
| ImageNet | Image | 1,331,167 | 4096-dimensional CNN features |
| LabelMe | Image | 22,019 | 512-dimensional GIST |
| MIR-FLICKR25K | Image/Text | 25,000 | RGB pixels (image); 1,386 tags over 38 categories (text) |
| MNIST | Image | 70,000 | Grayscale pixels |
| NUS-WIDE | Image/Text | 269,648 | 500-dimensional BoW (image); 5,018-dimensional tags (text) |
| SIFT1M | Image | 1,000,000 | 128-dimensional SIFT |
| TINY100K | Image | 100,000 | 384-dimensional GIST |
| WIKI | Image/Text | 2,669 | 128-dimensional SIFT (image); 10-dimensional LDA topics (text) |
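
To illustrate why paired descriptors matter, the sketch below hashes synthetic image and text features into a shared Hamming space so that an image code can retrieve its paired text. The generative model, dimensions and projections are all invented for the example; real cross-modal methods learn the per-modality projections from image/text pairs such as those in MS-COCO or NUS-WIDE, whereas here they are derived analytically from the known synthetic model just to show what a well-aligned code space achieves.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d_img, d_txt, bits = 200, 16, 500, 1000, 32

# Paired image/text features generated from shared latent factors,
# mimicking the structure of an image-with-caption dataset.
latent = rng.standard_normal((n, k))
A = rng.standard_normal((k, d_img))   # latent -> image feature space
B = rng.standard_normal((k, d_txt))   # latent -> text feature space
img_feats = latent @ A
txt_feats = latent @ B

# Target binary codes are defined on the latent factors; per-modality
# projections recover them via pseudo-inverses of the (known) generative
# maps. In practice these projections are *learned* from paired data.
C = rng.standard_normal((k, bits))
W_img = np.linalg.pinv(A) @ C
W_txt = np.linalg.pinv(B) @ C
img_codes = img_feats @ W_img > 0
txt_codes = txt_feats @ W_txt > 0

# Cross-modal query: an image code retrieves text codes by Hamming distance.
q = img_codes[0]
hamming = (txt_codes != q).sum(axis=1)
print(hamming.argmin())  # -> 0, the paired caption ranks first
```

Because both modalities map into one Hamming space, a single code index serves image-to-text and text-to-image queries alike, which is exactly the property the paired datasets above are used to train and evaluate.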