Unimodal and Cross-Modal Hashing Datasets
Unimodal Datasets:
For unimodal experiments, in which the query and the database share the same feature space (e.g., images), six popular and freely available image datasets are commonly used: LabelMe, CIFAR-10, NUS-WIDE, MNIST, SIFT1M and ImageNet. These datasets vary widely in size (22,019 to 1.3 million images), are represented by an array of feature descriptors (GIST, SIFT, raw RGB pixels and bags of visual words) and cover a diverse range of image topics, from natural scenes to personal photos, logos and drawings.
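The standard unimodal protocol holds out a small query set and retrieves against the remainder as the database. A minimal sketch of such a split, using randomly generated stand-ins for the 512-dimensional GIST features of CIFAR-10 (the sizes and the choice of 1,000 queries are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((60000, 512))  # stand-in for 512-D GIST descriptors
labels = rng.integers(0, 10, size=60000)      # stand-in for the 10 class labels

n_query = 1000                                # hypothetical query-set size
perm = rng.permutation(len(features))
query_idx, db_idx = perm[:n_query], perm[n_query:]

query_x, query_y = features[query_idx], labels[query_idx]
db_x, db_y = features[db_idx], labels[db_idx]

print(query_x.shape, db_x.shape)  # (1000, 512) (59000, 512)
```

The disjoint split matters: ground-truth relevance for a query is usually defined by shared class labels with database items, so a query must never appear in its own database.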
Cross-modal Datasets:
Cross-modal retrieval experiments, in which the query and the database may lie in different feature spaces (e.g., image and text), are typically conducted on the Wiki, Microsoft COCO and NUS-WIDE datasets. All three provide images with associated paired textual descriptors, a key requirement for training and evaluating a cross-modal retrieval model.
| Dataset | Modality | Size | Features |
| --- | --- | --- | --- |
| CIFAR-10 | Image | 60,000 | 512-dimensional GIST |
| MS-COCO | Image/Text | 87,783 | RGB pixels (image); 5 sentences per image (text) |
| ImageNet | Image | 1,331,167 | 4096-dimensional CNN |
| LabelMe | Image | 22,019 | 512-dimensional GIST |
| MIR-FLICKR25K | Image/Text | 25,000 | RGB pixels (image); 38 categories, 1,386 tags (text) |
| MNIST | Image | 70,000 | Grayscale pixels |
| NUS-WIDE | Image/Text | 269,648 | 500-dimensional BoW (image); 5,018-dimensional tags (text) |
| SIFT1M | Image | 1,000,000 | SIFT |
| TINY100K | Image | 100,000 | 384-dimensional GIST |
| Wiki | Image/Text | 2,669 | 128-dimensional SIFT (image); 10-dimensional LDA topics (text) |
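The mechanics of cross-modal hashing can be sketched as follows: features of different dimensionality (e.g., a 500-D image BoW vector and a high-dimensional tag vector, as in NUS-WIDE) are projected into a shared k-bit Hamming space so that a text query can be compared against image codes. The sketch below uses untrained random projections and synthetic data purely to show the shapes involved; a real cross-modal hashing method would learn the two projection matrices so that paired image/text samples receive similar codes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 32
img = rng.standard_normal((n, 500))    # stand-in for 500-D image BoW features
txt = rng.standard_normal((n, 1000))   # stand-in for high-dimensional tag features

# Independent projections map each modality into the same k-bit code space.
# Here they are random; a learned model would align them across modalities.
W_img = rng.standard_normal((500, k))
W_txt = rng.standard_normal((1000, k))
img_codes = (img @ W_img > 0).astype(np.uint8)
txt_codes = (txt @ W_txt > 0).astype(np.uint8)

# Hamming distance from one text query to every image code in the database
q = txt_codes[0]
dists = np.count_nonzero(img_codes != q, axis=1)
print(dists.shape)  # (500,)
```

Once both modalities live in the same binary space, retrieval reduces to ranking database codes by Hamming distance, regardless of which modality the query came from.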