Abstract

Existing text-based person retrieval datasets often have relatively coarse-grained text annotations. This hinders the model from comprehending the fine-grained semantics of query texts in real scenarios. To address this problem, we contribute a new benchmark named UFineBench for text-based person retrieval with ultra-fine granularity. First, we construct a new dataset named UFine6926. We collect a large number of person images and manually annotate each image with two detailed textual descriptions, averaging 80.8 words each. The average word count is three to four times that of previous datasets. In addition to standard in-domain evaluation, we also propose a special evaluation paradigm more representative of real scenarios. It contains a new evaluation set with cross domains, cross textual granularity and cross textual styles, named UFine3C, and a new evaluation metric for accurately measuring retrieval ability, named mean Similarity Distribution (m

Authors

(none)

Tags

  • Image Retrieval

Stats

  • citations: 47
  • S2 citations: n/a
  • github stars: 81
  • HF likes: 0
  • heat score: 16.44
  • arxiv key: zuo2023ufinebench

Related papers