Look Before You Leap: Improving Text-based Person Retrieval By Learning A Consistent Cross-modal Common Manifold

Abstract

The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches attempt to learn a latent common manifold mapping paradigm in a cross-modal distribution consensus prediction (CDCP) manner. When features from the distribution of one modality are mapped into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods, the multi-modal data inevitably cannot be well aligned in the common manifold, which ultimately leads to sub-optimal retrieval performance. To overcome this CDCP dilemma, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C\(
