Look Before You Leap: Improving Text-based Person Retrieval By Learning A Consistent Cross-modal Common Manifold
2022 · Zijie Wang, Aichun Zhu, Jingyi Xue, et al.
Abstract
The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches attempt to learn a latent common manifold mapping paradigm in a cross-modal distribution consensus prediction (CDCP) manner. When features from the distribution of one modality are mapped into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods, the multi-modal data inevitably cannot be well aligned in the common manifold, which ultimately leads to sub-optimal retrieval performance. To overcome this CDCP dilemma, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C\(
Related papers
- Improving Text-based Person Search Via Part-level Cross-modal Correspondence (2024) 0.00
- Cross-modal Manifold Learning For Cross-modal Retrieval (2016) 0.00
- CPCL: Cross-modal Prototypical Contrastive Learning For Weakly Supervised Text-based Person Retrieval (2024) 0.00
- Multi-path Exploration And Feedback Adjustment For Text-to-image Person Retrieval (2024) 0.00
- See Finer, See More: Implicit Modality Alignment For Text-based Person Retrieval (2022) 18.39
- Decoupled Cross-modal Alignment Network For Text-rgbt Person Retrieval And A High-quality Benchmark (2025) 0.00
- Using Text To Teach Image Retrieval (2020) 5.24
- Beat: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) 10.85