Improving Text-based Person Search Via Part-level Cross-modal Correspondence
2024 · Jicheol Park, Boseung Jeong, Dongwon Kim, et al.
Abstract
Text-based person search is the task of finding the person images most relevant to a natural language text description given as a query. The main challenge of this task is the large gap between the target images and the text queries, which makes it difficult to establish correspondence and to distinguish subtle differences across people. To address this challenge, we introduce an efficient encoder-decoder model that extracts coarse-to-fine embedding vectors which are semantically aligned across the two modalities without supervision for the alignment. Another challenge is learning to capture fine-grained information with only person IDs as supervision: because part-level supervision is lacking, similar body parts of different individuals are treated as different. To tackle this, we propose a novel ranking loss, dubbed commonality-based margin ranking loss, which quantifies the degree of commonality of each body part and reflects it during the learning of fine-grained body
Related papers
- TIPCB: A Simple But Effective Part-based Convolutional Baseline For Text-based Person Search (2021) · 20.24
- Enhancing Visual Representation For Text-based Person Searching (2024) · 1.69
- Multi-path Exploration And Feedback Adjustment For Text-to-image Person Retrieval (2024) · 0.00
- Look Before You Leap: Improving Text-based Person Retrieval By Learning A Consistent Cross-modal Common Manifold (2022) · 15.34
- BEAT: Bi-directional One-to-many Embedding Alignment For Text-based Person Retrieval (2024) · 10.85
- Text-based Person Search With Limited Data (2021) · 15.69
- Contrastive Transformer Learning With Proximity Data Generation For Text-based Person Search (2023) · 11.88
- Decoupled Cross-modal Alignment Network For Text-RGBT Person Retrieval And A High-quality Benchmark (2025) · 0.00