Connecting Language And Vision For Natural Language-based Vehicle Retrieval
2021 · Shuai Bai, Zhedong Zheng, Xiaohan Wang, et al.
Abstract
Vehicle search is a basic task for efficient traffic management in the AI City setting. Most existing approaches focus on image-based vehicle matching, including vehicle re-identification and vehicle tracking. In this paper, we apply a new modality, the language description, to search for the vehicle of interest and explore the potential of this task in real-world scenarios. Natural language-based vehicle search poses a new challenge: fine-grained understanding of both the vision and language modalities. To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model in an end-to-end manner. Besides the network structure design and the training strategy, several optimization objectives are also revisited in this work. Qualitative and quantitative experiments verify the effectiveness of the proposed method. Our proposed method achieved 1st place in the 5th AI City Challenge, yie
Related papers
- All You Can Embed: Natural Language Based Vehicle Retrieval With Spatio-temporal Transformers (2021)
- Symmetric Network With Spatial Relationship Modeling For Natural Language-based Vehicle Retrieval (2022)
- Findvehicle And Vehiclefinder: A NER Dataset For Natural Language-based Vehicle Retrieval And A Keyword-based Cross-modal Vehicle Retrieval System (2023)
- The Solution For The CVPR 2023 1st Foundation Model Challenge-track2 (2024)
- Vldeformer: Vision-language Decomposed Transformer For Fast Cross-modal Retrieval (2021)
- BEV-TSR: Text-scene Retrieval In BEV Space For Autonomous Driving (2024)
- V\(^2\)L: Leveraging Vision And Vision-language Models Into Large-scale Product Retrieval (2022)
- Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models And Vision Language Models (2024)