Semantic-preserving Augmentation For Robust Image-text Retrieval
2023 · Sunwoo Kim, Kyuhong Shim, Luong Trung Nguyen, et al.
Abstract
Image-text retrieval is the task of searching for the proper textual descriptions of the visual world and vice versa. One challenge of this task is its vulnerability to corruptions of the input image and text. Such corruptions are often unobserved during training and substantially degrade the quality of the retrieval model's decisions. In this paper, we propose a novel image-text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of novel image-based and text-based augmentation techniques called semantic-preserving augmentation for image (SPAugI) and text (SPAugT). Since SPAugI and SPAugT change the original data in a way that preserves its semantic information, we force the feature extractors to generate semantic-aware embedding vectors regardless of the corruption, improving the model robustness significantly. From extensive experiments using benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in terms of image-text retrieval
Related papers
- Image-text Retrieval Via Preserving Main Semantics Of Vision (2023) 10.22
- Benchmarking Robustness Of Text-image Composed Retrieval (2023) 2.23
- Beyond Visual Semantics: Exploring The Role Of Scene Text In Image Understanding (2019) 9.59
- Understanding Retrieval Robustness For Retrieval-augmented Image Captioning (2024) 6.34
- Direction-oriented Visual-semantic Embedding Model For Remote Sensing Image-text Retrieval (2023) 11.29
- TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval (2025) 3.58
- Benchmark Granularity And Model Robustness For Image-text Retrieval (2024) 0.00
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) 13.17