Learning What Helps: Task-aligned Context Selection For Vision Tasks
2025 · Jingyu Guo, Emir Konuk, Fredrik Strand, et al.
Abstract
Humans often resolve visual uncertainty by comparing an image with relevant examples, but ViTs lack the ability to identify which examples would improve their predictions. We present Task-Aligned Context Selection (TACS), a framework that learns to select paired examples which truly improve task performance rather than those that merely appear similar. TACS jointly trains a selector network with the task model through a hybrid optimization scheme combining gradient-based supervision and reinforcement learning, making retrieval part of the learning objective. By aligning selection with task rewards, TACS enables discriminative models to discover which contextual examples genuinely help. Across 18 datasets covering fine-grained recognition, medical image classification, and medical image segmentation, TACS consistently outperforms similarity-based retrieval, particularly in challenging or data-limited settings.
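The idea of aligning selection with task rewards rather than similarity can be illustrated with a toy policy-gradient (REINFORCE) update. This is a minimal sketch, not the TACS implementation: the similarity scores, the reward table, and the names `SIMILARITY`, `TASK_REWARD`, and `train_selector` are hypothetical, and the task model is replaced by a fixed reward per candidate, so only the reward-driven half of the hybrid scheme is shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy data: three candidate context examples for one query.
SIMILARITY = [0.9, 0.5, 0.1]   # assumed visual similarity to the query
TASK_REWARD = [0.2, 0.9, 0.4]  # assumed task reward when paired with each candidate


def train_selector(steps=3000, lr=0.1, seed=0):
    """Learn selection probabilities from task reward with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # one learnable score per candidate
    baseline = 0.0            # running mean reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        reward = TASK_REWARD[choice]  # stand-in for evaluating the task model
        advantage = reward - baseline
        # Gradient of log p(choice) w.r.t. logit j is 1[j == choice] - p_j.
        for j in range(len(logits)):
            indicator = 1.0 if j == choice else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


similarity_pick = max(range(3), key=lambda i: SIMILARITY[i])
selector_probs = train_selector()
learned_pick = max(range(3), key=lambda i: selector_probs[i])
print("similarity-based pick:", similarity_pick)
print("reward-aligned pick:", learned_pick)
```

In this toy setup the most similar candidate is not the one with the highest task reward, so the two picks differ. In a full system the reward would presumably come from the jointly trained task model's performance on the query, which this sketch does not model.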
Related papers
- Context Sensitivity Improves Human-machine Visual Alignment (2026)
- Curriculum Learning For Data-efficient Vision-language Alignment (2022)
- VITR: Augmenting Vision Transformers With Relation-focused Learning For Cross-modal Information Retrieval (2023)
- TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval (2025)
- Retrieving Counterfactuals Improves Visual In-context Learning (2026)
- What Makes Good Examples For Visual In-context Learning? (2023)
- HKUST At Semeval-2023 Task 1: Visual Word Sense Disambiguation With Context Augmentation And Visual Assistance (2023)
- CAVL: Learning Contrastive And Adaptive Representations Of Vision And Language (2023)