Context Sensitivity Improves Human-machine Visual Alignment
2026 · Frieda Born, Tom Neuhäuser, Lukas Muttenthaler, et al.
Abstract
Modern machine learning models typically represent inputs as fixed points in a high-dimensional embedding space. While this approach has proven powerful for a wide range of downstream tasks, it differs fundamentally from the way humans process information. Because humans constantly adapt to their environment, they represent objects and their relationships in a highly context-sensitive manner. To address this gap, we propose a method for context-sensitive similarity computation from neural network embeddings and apply it to modeling a triplet odd-one-out task in which an anchor image serves as simultaneous context. Modeling context yields up to a 15% improvement in odd-one-out accuracy over a context-insensitive model. This improvement is consistent across both original and "human-aligned" vision foundation models.
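The abstract does not specify how the context-sensitive similarity is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes embeddings are plain lists of floats, that the anchor's embedding is available as a separate vector, and that context acts by reweighting embedding dimensions. The helper `context_weights` and its `temperature` parameter are hypothetical names introduced here for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def context_weights(anchor, temperature=1.0):
    """Hypothetical context model (an assumption, not from the paper):
    a softmax over the absolute anchor activations, rescaled so the
    weights sum to the number of dimensions."""
    scores = [abs(a) / temperature for a in anchor]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [len(anchor) * e / total for e in exps]


def odd_one_out(embeddings, weights=None):
    """Return the index (0, 1, or 2) of the odd one out in a triplet.

    The most similar pair is found by cosine similarity; the remaining
    item is the odd one out. If `weights` is given, each embedding is
    reweighted per dimension first (the context-sensitive variant).
    """
    if weights is not None:
        embeddings = [[w * x for w, x in zip(weights, e)] for e in embeddings]
    pairs = [(0, 1), (0, 2), (1, 2)]
    best = max(pairs, key=lambda p: cosine(embeddings[p[0]], embeddings[p[1]]))
    return ({0, 1, 2} - set(best)).pop()


# Toy triplet with three made-up feature dimensions.
triplet = [
    [1.0, 0.1, 0.9],
    [0.9, 0.9, 0.1],
    [0.1, 1.0, 0.2],
]
anchor = [3.0, 0.0, 0.0]  # made-up context that emphasizes dimension 0

baseline_choice = odd_one_out(triplet)
context_choice = odd_one_out(triplet, context_weights(anchor))
```

In this toy example the context-insensitive choice is item 0, because items 1 and 2 form the most similar pair. Once the anchor emphasizes the first dimension, items 0 and 1 become the closest pair and the choice shifts to item 2. This shows how the same fixed embeddings can give different odd-one-out decisions under different contexts.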
Related papers
- Learning What Helps: Task-aligned Context Selection For Vision Tasks (2025)
- Modest-align: Data-efficient Alignment For Vision-language Models (2025)
- ContextCLIP: Contextual Alignment Of Image-text Pairs On CLIP Visual Representations (2022)
- Image Similarity Using An Ensemble Of Context-sensitive Models (2024)
- Context-aware Embeddings For Automatic Art Analysis (2019)
- ContextBLIP: Doubly Contextual Alignment For Contrastive Image Retrieval From Linguistically Complex Descriptions (2024)
- Context-adaptive Multi-prompt Embedding With Large Language Models For Vision-language Alignment (2025)
- Human-aligned Image Models Improve Visual Decoding From The Brain (2025)