Evaluating Perspectival Biases In Cross-modal Retrieval
2025 · Teerapol Saengsukhiran, Peerawat Chomphooyod, Narabodee Rodjananant, et al.
Abstract
Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic prevalence and cultural associations. We introduce the Cross-Cultural, Cross-Modal, Cross-lingual Multimodal (3XCM) benchmark to isolate these effects. Results from our studies indicate that, for image-to-text retrieval, models tend to favor entries from prevalent languages over those that are semantically faithful. For text-to-image retrieval, we observe a consistent "tugging effect" in the joint embedding space between semantic alignment and language-conditioned cultural association. When semantic representations are insufficiently resolved, particularly in low-resource languages, similarity is increasingly governed by culturally familiar visual patterns, leading to systematic association bias in retrieval. Our findings suggest that achiev
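The image-to-text finding above (prevalent-language entries outranking semantically faithful ones) can be illustrated with a small sketch. This is not the 3XCM benchmark or the authors' metric: the two-dimensional embeddings, the `lang` and `image_id` fields, and the helper names `top_k`, `language_share`, and `faithful_rate` are all hypothetical, chosen only to show how one might compare the language mix of retrieved captions against how many of them actually describe the query image.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, candidates, k):
    """Return the k candidates most similar to the query embedding."""
    ranked = sorted(candidates, key=lambda c: cosine(query, c["emb"]), reverse=True)
    return ranked[:k]

def language_share(results):
    """Fraction of retrieved captions per language."""
    counts = Counter(c["lang"] for c in results)
    return {lang: n / len(results) for lang, n in counts.items()}

def faithful_rate(results, target_id):
    """Fraction of retrieved captions that describe the query image."""
    return sum(c["image_id"] == target_id for c in results) / len(results)

# Toy data (assumed, not from the paper): one query image (id 0) and four captions.
query_image = [1.0, 0.0]
captions = [
    {"lang": "en", "image_id": 0, "emb": [0.9, 0.1]},  # faithful, high-resource
    {"lang": "th", "image_id": 0, "emb": [0.6, 0.8]},  # faithful, low-resource
    {"lang": "en", "image_id": 1, "emb": [0.8, 0.6]},  # unfaithful, high-resource
    {"lang": "th", "image_id": 1, "emb": [0.0, 1.0]},  # unfaithful, low-resource
]

results = top_k(query_image, captions, k=2)
print(language_share(results))        # language mix of the top-2 captions
print(faithful_rate(results, 0))      # share of top-2 captions that match image 0
```

In this toy setup the unfaithful English caption outranks the faithful Thai one, so the top-2 list is entirely English while only half of it is semantically correct. A gap of that kind between language share and faithfulness is the pattern the abstract describes.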
Related papers
- Cross-modal Retrieval: A Systematic Review Of Methods And Future Directions (2023) 12.81
- Scene-centric Vs. Object-centric Image-text Cross-modal Retrieval: A Reproducibility Study (2023) 5.24
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) 10.07
- Continual Learning In Cross-modal Retrieval (2021) 9.41
- Revisiting Cross Modal Retrieval (2018) 0.00
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025) 0.00
- Discriminative Semantic Transitive Consistency For Cross-modal Learning (2021) 0.00
- Cross-modal Coordination Across A Diverse Set Of Input Modalities (2024) 0.00