Abstract

Domain generalization (DG) is the problem of learning a model that generalizes to unseen test domains by leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that is too stringent for many real-world applications, where acquiring source data in the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and the target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. First, we show theoretically that cross-modal approximate nearest neig

Authors

(none)

Tags

  • Cross-Modal Hashing
  • Unsupervised Hashing
  • Supervised Hashing

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 7
  • HF likes: 0
  • heat score: 1.81
  • arxiv key: liao2024multimodal

Related papers