From Known To The Unknown: Transferring Knowledge To Answer Questions About Novel Visual And Semantic Concepts
2018 · Moshiur R Farazi, Salman H Khan, Nick Barnes
Abstract
Current Visual Question Answering (VQA) systems can answer intelligent questions about 'Known' visual content. However, their performance drops significantly when questions about visually and linguistically 'Unknown' concepts are presented during inference ('Open-world' scenario). A practical VQA system should be able to deal with novel concepts in real world settings. To address this problem, we propose an exemplar-based approach that transfers learning (i.e., knowledge) from previously 'Known' concepts to answer questions about the 'Unknown'. We learn a highly discriminative joint embedding space, where visual and semantic features are fused to give a unified representation. Once novel concepts are presented to the model, it looks for the closest match from an exemplar set in the joint embedding space. This auxiliary information is used alongside the given Image-Question pair to refine visual attention in a hierarchical fashion. Since handling the high dimensional exemplars on large
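The exemplar lookup mentioned in the abstract (finding the closest match in a joint embedding space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which similarity measure the authors use, so cosine similarity is swapped in, the function names `cosine_similarity` and `closest_exemplar` are hypothetical, and the 3-dimensional vectors are toy values rather than real fused visual-semantic features.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def closest_exemplar(query, exemplars):
    """Return (index, similarity) of the exemplar most similar to the query."""
    if not exemplars:
        raise ValueError("exemplar set is empty")
    scores = [cosine_similarity(query, e) for e in exemplars]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy joint embeddings standing in for fused visual + semantic features.
exemplars = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [0.9, 0.8, 0.1]  # embedding of a hypothetical novel Image-Question pair

index, score = closest_exemplar(query, exemplars)
print(index, round(score, 3))
```

A linear scan like this is only practical for small exemplar sets; it is meant to show the retrieval step in isolation, not how the paper handles high-dimensional exemplars at scale.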