Generating Question Relevant Captions To Aid Visual Question Answering

Jialin Wu, Zeyuan Hu, Raymond J. Mooney · Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics · 2019

Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to improve VQA performance that exploits this connection by jointly generating captions that are targeted to help answer a specific visual question. The model is trained using an existing caption dataset by automatically determining question-relevant captions using an online gradient-based method. Experimental results on the VQA v2 challenge demonstrate that our approach obtains state-of-the-art VQA performance (e.g., 68.4% on the Test-standard set using a single model) by simultaneously generating question-relevant captions.
