Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation

Chang Che, Qunwei Lin, Xinyu Zhao, Jiaxin Huang, Liqiang Yu · ICBDT 2023: 2023 6th International Conference on Big Data Technologies · 2024

Transforming input images into corresponding textual explanations is a crucial and complex task in computer vision and natural language processing. In this paper, we propose an ensemble approach that harnesses the capabilities of Contrastive Language-Image Pretraining (CLIP) models.
