LLaVA

Emerging

10 papers using it

First seen: 2024

LLaVA is used as a dataset and benchmark for evaluating Multimodal Large Language Models (MLLMs) on vision-language tasks, with a focus on their internal visual representations and interpretability.

Papers using LLaVA (10)

Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs (2026)

Behind Maya: Building A Multilingual Vision Language Model (2025)

DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model (2026)

HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models (2026)

Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training (2026)

Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs (2026)

Safe-llava: A Privacy-preserving Vision-language Dataset And Benchmark For Biometric Safety (2025)

DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding (2025)

LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression (2025)

Maya: An Instruction Finetuned Multilingual Multimodal Model (2024)