LLaVA
Emerging | 10 papers using it
First seen: 2024
LLaVA (Large Language and Vision Assistant) is a dataset/benchmark used to evaluate Multimodal Large Language Models (MLLMs) on vision-language tasks, with a focus on their internal visual representations and interpretability.
Papers using LLaVA (10)
- Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs
- Behind Maya: Building A Multilingual Vision Language Model
- DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model
- HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models
- Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training
- Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
- Safe-llava: A Privacy-preserving Vision-language Dataset And Benchmark For Biometric Safety
- DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding
- LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
- Maya: An Instruction Finetuned Multilingual Multimodal Model