BABILong

Emerging

3 papers using it

7,169 HF downloads

19 HF likes

First seen: 2025

BABILong (100 samples): a long-context needle-in-a-haystack benchmark for LLMs. The preprint is on arXiv, and code for LLM evaluation is available on GitHub. A BABILong Leaderboard lists top-performing long-context models. bAbI + Books = BABILong. BABILong is a novel generative benchmark for evaluating the performance of NLP mo…
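The "bAbI + Books" idea in the description can be illustrated with a minimal sketch: a few task-relevant fact sentences (the needles) are scattered, in order, among unrelated filler sentences (the haystack), and the model must answer a question from the combined text. The function name `build_sample`, its parameters, and the field names `input`, `question`, and `target` are hypothetical choices made for illustration; this is not the benchmark's actual generation code.

```python
import random


def build_sample(facts, filler_sentences, question, answer, seed=0):
    """Scatter fact sentences among filler sentences, keeping fact order.

    Hypothetical helper for illustration only; not BABILong's own code.
    """
    rng = random.Random(seed)
    # Pick one insertion slot per fact; sorting keeps the facts in order.
    slots = sorted(rng.randint(0, len(filler_sentences)) for _ in facts)
    context = list(filler_sentences)
    # Insert from the last slot backwards so earlier indices stay valid.
    for slot, fact in reversed(list(zip(slots, facts))):
        context.insert(slot, fact)
    return {
        "input": " ".join(context),
        "question": question,
        "target": answer,
    }


if __name__ == "__main__":
    facts = ["Mary went to the kitchen.", "Mary picked up the apple."]
    filler = [f"Filler sentence {i}." for i in range(20)]
    sample = build_sample(facts, filler, "Where is the apple?", "kitchen")
    print(sample["input"])
    print(sample["question"], "->", sample["target"])
```

Lengthening the filler list grows the context while the facts and the question stay fixed, which is the sense in which such a task probes long-context retrieval.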

🤗 Hugging Face

Papers using BABILong (3)

AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation · 2025 · 1 citation

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning · 2026

LM2: Large Memory Models · 2025