BABILong
Emerging · 3 papers using it
7,169 HF downloads
19 HF likes
First seen: 2025
BABILong (100 samples): a long-context needle-in-a-haystack benchmark for LLMs. The preprint is on arXiv, and code for LLM evaluation is available on GitHub. The BABILong Leaderboard lists top-performing long-context models.

bAbI + Books = BABILong

BABILong is a novel generative benchmark for evaluating the performance of NLP models…
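The "bAbI + Books" formula suggests how a needle-in-a-haystack sample can be assembled: short bAbI-style fact sentences (the needles) are hidden among sentences of unrelated book text (the haystack), and a question about the facts is appended. The following is a minimal sketch under that assumption only. The function `build_haystack_sample`, its parameters, and the uniform random placement of facts are hypothetical illustrations, not code from the BABILong repository.

```python
import random
from typing import List


def build_haystack_sample(
    facts: List[str],
    background: List[str],
    question: str,
    seed: int = 0,
) -> str:
    """Hypothetical sketch: scatter fact sentences ("needles") into
    distractor book sentences ("haystack"), then append the question.

    Facts keep their original relative order (needed for multi-step
    reasoning); background sentences keep theirs too.
    """
    rng = random.Random(seed)
    n_gaps = len(background) + 1  # a fact may go before, between, or after sentences
    # Assign each fact a gap index; sorting preserves fact order.
    slots = sorted(rng.choices(range(n_gaps), k=len(facts)))

    out: List[str] = []
    f = 0
    for gap in range(n_gaps):
        # Emit every fact assigned to this gap, in order.
        while f < len(facts) and slots[f] == gap:
            out.append(facts[f])
            f += 1
        if gap < len(background):
            out.append(background[gap])
    return " ".join(out) + "\n" + question


if __name__ == "__main__":
    facts = ["Mary moved to the bathroom.", "John went to the hallway."]
    book = ["It was a dark night.", "The wind howled.", "Nobody slept."]
    print(build_haystack_sample(facts, book, "Where is Mary?", seed=1))
```

In this sketch, lengthening the context only means supplying more background sentences; the facts and the expected answer stay the same, which is what lets one question be posed at many context lengths.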