HELMET
Emerging
3 papers using it
828 HF downloads
9 HF likes
First seen: 2025
HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly [Paper] [Code]

HELMET is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks. The datasets are application-centric and are designed to evaluate models at different lengths and levels of compl