HELMET

Emerging
3 papers using it
828 HF downloads
9 HF likes
First seen: 2025

HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly [Paper] [Code]

HELMET is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks. The datasets are application-centric and are designed to evaluate models at different lengths and levels of compl

Papers using HELMET (3)
