
MMLU-Hard

Emerging
2 papers using it
2026 first seen

MMLU-Hard is a high-difficulty benchmark for evaluating how well language models understand and reason through complex tasks.

Papers using MMLU-Hard (2)
