MMLU-Hard
Emerging
2 papers using it
First seen: 2026
MMLU-Hard is a high-difficulty benchmark used to evaluate how well language models understand and reason through complex tasks.