Humanity's Last Exam
Emerging
3 papers using it
First seen: 2025
'Humanity's Last Exam' is a benchmark dataset of expert-level questions spanning many academic subjects, used to evaluate the capabilities of evolving agents in scientific inquiry and experimentation.