← all datasets

HMMT 25

Emerging
3 papers using it
First seen: 2025

HMMT 25 is a benchmark dataset used to evaluate the reasoning capabilities of large language models (LLMs). Papers that use it analyze model performance in relation to uncertainty and confidence dynamics during inference.

Papers using HMMT 25 (3)
