HMMT 25
Emerging · 3 papers using it · first seen 2025
HMMT25 is a benchmark dataset for evaluating the reasoning capabilities of large language models (LLMs). It is used to analyze how a model's performance relates to its uncertainty and to the dynamics of its confidence during inference.
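One way to relate performance to uncertainty is to score each generated answer with a confidence measure and compare accuracy across confidence levels. The minimal sketch below assumes the measure is the mean per-token Shannon entropy of the model's next-token distributions, with the problems split at the median uncertainty. Both choices are illustrative assumptions, not the method of any particular paper using HMMT25. The helper names (`token_entropy`, `trace_uncertainty`, `accuracy_by_uncertainty`) are hypothetical, and the toy distributions are made-up data, not HMMT25 results.

```python
import math
from statistics import mean


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def trace_uncertainty(token_dists):
    """Hypothetical uncertainty score: mean per-token entropy over a generated trace."""
    return mean(token_entropy(d) for d in token_dists)


def accuracy_by_uncertainty(records):
    """Split (uncertainty, is_correct) pairs at the median uncertainty.

    Returns (accuracy on the low-uncertainty half, accuracy on the high-uncertainty half).
    """
    ordered = sorted(records, key=lambda r: r[0])
    mid = len(ordered) // 2
    low, high = ordered[:mid], ordered[mid:]

    def accuracy(rs):
        # Booleans sum as 0/1, so this is the fraction of correct answers.
        return sum(correct for _, correct in rs) / len(rs) if rs else float("nan")

    return accuracy(low), accuracy(high)


# Made-up toy data: per-token distributions for four answers, plus correctness.
traces = [
    ([[0.9, 0.1], [0.95, 0.05]], True),   # confident, correct
    ([[0.8, 0.2], [0.85, 0.15]], True),   # fairly confident, correct
    ([[0.5, 0.5], [0.6, 0.4]], False),    # uncertain, wrong
    ([[0.5, 0.5], [0.5, 0.5]], False),    # maximally uncertain, wrong
]
records = [(trace_uncertainty(dists), ok) for dists, ok in traces]
print(accuracy_by_uncertainty(records))  # (1.0, 0.0)
```

A median split keeps the sketch short. A fuller analysis might instead bin by confidence deciles or track how the per-token entropy changes over the course of a reasoning trace.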