HMMT 25

Status: Emerging

Papers using it: 3

First seen: 2025

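HMMT25 is a benchmark built from problems of the 2025 Harvard-MIT Mathematics Tournament (HMMT). It is used to evaluate the mathematical reasoning capabilities of Large Language Models (LLMs). In the papers listed below, it is used, for example, to study how model performance relates to uncertainty and confidence dynamics during inference.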


Papers using HMMT 25 (3)

Inference Time Optimization with Confidence Dynamics (2026)

Chronos: Learning Temporal Dynamics of Reasoning Chains for Test-Time Scaling (2026)

PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning (2025)