
AIME-24/25

Emerging
7 papers using it
2025 first seen

AIME-24/25 is a benchmark used to evaluate the reasoning capabilities of diffusion large language models (dLLMs), in particular how well they generate high-quality outputs while balancing exploration against quality during token decoding.

Papers using AIME-24/25 (7)
