AIME-24/25

Emerging

4 papers using it

First seen: 2025

AIME-24/25 refers to the 2024 and 2025 editions of the American Invitational Mathematics Examination (AIME), used as a benchmark of competition-level math problems. The papers listed here use it to evaluate the reasoning performance and robustness of Large Language Models trained with reinforcement learning, including in agentic problem-solving settings.

Papers using AIME-24/25 (4)

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play (2026)

CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning (2026)

TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning (2025)

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning (2025)