
AMC-23

Emerging

13 papers using it

First seen: 2024

AMC-23 is a benchmark used to evaluate large language models on mathematical reasoning tasks.


Papers using AMC-23 (13)

Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning · 2025 · 24 cites

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model · 2025 · 10 cites

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play · 2026

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training · 2026

Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning · 2026

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing · 2026

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization · 2026

Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards · 2025

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models · 2025

$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts · 2025

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning · 2025

Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't · 2025

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement · 2024 · 13 cites
