AMC-23
Emerging
11 papers using it
5,846 HF downloads
1 HF like
First seen: 2024
AMC-23 is a benchmark used to evaluate reinforcement learning models on multi-step reasoning, including how well they balance efficiency against accuracy in their responses.
Papers using AMC-23 (11)
- Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
- Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model
- Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning
- Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing
- Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
- Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
- SPECS: Faster Test-Time Scaling through Speculative Drafts
- Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning