MATH
Emerging · 37 papers using it
First seen: 2024
The MATH dataset is a benchmark of competition mathematics problems used to evaluate how well models solve complex, multi-step reasoning tasks.
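The description leaves the scoring step implicit. The following is a minimal sketch, not an official MATH grader. It assumes that both model outputs and reference solutions mark the final answer with `\boxed{...}`, which is the convention in MATH's reference solutions, and that exact string matching after whitespace removal is acceptable. The helper names `last_boxed`, `normalize`, and `accuracy` are hypothetical.

```python
from typing import List, Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None  # no boxed answer present
    i = start + len("\\boxed{")
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace of \boxed{
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def normalize(ans: str) -> str:
    # Hypothetical, deliberately minimal normalization: drop all whitespace.
    return "".join(ans.split())


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose boxed answer exactly matches the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = last_boxed(pred), last_boxed(ref)
        if p is not None and r is not None and normalize(p) == normalize(r):
            correct += 1
    return correct / len(references)


preds = ["... so the total is \\boxed{4}", "thus \\boxed{ \\frac{1}{2} }"]
refs = ["\\boxed{4}", "\\boxed{\\frac{1}{2}}"]
print(accuracy(preds, refs))  # 1.0
```

Exact string matching treats `0.5` and `\frac{1}{2}` as different answers. Evaluations that add answer normalization or symbolic-equivalence checks would score such cases as correct, so this sketch tends to undercount.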
Papers using MATH (37)
- Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
- Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
- Entropy-Regularized Process Reward Model
- Discovering Process-Outcome Credit in Multi-Step LLM Reasoning
- TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT
- TRE: Encouraging Exploration in the Trust Region
- Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients
- Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
- Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
- Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning
- GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
- It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
- Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
- Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
- VinePPO: Refining Credit Assignment in RL Training of LLMs
- Free Process Rewards without Process Labels
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
- Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
- TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
- RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
- Know When to Explore: Difficulty-Aware Certainty as a Guide for LLM Reinforcement Learning
- WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
- Prompt Curriculum Learning for Efficient LLM Post-Training
- Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
- Differentiable Evolutionary Reinforcement Learning
- ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization
- Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models
- Learning Adaptive LLM Decoding
- Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling
- Synthetic Data RL: Task Definition Is All You Need
- ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
- RL for Reasoning by Adaptively Revealing Rationales
- MASPRM: Multi-Agent System Process Reward Model