MATH-500
Emerging
50 papers using it
182 HF downloads
9 HF likes
First seen: 2025
https://github.com/openai/prm800k/blob/main/prm800k/math_splits/test.jsonl
Papers using MATH-500 (50)
- Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
- Merlin's Whisper: Enabling Efficient Reasoning in Large Language Models via Black-box Persuasive Prompting
- VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning
- Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
- KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks
- DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning
- Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
- Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
- Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
- Reevaluating Self-Consistency Scaling in Multi-Agent Systems
- Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
- From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
- MemLens: Uncovering Memorization in LLMs with Activation Trajectories
- MoL-RL: Distilling Multi-Step Environmental Feedback into LLMs for Feedback-Independent Reasoning
- Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty
- Steering LLM Thinking with Budget Guidance
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
- Thinking Preference Optimization
- SIFT: Grounding LLM Reasoning in Contexts via Stickers
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
- T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
- DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
- Process Reward Models That Think
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
- Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
- Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
- Thinkless: LLM Learns When to Think
- Not All Correct Answers Are Equal: Why Your Distillation Source Matters
- PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
- Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
- Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning
- Inpainting-Guided Policy Optimization for Diffusion Large Language Models
- ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
- Socratic-Zero : Bootstrapping Reasoning via Data-Free Agent Co-evolution
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
- Reasoning with Sampling: Your Base Model is Smarter Than You Think
- When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
- Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
- Faster and Better LLMs via Latency-Aware Test-Time Scaling
- Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
- Adaptive Rectification Sampling for Test-Time Compute Scaling
- Controlling Large Language Model with Latent Actions