
MATH

Emerging

29 papers using it

2023 first seen

MATH is a benchmark dataset of 12,500 competition-level math problems, used to evaluate the mathematical reasoning capabilities of language models.


Papers using MATH (29)

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking · 2025 · 8 cites

Improved Large Language Diffusion Models · 2026

Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks · 2025

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees · 2025

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code · 2025

LLM Performance for Code Generation on Noisy Tasks · 2025

Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment · 2025

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities · 2025

MathClean: A Benchmark for Synthetic Mathematical Data Cleaning · 2025

Progressive-Hint Prompting Improves Reasoning in Large Language Models · 2023 · 33 cites

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models · 2023 · 28 cites

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification · 2023 · 18 cites

Data Interpreter: An LLM Agent For Data Science · 2024 · 11 cites

Learning From Mistakes Makes LLM Better Reasoner · 2023 · 5 cites

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning · 2023 · 4 cites

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning · 2024 · 4 cites

CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models · 2023 · 3 cites

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset · 2024 · 3 cites

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning · 2024 · 3 cites

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs · 2024 · 2 cites

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning · 2024 · 2 cites

ReGAL: Refactoring Programs to Discover Generalizable Abstractions · 2024 · 1 cite

Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation · 2024 · 1 cite

Building Math Agents with Multi-Turn Iterative Preference Learning · 2024 · 1 cite

MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline · 2024

Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning · 2024

ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning · 2024

UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts · 2024

InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning · 2024
