GSM8K
Emerging · 37 papers using it
First seen: 2022
GSM8K is a benchmark dataset of grade-school math word problems that require multi-step reasoning. It is used to evaluate how well language models perform on complex reasoning tasks.
Papers using GSM8K (37)
- Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference
- Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
- Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
- MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
- PAL: Program-aided Language Models
- Progressive-Hint Prompting Improves Reasoning in Large Language Models
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
- Learning From Mistakes Makes LLM Better Reasoner
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
- OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
- DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
- InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
- Automatic Model Selection with Large Language Models for Reasoning
- MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs
- MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning
- LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
- PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning
- TinyGSM: achieving >80% on GSM8k with small language models
- Building Math Agents with Multi-Turn Iterative Preference Learning
- Explicit Knowledge Transfer for Weakly-Supervised Code Generation
- Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
- AskIt: Unified Programming Interface for Programming with Large Language Models
- MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
- Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control
- Can LLMs Reason in the Wild with Programs?
- Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning
- Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning
- ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning
- UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
- InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning