GSM8K
Emerging · 25 papers using it
First seen: 2024
GSM8K is a benchmark of roughly 8.5K grade-school math word problems that require multi-step arithmetic reasoning. It is used to evaluate the reasoning abilities of large language models.
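GSM8K results are usually reported as final-answer accuracy. Each reference solution ends with a line of the form `#### <answer>`, and a model's response is counted as correct if its final number matches that answer. The sketch below illustrates this scoring under stated assumptions: the helper names (`reference_answer`, `predicted_answer`, `exact_match_accuracy`) are hypothetical, and the "take the last number in the model output" rule is one common heuristic, not a fixed standard. Individual papers may extract answers differently.

```python
import re
from typing import List, Optional

# GSM8K reference solutions end with a line such as "#### 72".
ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")
# Assumption: the model's answer is the last number in its free-form output.
NUMBER_RE = re.compile(r"-?\d[\d,]*\.?\d*")


def normalize(num: str) -> str:
    """Drop thousands separators and trailing zeros, so '1,000.0' becomes '1000'."""
    num = num.replace(",", "")
    if "." in num:
        num = num.rstrip("0").rstrip(".")
    return num


def reference_answer(solution: str) -> str:
    """Extract the gold answer from a GSM8K-style solution string."""
    match = ANSWER_RE.search(solution)
    if match is None:
        raise ValueError("no '####' answer line found")
    return normalize(match.group(1))


def predicted_answer(output: str) -> Optional[str]:
    """Hypothetical heuristic: return the last number the model wrote, if any."""
    numbers = NUMBER_RE.findall(output)
    return normalize(numbers[-1]) if numbers else None


def exact_match_accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Fraction of outputs whose extracted number equals the gold answer."""
    if not solutions:
        return 0.0
    correct = sum(
        predicted_answer(out) == reference_answer(sol)
        for out, sol in zip(outputs, solutions)
    )
    return correct / len(solutions)
```

For example, `exact_match_accuracy(["The answer is 72."], ["...\n#### 72"])` scores the single response as correct. Because the last-number rule can be fooled by trailing numbers in a response, many evaluations instead prompt the model to emit a fixed answer marker and parse only that.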
Papers using GSM8K (25)
- Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
- SPEAR: Code-Augmented Agentic Prompt Optimization
- LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
- Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks
- Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving
- ARMOR-MAD: Adaptive Routing for Heterogeneous Multi-Agent Debate in Large Language Model Reasoning
- TCP-MCP: Landscape-Guided Co-Evolution of Prompts and Communication Topologies for Multi-Agent Systems
- When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation
- Reflection-Enhanced Meta-Optimization Integrating TextGrad-style Prompt Optimization with Memory-Driven Self-Evolution
- P3: Prompts Promote Prompting
- GEMMAS: Graph-based Evaluation Metrics For Multi Agent Systems
- Sequential Consensus for Multi-Agent LLM Debates: A Wald-SPRT compute governor with calibration-based failure detection
- Qkvshare: Quantized Kv-cache Handoff For Multi-agent On-device Llms
- Reasoning Topology Matters: Network-of-thought For Complex Reasoning Tasks
- CROP: Token-efficient Reasoning In Large Language Models Via Regularized Prompt Optimization
- Guided Collaboration in Heterogeneous LLM-Based Multi-Agent Systems via Entropy-Based Understanding Assessment and Experience Retrieval
- Prototype-Based Dynamic Steering for Large Language Models
- HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
- AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need
- Training Large Language Models to Reason via EM Policy Gradient
- Can Large Language Models Invent Algorithms To Improve Themselves?: Algorithm Discovery For Recursive Self-improvement Through Reinforcement Learning
- Prompt Selection And Augmentation For Few Examples Code Generation In Large Language Model And Its Application In Robotics Control
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
- Think Beyond Size: Adaptive Prompting for More Effective Reasoning
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks