GSM8K
Emerging · 6 papers using it
First seen: 2024
GSM8K is a benchmark dataset of roughly 8,500 linguistically diverse grade-school math word problems, used to evaluate the multi-step mathematical reasoning of language models.
Papers using GSM8K (6)
- Dynin-Omni: Omnimodal Unified Large Diffusion Language Model
- Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
- GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems In Visual Contexts
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
- Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination