Countdown

Emerging

8 papers using it

First seen: 2025

Countdown is a benchmark dataset for evaluating models on numerical reasoning, specifically on solving arithmetic problems.
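
The description above does not specify the task format. As a minimal sketch, assume the common Countdown numbers-game formulation: given a list of integers and a target, the model must produce an arithmetic expression over +, -, *, / that evaluates to the target, using each given number at most once. The function name `check_countdown` and its parameters are hypothetical, chosen for illustration, and are not taken from the dataset or any of the listed papers.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Binary operators allowed in a candidate expression (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_countdown(expression, numbers, target):
    """Return True if `expression` reaches `target` using only `numbers`.

    Hypothetical verifier: each given number may be used at most once,
    and not every number has to be used.
    """
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Reject literals that are not available, or that are used too many times.
    if Counter(used) - Counter(numbers):
        return False
    return value == target
```

For example, `check_countdown("(25 - 5) * 4", [3, 5, 25, 4], 80)` returns `True`, while `check_countdown("25 * 4 - 5 * 4", [3, 5, 25, 4], 80)` returns `False` because it uses 4 twice. Exact rational arithmetic via `Fraction` avoids floating-point error on division. Variants that require every number to be used would need one extra equality check on the two counters.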

Papers using Countdown (8)

Discrete Tilt Matching (2026)

Escaping the Verifier: Learning to Reason via Demonstrations (2025)

HPO: Hysteretic Policy Optimization for Stable and Efficient Training under Sparse-Reward Regime (2026)

TRE: Encouraging Exploration in the Trust Region (2026)

How Does RL Post-training Induce Skill Composition? A Case Study on Countdown (2025)

Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective (2025)

RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs (2025)

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning (2025)