Sokoban

Emerging

13 papers using it

15 HF downloads

0 HF likes

2025 first seen

Sokoban is a benchmark of puzzle scenarios used to evaluate how well reinforcement learning agents solve multi-turn interactive tasks.

🤗 Hugging Face

Papers using Sokoban (13)

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs (2025)

Learning to Search and Searching to Learn for Generalization in Planning (2026)

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents (2026)

Freshness-Aware Prioritized Experience Replay for LLM/VLM Reinforcement Learning (2026)

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents (2026)

ProAct: Agentic Lookahead in Interactive Environments (2026)

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents (2026)

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents (2026)

Meta-RL Induces Exploration in Language Agents (2025)

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents (2025)

Internalizing World Models via Self-Play Finetuning for Agentic RL (2025)

Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning (2025)

Interpreting Emergent Planning in Model-Free Reinforcement Learning (2025)