Sokoban
Emerging
12 papers using it
3 HF downloads
0 HF likes
First seen: 2025
Sokoban is a dataset/benchmark of puzzle tasks used to evaluate how well reinforcement learning agents solve complex, multi-turn interactive challenges.
Papers using Sokoban (12)
- Freshness-Aware Prioritized Experience Replay for LLM/VLM Reinforcement Learning
- TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
- Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
- Meta-RL Induces Exploration in Language Agents
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
- Internalizing World Models via Self-Play Finetuning for Agentic RL
- Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
- ProAct: Agentic Lookahead in Interactive Environments
- HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
- Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents
- Interpreting Emergent Planning in Model-Free Reinforcement Learning
- Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning