Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
2026 · Chengshuai Shi, Wenzhe Li, Xinran Liang, et al.
Abstract
arXiv:2605.00347v1
Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning (RL) only in relatively short-horizon settings (typically around 20–30 turns). In this work, we study RL-based training of VLMs for long-horizon decision-making in Super Mario Land, a visually grounded environment requiring 100+ turns of interaction with coordinated perception, reasoning, and action. We begin with a systematic investigation of key algorithmic components and propose an adapted variant of PPO with a lightweight turn-level critic, which substantially improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++. We further show that pretrained VLMs provide strong act
Related papers
- Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success (2025)
- Diagnosing and Exploiting the Computational Demands of Videos Games for Deep Reinforcement Learning (2023)
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models (2025)
- Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning (2018)
- A Survey of Deep Reinforcement Learning in Video Games (2019)
- True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning (2024)
- A Survey on Enhancing Reinforcement Learning in Complex Environments: Insights from Human and LLM Feedback (2024)
- Backplay: "Man muss immer umkehren" (2018)