ComputerRL: Scaling End-to-end Online Reinforcement Learning For Computer Use Agents
2025 · Hanyu Lai, Xiao Liu, Yanxiao Zhao, et al.
Abstract
We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments. Scaling end-to-end RL training is crucial for improvement and generalization across diverse desktop tasks; however, it remains challenging due to environmental inefficiency and instability during extended training. To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale online RL. Furthermore, we propose Entropulse, a training strategy that alternates reinforcement learning with supervised fine-tuning, effectively mitigating entropy collapse during extended training runs. We employ ComputerRL on open models GLM-4-9B-041
Related papers
- UserRL: Training Interactive User-centric Agent Via Reinforcement Learning (2025) · 0.00
- Efficient Multi-turn RL For GUI Agents Via Decoupled Training And Adaptive Data Curation (2025) · 0.00
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework For On-device Control Agents (2024) · 0.00
- The AI Arena: A Framework For Distributed Multi-agent Reinforcement Learning (2021) · 0.00
- Expressive Value Learning For Scalable Offline Reinforcement Learning (2025) · 0.00
- Human-inspired Framework To Accelerate Reinforcement Learning (2023) · 0.00
- The Art Of Scaling Reinforcement Learning Compute For LLMs (2025) · 1.57
- SRL: Scaling Distributed Reinforcement Learning To Over Ten Thousand Cores (2023) · 0.00