Efficient Multi-turn RL For GUI Agents Via Decoupled Training And Adaptive Data Curation
2025 · Pengxiang Li, Zechen Hu, Zirui Shang, et al.
Abstract
Vision-language model (VLM) based GUI agents show promise for automating complex desktop and mobile tasks, but face significant challenges in applying reinforcement learning (RL): (1) slow multi-turn interactions with GUI environments for policy rollout, and (2) insufficient high-quality agent-environment interactions for policy learning. To address these challenges, we propose DART, a Decoupled Agentic RL Training framework for GUI agents, which coordinates heterogeneous modules in a highly decoupled manner. DART separates the training system into four asynchronous modules: environment cluster, rollout service, data manager, and trainer. This design enables non-blocking communication, asynchronous training, rollout-wise trajectory sampling, and per-worker model synchronization, significantly improving the system efficiency: 1.6× GPU utilization for rollout, 1.9× training throughput, and 5.5× environment utilization. To facilitate effective learning from abundant samples, we introduce a
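The decoupled layout described in the abstract can be illustrated with a minimal sketch. This is not DART's implementation: it assumes a single process, uses `asyncio` queues in place of real services, and every name in it (`Trajectory`, `ModelStore`, `env_worker`, `trainer`, `run`) is hypothetical. It only shows the general pattern of workers handing over each trajectory as soon as it finishes, a trainer consuming without waiting for a full synchronized batch, and each worker refreshing its weights on its own schedule.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: int
    steps: list
    policy_version: int  # version of the weights that produced this rollout


@dataclass
class ModelStore:
    """Hypothetical stand-in for the shared policy weights."""
    version: int = 0


async def env_worker(tasks, buffer, store, max_turns=3):
    """One environment plus rollout worker; runs episodes independently."""
    while True:
        try:
            task_id = tasks.get_nowait()
        except asyncio.QueueEmpty:
            return
        # Per-worker sync (assumed policy): refresh weights between episodes,
        # without pausing any other worker.
        local_version = store.version
        steps = []
        for turn in range(max_turns):
            await asyncio.sleep(0)  # placeholder for a slow GUI step
            steps.append((turn, f"action-from-v{local_version}"))
        # Rollout-wise handover: push the trajectory as soon as it is complete.
        await buffer.put(Trajectory(task_id, steps, local_version))


async def trainer(buffer, store, total, batch_size=2):
    """Consumes trajectories as they arrive and publishes new weights."""
    seen = []
    while len(seen) < total:
        batch = [await buffer.get()]
        # Take whatever else is already available, up to batch_size.
        while len(batch) < batch_size and not buffer.empty():
            batch.append(buffer.get_nowait())
        seen.extend(batch)
        store.version += 1  # placeholder for one gradient update
    return seen


async def run(num_tasks=6, num_workers=3):
    tasks = asyncio.Queue()
    for i in range(num_tasks):
        tasks.put_nowait(i)
    buffer = asyncio.Queue()  # plays the role of the data manager
    store = ModelStore()
    workers = [
        asyncio.create_task(env_worker(tasks, buffer, store))
        for _ in range(num_workers)
    ]
    trajectories = await trainer(buffer, store, total=num_tasks)
    await asyncio.gather(*workers)
    return trajectories, store.version
```

Calling `asyncio.run(run())` returns the collected trajectories and the final weight version. Because rollouts and updates interleave, a trajectory may carry an older `policy_version` than the weights the trainer currently holds, which is the usual trade-off of asynchronous training.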