UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
2025 · Cheng Qian, Zuxin Liu, Akshara Prabhakar, et al.
Abstract
Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet the ultimate value of such agents lies in their ability to assist users, a setting where the diversity and dynamics of user interaction pose challenges. In this work, we propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal three key findings: (i) SFT cold start is critical for unlocking initial interaction ability and enabling sustained RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitate training,
Related papers
- Improving Multimodal Interactive Agents With Reinforcement Learning From Human Feedback (2022)
- ComputerRL: Scaling End-to-end Online Reinforcement Learning For Computer Use Agents (2025)
- Using Cognitive Models To Train Warm Start Reinforcement Learning Agents For Human-computer Interactions (2021)
- Human AI Interaction Loop Training: New Approach For Interactive Reinforcement Learning (2020)
- Efficient Multi-turn RL For GUI Agents Via Decoupled Training And Adaptive Data Curation (2025)
- Human-inspired Framework To Accelerate Reinforcement Learning (2023)
- Emergent Social Learning Via Multi-agent Reinforcement Learning (2020)
- TGRL: An Algorithm For Teacher Guided Reinforcement Learning (2023)