Digi-Q: Learning Q-value Functions For Training Device-control Agents
2025 · Hao Bai, Yifei Zhou, Li Erran Li, et al.
Abstract
While a number of existing approaches for building foundation model agents rely on prompting or fine-tuning with human demonstrations, these approaches are not sufficient in dynamic environments (e.g., mobile device control). On-policy reinforcement learning (RL) should address these limitations, but collecting actual rollouts in an environment is often undesirable in truly open-ended agentic problems such as mobile device control or interacting with humans, where each unit of interaction is associated with a cost. In such scenarios, a method for policy learning that can utilize off-policy experience by learning an action-value function is much more effective. In this paper, we develop an approach, called Digi-Q, to train VLM-based action-value Q-functions, which are then used to extract the agent policy. We study our approach in the mobile device control setting. Digi-Q trains the Q-function using offline temporal-difference (TD) learning, on top of frozen, intermediate-layer features of a VLM
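As a rough illustration of the idea in the abstract (offline TD learning of a Q-function on top of frozen features), here is a minimal NumPy sketch. It is not the authors' implementation: the linear Q-head, the target-weight sync schedule, the greedy action selection, and the names `td_update`, `fit_q_head`, and `greedy_action` are all assumptions made for this example. The feature vectors stand in for precomputed VLM features of (state, action) pairs, which are treated as constants.

```python
import numpy as np


def td_update(w, w_target, batch, gamma=0.9, lr=0.5):
    """One semi-gradient TD(0) step for a linear Q-head on frozen features.

    batch = (feats, rewards, next_feats, dones), where
      feats:      (B, D)    features of the logged (state, action) pairs
      rewards:    (B,)      observed rewards
      next_feats: (B, K, D) features of K candidate actions in the next state
      dones:      (B,)      1.0 if the transition ended the episode, else 0.0
    """
    feats, rewards, next_feats, dones = batch
    q = feats @ w                                 # Q(s, a) for logged actions
    # Bootstrap from the best candidate next action under the target weights.
    next_q = (next_feats @ w_target).max(axis=1)  # shape (B,)
    target = rewards + gamma * (1.0 - dones) * next_q
    td_error = target - q
    # Only the head is updated; the features are constants (the "frozen" part).
    w_new = w + lr * feats.T @ td_error / len(rewards)
    return w_new, float(np.mean(td_error ** 2))


def fit_q_head(batch, dim, steps=200, sync_every=10, gamma=0.9, lr=0.5):
    """Fit the Q-head on a fixed (offline) batch, syncing target weights periodically."""
    w = np.zeros(dim)
    w_target = w.copy()
    for step in range(steps):
        w, _ = td_update(w, w_target, batch, gamma=gamma, lr=lr)
        if (step + 1) % sync_every == 0:
            w_target = w.copy()
    return w


def greedy_action(w, candidate_feats):
    """Pick the candidate action with the highest predicted Q-value (an assumed extraction rule)."""
    return int(np.argmax(candidate_feats @ w))


# Toy offline dataset: a two-step chain with one-hot "features" (hypothetical data).
feats = np.eye(2)                                 # transition 0, transition 1
rewards = np.array([0.0, 1.0])                    # reward arrives on the last step
next_feats = np.array([[[0.0, 1.0]],              # transition 0 leads to transition 1
                       [[0.0, 0.0]]])             # transition 1 is terminal
dones = np.array([0.0, 1.0])
w = fit_q_head((feats, rewards, next_feats, dones), dim=2)
```

On this toy chain the fitted weights approach the discounted returns of the two transitions (about 0.9 and 1.0 with `gamma=0.9`), using only the logged data and no new environment rollouts.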
Related papers
- Towards Adapting Reinforcement Learning Agents To New Tasks: Insights From Q-values (2024)
- Mixed Q-functionals: Advancing Value-based Methods In Cooperative MARL With Continuous Action Domains (2024)
- Approximating Gradients For Differentiable Quality Diversity In Reinforcement Learning (2022)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- Mobile-R1: Towards Interactive Capability For VLM-based Mobile Agent Via Systematic Training (2026)
- Efficient Multi-turn RL For GUI Agents Via Decoupled Training And Adaptive Data Curation (2025)
- NQMIX: Non-monotonic Value Function Factorization For Deep Multi-agent Reinforcement Learning (2021)
- Residual Q-networks For Value Function Factorizing In Multi-agent Reinforcement Learning (2022)