
\(\pi^{*}_{0.6}\): A VLA That Learns From Experience

2025

Abstract

We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call \(\pi^{*}_{0.6}\), that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the \(\pi^{*}_{0.6}\) model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput
