ViVa: Video-trained Value Functions For Guiding Online RL From Diverse Data
2025 · Nitish Dashora, Dibya Ghosh, Sergey Levine
Abstract
Online reinforcement learning (RL) with sparse rewards poses a challenge partly because of the lack of feedback on states leading to the goal. Furthermore, expert offline data with reward signal is rarely available to provide this feedback and bootstrap online learning. How can we guide online agents to the right solution without this on-task data? Reward shaping offers a solution by providing fine-grained signal to nudge the policy towards the optimal solution. However, reward shaping often requires domain knowledge to hand-engineer heuristics for a specific goal. To enable more general and inexpensive guidance, we propose and analyze a data-driven methodology that automatically guides RL by learning from widely available video data such as Internet recordings, off-task demonstrations, task failures, and undirected environment interaction. By learning a model of optimal goal-conditioned value from diverse passive data, we open the floor to scaling up and using various data sources to
Related papers
- Goal-driven Reward By Video Diffusion Models For Reinforcement Learning (2025)
- Optimistic Curiosity Exploration And Conservative Exploitation With Linear Reward Shaping (2022)
- Reinforcement Learning With Sparse Rewards Using Guidance From Offline Demonstration (2022)
- Value-consistent Representation Learning For Data-efficient Reinforcement Learning (2022)
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024)
- Learning To Identify Critical States For Reinforcement Learning From Videos (2023)
- \(V_{0.5}\): Generalist Value Model As A Prior For Sparse RL Rollouts (2026)
- Data Valuation For Offline Reinforcement Learning (2022)