A Unifying Framework For Action-conditional Self-predictive Reinforcement Learning
2024 · Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, et al.
Abstract
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation learning under the simplifying assumption that the algorithm depends on a fixed policy (BYOL-\(\Pi\)); this assumption is at odds with practical instantiations of such algorithms, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework, characterizing its convergence properties and highlighting important distinctions between the limiting solutions of the BYOL-\(\Pi\) and BYOL-AC dynamics. We show how the two repre
Related papers
- Understanding Self-predictive Learning For Reinforcement Learning (2022)
- Bridging State And History Representations: Understanding Self-predictive RL (2024)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)
- Embedded Universal Predictive Intelligence: A Coherent Framework For Multi-agent Learning (2025)
- Data-efficient Reinforcement Learning With Self-predictive Representations (2020)
- Active Inference And Reinforcement Learning: A Unified Inference On Continuous State And Action Spaces Under Partial Observability (2022)
- Algorithmic Framework For Model-based Deep Reinforcement Learning With Theoretical Guarantees (2018)
- Model Predictive Control And Reinforcement Learning: A Unified Framework Based On Dynamic Programming (2024)