Prediction And Control In Continual Reinforcement Learning
2023 · Nishanth Anand, Doina Precup
Abstract
Temporal difference (TD) learning is often used to update the estimate of the value function, which reinforcement learning (RL) agents use to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components that update at different timescales: a permanent value function, which holds general knowledge that persists over time, and a transient value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
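The two-timescale decomposition described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the class name, the learning rates, the choice to update only the transient table with the TD error, and the consolidation rule (move the permanent table a small step toward the combined estimate, then reset the transient table) are all assumptions made for illustration.

```python
import numpy as np


class PermanentTransientValue:
    """Hypothetical tabular value estimate split into slow and fast parts."""

    def __init__(self, n_states, fast_lr=0.5, slow_lr=0.1, gamma=0.9):
        self.permanent = np.zeros(n_states)  # slow, persistent knowledge
        self.transient = np.zeros(n_states)  # fast, task-specific correction
        self.fast_lr = fast_lr
        self.slow_lr = slow_lr
        self.gamma = gamma

    def value(self, s):
        # The overall estimate is the sum of the two components.
        return self.permanent[s] + self.transient[s]

    def td_update(self, s, r, s_next, done):
        # TD(0) error computed on the combined estimate; only the
        # transient component absorbs it (assumption of this sketch).
        bootstrap = 0.0 if done else self.gamma * self.value(s_next)
        td_error = r + bootstrap - self.value(s)
        self.transient[s] += self.fast_lr * td_error
        return td_error

    def consolidate(self):
        # Assumed consolidation step, e.g. at a task boundary: move the
        # permanent table toward permanent + transient, then reset the
        # transient table so it can adapt to the next situation.
        self.permanent += self.slow_lr * self.transient
        self.transient[:] = 0.0


# Usage: one terminal transition with reward 1, then consolidation.
v = PermanentTransientValue(n_states=2)
err = v.td_update(s=0, r=1.0, s_next=1, done=True)
before = v.value(0)
v.consolidate()
after = v.value(0)
```

In this sketch the fast table reacts immediately to new rewards, while the slow table changes only at consolidation and by a small fraction, which is one simple way to realize "different timescales".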
Related papers
- Discerning Temporal Difference Learning (2023)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Preferential Temporal Difference Learning (2021)
- Per-decision Multi-step Temporal Difference Learning With Control Variates (2018)
- TD Or Not TD: Analyzing The Role Of Temporal Differencing In Deep Reinforcement Learning (2018)
- Fixed-horizon Temporal Difference Methods For Stable Reinforcement Learning (2019)
- Temporal Difference Models: Model-free Deep RL For Model-based Control (2018)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)