Recall Traces: Backtracking Models For Efficient Reinforcement Learning
2018 · Anirudh Goyal, Philemon Brakel, William Fedus, et al.
Abstract
In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and samples which (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samp
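The backtracking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy six-state chain with a single rewarding state, replaces the learned generative backtracking model with a table of predecessor counts, and uses majority-vote imitation as a stand-in for the policy improvement step. All names (`step`, `fit_backtracking_model`, `sample_recall_trace`, `imitate`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6            # assumed toy chain with states 0..5
GOAL = N_STATES - 1     # assumed to be the only high-reward (terminal) state
ACTIONS = (-1, +1)      # step left or step right


def step(state, action):
    """Deterministic toy dynamics: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def collect_transitions(n_episodes, max_steps, rng):
    """Gather (state, action, next_state) tuples from a uniformly random policy."""
    transitions = []
    for _ in range(n_episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state = step(state, action)
            transitions.append((state, action, next_state))
            state = next_state
            if state == GOAL:   # episodes end at the rewarding state
                break
    return transitions


def fit_backtracking_model(transitions):
    """Tabular stand-in for a backtracking model: count which
    (prev_state, action) pairs were observed to lead into each state."""
    model = defaultdict(Counter)
    for prev_state, action, next_state in transitions:
        if prev_state != next_state:    # skip no-op moves against a wall
            model[next_state][(prev_state, action)] += 1
    return model


def sample_recall_trace(model, high_value_state, length, rng):
    """Walk backwards from a high-value state, sampling predecessors
    in proportion to their counts; return the trace in forward order."""
    trace, state = [], high_value_state
    for _ in range(length):
        candidates = model.get(state)
        if not candidates:
            break
        pairs, counts = zip(*candidates.items())
        prev_state, action = rng.choices(pairs, weights=counts, k=1)[0]
        trace.append((prev_state, action))
        state = prev_state
    trace.reverse()
    return trace


def imitate(traces):
    """Simplified policy update: pick the most frequent action per state."""
    votes = defaultdict(Counter)
    for trace in traces:
        for state, action in trace:
            votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    model = fit_backtracking_model(collect_transitions(200, 50, rng))
    traces = [sample_recall_trace(model, GOAL, 8, rng) for _ in range(100)]
    print(imitate(traces))
```

Every sampled trace, replayed forward through `step`, ends in `GOAL`, which is the property that makes such traces useful as imitation targets. In this sketch the high-value state is given; the abstract also allows starting from a state that is only estimated to have high value.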
Related papers
- Trajectory-aware Eligibility Traces For Off-policy Reinforcement Learning (2023)
- Improving The Efficiency Of Off-policy Reinforcement Learning By Accounting For Past Decisions (2021)
- Learning State Representations Via Retracing In Reinforcement Learning (2021)
- TBQ(σ): Improving Efficiency Of Trace Utilization For Off-policy Reinforcement Learning (2019)
- Expected Eligibility Traces (2020)
- Partially Observable Reinforcement Learning With Memory Traces (2025)
- Reinforcement Learning With Trajectory Feedback (2020)
- Predecessor Features (2022)