Foresee Then Evaluate: Decomposing Value Estimation With Latent Future Prediction
2021 · Hongyao Tang, Jianye Hao, Guangyong Chen, et al.
Abstract
The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging because it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed. A typical model-free RL algorithm estimates the values of a policy with Temporal Difference (TD) or Monte Carlo (MC) methods directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), which provides an explicit two-step view of the value estimation process: 1) first foresee the latent future, and 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, which induces a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model
Related papers
- Disentangling Dynamics And Returns: Value Function Decomposition With Future Prediction (2019)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Pretrain Value, Not Reward: Decoupled Value Policy Optimization (2025)
- Explaining An Agent's Future Beliefs Through Temporally Decomposing Future Reward Estimators (2024)
- Tensor And Matrix Low-rank Value-function Approximation In Reinforcement Learning (2022)
- Explainable Reinforcement Learning Via Temporal Policy Decomposition (2025)
- Value Function Decomposition For Iterative Design Of Reinforcement Learning Agents (2022)
- Finding Useful Predictions By Meta-gradient Descent To Improve Decision-making (2021)