Parameter-based Value Functions
2020 · Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber
Abstract
Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters, which allows them to generalize across different policies. PBVFs can evaluate the performance of any policy given a state, a state-action pair, or a distribution over the RL agent's initial states. First, we show how PBVFs yield novel off-policy policy gradient theorems. Then, we derive off-policy actor-critic algorithms based on PBVFs trained by Monte Carlo or Temporal Difference methods. We show how learned PBVFs can zero-shot learn new policies that outperform any policy seen during training. Finally, our algorithms are evaluated on a selection of discrete and continuous control tasks using shallow policies and deep
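The idea of a value function that takes policy parameters as input can be illustrated with a small toy sketch. This is not the authors' algorithm or architecture. It assumes a made-up two-parameter problem whose expected return is a quadratic in the policy parameters, and it fits a start-state value function V(theta) by least-squares regression on noisy Monte Carlo returns. The names `theta_star`, `sample_return`, `true_return`, `features`, and `value_grad` are hypothetical and chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (an assumption, not from the paper): the expected
# return of a policy with parameters theta is -||theta - theta_star||^2.
theta_star = np.array([1.0, -2.0])

def true_return(theta):
    """Noise-free expected return, used only to check the result."""
    return -np.sum((theta - theta_star) ** 2)

def sample_return(theta):
    """Noisy Monte Carlo estimate of the return of policy theta."""
    return true_return(theta) + 0.1 * rng.normal()

def features(theta):
    """Quadratic features of the policy parameters: [1, theta, theta^2]."""
    theta = np.atleast_2d(theta)
    return np.hstack([np.ones((theta.shape[0], 1)), theta, theta ** 2])

# Data from many different policies, none of them close to theta_star.
thetas = rng.uniform(-1.0, 0.5, size=(200, 2))
returns = np.array([sample_return(t) for t in thetas])

# Fit V(theta) = features(theta) @ w by least squares on Monte Carlo returns.
w, *_ = np.linalg.lstsq(features(thetas), returns, rcond=None)

def value_grad(theta):
    """Gradient of the fitted V(theta) with respect to theta."""
    d = theta.shape[0]
    return w[1:1 + d] + 2.0 * w[1 + d:] * theta

# Improve a policy using only the learned value function (no new rollouts).
theta = thetas[np.argmax(returns)].copy()
for _ in range(500):
    theta = theta + 0.05 * value_grad(theta)

best_seen = max(true_return(t) for t in thetas)
print("best return among training policies:", best_seen)
print("return of policy found via V(theta):", true_return(theta))
```

Because the fitted function takes theta as input, gradient ascent on theta can leave the region covered by the training policies. In this toy setting that is enough to find parameters with a higher return than any policy in the data, which mirrors the zero-shot claim in the abstract only loosely and under the stated assumptions.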
Related papers
- Towards Hyperparameter-free Policy Selection For Offline Reinforcement Learning (2021) · 0.00
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020) · 0.00
- What About Inputing Policy In Value Function: Policy Representation And Policy-extended Value Function Approximator (2020) · 2.26
- Deep Radial-basis Value Functions For Continuous Control (2020) · 0.00
- Kalman Meets Bellman: Improving Policy Evaluation Through Value Tracking (2020) · 0.00
- General Policy Evaluation And Improvement By Learning To Identify Few But Crucial States (2022) · 0.00
- Particle Value Functions (2017) · 0.00
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019) · 0.00