General Policy Evaluation And Improvement By Learning To Identify Few But Crucial States
2022 · Francesco Faccio, Aditya Ramesh, Vincent Herrmann, et al.
Abstract
Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Networks to learn a single value function for evaluating (and thus helping to improve) any policy represented by a deep neural network (NN). The method yields competitive experimental results. In continuous control problems with infinitely many states, our value function minimizes its prediction error by simultaneously learning a small set of 'probing states' and a mapping from the actions produced in those probing states to the policy's return. The method extracts crucial abstract knowledge about the environment in the form of very few states sufficient to fully specify the behavior of many policies. A policy
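The probing-state idea described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the names `make_policy`, `fingerprint`, and `predict_return`, the linear-tanh policy, the linear value head, and all dimensions are assumptions chosen for brevity. The probing states and head weights are random placeholders here, whereas the abstract describes them as learned jointly by minimizing the value function's prediction error.

```python
import math
import random


def make_policy(weights):
    """Return a deterministic linear-tanh policy mapping a state to an action vector."""
    def policy(state):
        return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]
    return policy


def fingerprint(policy, probing_states):
    """Concatenate the actions the policy produces in each probing state."""
    actions = []
    for state in probing_states:
        actions.extend(policy(state))
    return actions


def predict_return(policy, probing_states, head_weights, head_bias):
    """Map the policy's fingerprint to a scalar return estimate (linear head, an assumption)."""
    features = fingerprint(policy, probing_states)
    return sum(w * f for w, f in zip(head_weights, features)) + head_bias


rng = random.Random(0)
STATE_DIM, ACTION_DIM, NUM_PROBES = 3, 2, 4  # hypothetical sizes

# Placeholders: in the described method these are trained, not sampled.
probing_states = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(NUM_PROBES)]
head_weights = [rng.uniform(-1, 1) for _ in range(NUM_PROBES * ACTION_DIM)]
head_bias = 0.0

# A randomly initialized policy to evaluate.
policy_weights = [[rng.uniform(-1, 1) for _ in range(STATE_DIM)] for _ in range(ACTION_DIM)]
policy = make_policy(policy_weights)
estimate = predict_return(policy, probing_states, head_weights, head_bias)
```

Because the value estimate depends on the policy only through its actions in the probing states, the same `probing_states` and head can score any policy with matching state and action dimensions, regardless of how that policy is parameterized.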
Related papers
- Kalman Meets Bellman: Improving Policy Evaluation Through Value Tracking (2020)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Improving Deep Reinforcement Learning By Reducing The Chain Effect Of Value And Policy Churn (2024)
- Evaluation-aware Reinforcement Learning (2025)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)