Kalman Meets Bellman: Improving Policy Evaluation Through Value Tracking
2020 · Shirli di-Castro Shashua, Shie Mannor
Abstract
Policy evaluation is a key process in Reinforcement Learning (RL). It assesses a given policy by estimating the corresponding value function. When using parameterized value functions, common approaches minimize the sum of squared Bellman temporal-difference errors and obtain a point estimate of the parameters. Kalman-based and Gaussian-process-based frameworks were suggested to evaluate the policy by treating the value as a random variable. These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration. When adopting these frameworks to solve deep RL tasks, several limitations are revealed: excessive computations in each optimization step, difficulty with handling batches of samples, which slows training, and the effect of memory in stochastic environments, which prevents off-policy learning. In this work, we discuss these limitations and propose to overcome them by an alternative general framework, based on the extended Kalman filter. We de
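To make the contrast between a point estimate and a Kalman-style estimate concrete, here is a minimal sketch of a Kalman-filter update for a linear value function V(s) = phi(s)·theta. It is not the framework proposed in the paper, which builds on the extended Kalman filter for nonlinear approximators; it is a generic linear Kalman temporal-difference-style illustration. The function name `kalman_td_update`, the random-walk parameter model, the noise levels, and the two-state toy chain are all assumptions made for this example.

```python
import numpy as np

def kalman_td_update(theta, P, phi_s, phi_next, reward, gamma=0.9,
                     process_noise=1e-4, obs_noise=1.0):
    """One Kalman-filter step on the parameters of V(s) = phi(s) @ theta.

    theta: mean of the parameter estimate; P: its covariance (uncertainty).
    All names and default values are illustrative assumptions.
    """
    n = theta.shape[0]
    # Predict: assumed random-walk model, so only the uncertainty grows.
    P_pred = P + process_noise * np.eye(n)
    # Observation model from the Bellman equation: reward ~ h @ theta + noise.
    h = phi_s - gamma * phi_next
    td_error = reward - h @ theta            # innovation (temporal-difference error)
    innovation_var = h @ P_pred @ h + obs_noise
    gain = P_pred @ h / innovation_var       # Kalman gain
    theta_new = theta + gain * td_error
    P_new = P_pred - np.outer(gain, h) @ P_pred
    return theta_new, P_new

# Toy chain (hypothetical): state 0 -> state 1 with reward 0,
# then state 1 -> terminal with reward 1. One-hot features.
phi = np.eye(2)
terminal = np.zeros(2)
theta = np.zeros(2)
P = 10.0 * np.eye(2)                         # broad prior over the parameters
for _ in range(200):
    theta, P = kalman_td_update(theta, P, phi[0], phi[1], reward=0.0)
    theta, P = kalman_td_update(theta, P, phi[1], terminal, reward=1.0)

# With gamma = 0.9 the true values are V(0) = 0.9 and V(1) = 1.0;
# theta should approach them, while P tracks the remaining uncertainty.
print(theta)
print(np.diag(P))
```

Unlike a gradient step on the squared temporal-difference error, each update here also maintains the covariance `P`, which is the kind of parameter uncertainty the abstract says can be exploited for exploration.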
Related papers
- Trust Region Value Optimization Using Kalman Filtering (2019)
- General Policy Evaluation And Improvement By Learning To Identify Few But Crucial States (2022)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)
- UVIP: Model-free Approach To Evaluate Reinforcement Learning Algorithms (2021)
- High-confidence Error Estimates For Learned Value Functions (2018)
- A Generalized Projected Bellman Error For Off-policy Value Estimation In Reinforcement Learning (2021)