Trust Region Value Optimization Using Kalman Filtering
2019 · Shirli di-Castro Shashua, Shie Mannor
Abstract
Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When using a parameterized function to approximate the value, it is common to optimize the set of parameters by minimizing the sum of squared Bellman temporal difference errors. However, this approach ignores certain distributional properties of both the errors and the value parameters. Taking these distributions into account in the optimization process can provide useful information on the amount of confidence in the value estimation. In this work we propose to optimize the value by minimizing a regularized objective function which forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function by adopting a Bayesian perspective over both the value parameters and noisy observed returns. This d
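To make the idea in the abstract concrete, the following is a minimal sketch of one generic Extended Kalman Filter style update of value parameters. It is not the authors' KOVA implementation: the function name `ekf_value_update`, its arguments, the isotropic noise terms, and the toy three-state example are all assumptions made for illustration. For a value function that is linear in its parameters, this update is the minimizer of the squared target error (scaled by the observation noise variance) plus a quadratic penalty on the parameter change weighted by the inverse prior covariance, which is one way to read "a regularized objective that forms a trust region over the parameters".

```python
import numpy as np


def ekf_value_update(theta, P, jac, target, value,
                     obs_noise_var, process_noise_var=0.0):
    """One EKF-style update of value-function parameters (illustrative only).

    theta  : (d,)   current parameter mean
    P      : (d, d) current parameter covariance
    jac    : (n, d) Jacobian of the n value predictions w.r.t. theta
    target : (n,)   noisy observed targets (e.g. returns)
    value  : (n,)   current value predictions at theta
    """
    d = theta.shape[0]
    n = target.shape[0]
    # Predict step: parameters follow a random walk (assumed isotropic noise).
    P_pred = P + process_noise_var * np.eye(d)
    # Innovation covariance of the observed targets.
    S = jac @ P_pred @ jac.T + obs_noise_var * np.eye(n)
    # Kalman gain K = P_pred jac^T S^{-1}, computed with a linear solve.
    K = np.linalg.solve(S, jac @ P_pred).T
    # Correct the mean with the innovation and shrink the covariance.
    theta_new = theta + K @ (target - value)
    P_new = P_pred - K @ S @ K.T
    return theta_new, P_new


# Toy example (hypothetical): 3 states, one-hot features, linear value function.
features = np.eye(3)                      # Jacobian of a linear model
targets = np.array([1.0, 2.0, 3.0])       # assumed observed returns
theta0 = np.zeros(3)
P0 = np.eye(3)

theta1, P1 = ekf_value_update(
    theta0, P0, features, targets, features @ theta0, obs_noise_var=1.0
)
```

With a unit prior covariance and unit observation noise, the update moves the parameters halfway toward the targets and halves the covariance; a larger prior covariance (a wider trust region) would let the parameters move further in one step.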
Related papers
- Kalman Meets Bellman: Improving Policy Evaluation Through Value Tracking (2020)
- Value Enhancement Of Reinforcement Learning Via Efficient And Robust Trust Region Optimization (2023)
- Uncertainty-aware Policy Optimization: A Robust, Adaptive Trust Region Approach (2020)
- Average-reward Reinforcement Learning With Trust Region Methods (2021)
- Model-based Epistemic Variance Of Values For Risk-aware Policy Optimization (2023)
- An Analytical Update Rule For General Policy Optimization (2021)
- Value-distributional Model-based Reinforcement Learning (2023)
- Pretrain Value, Not Reward: Decoupled Value Policy Optimization (2025)