Biased Aggregation, Rollout, And Enhanced Policy Improvement For Reinforcement Learning
2019 · Dimitri Bertsekas
Abstract
We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a bias function \(V\) of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when \(V\equiv0\), but our scheme works best when \(V\) is a known reasonably good approximation to the optimal cost function \(J^*\). When \(V\) is equal to the cost function \(J_{\mu}\) of some known policy \(\mu\) and there is only one aggregate state, our scheme is equivalent to the rollout algorithm based on \(\mu\) (i.e., the result of a single policy improvement starting with the policy \(\mu\)). When \(V=J_{\mu}\) and there are multiple aggregate states, our aggregation approach can be used as a more powerful form of improvement of \(\mu\). Thus, when com
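The abstract states that with \(V=J_{\mu}\) and a single aggregate state, the scheme reduces to the rollout algorithm, i.e., one policy improvement step from \(\mu\). The sketch below illustrates only that plain rollout step, not the biased aggregation scheme itself. It assumes a small finite discounted MDP given as transition probabilities `P[u][i][j]` and stage costs `g[u][i][j]`. It evaluates \(J_{\mu}\) by iterating the policy's Bellman operator, then chooses at each state the control minimizing the one-step lookahead cost \(\sum_j p_{ij}(u)\big(g(i,u,j)+\alpha J_{\mu}(j)\big)\). The function names and the 2-state example are hypothetical, chosen only for illustration.

```python
# Rollout (one policy improvement step) on a tiny discounted MDP.
# Function names and MDP data are hypothetical, chosen only for illustration.

def q_value(P, g, J, alpha, i, u):
    """Expected cost of applying control u at state i, then incurring J from the next state."""
    return sum(P[u][i][j] * (g[u][i][j] + alpha * J[j]) for j in range(len(J)))


def evaluate_policy(P, g, mu, alpha, tol=1e-12, max_iter=100_000):
    """Approximate J_mu by repeatedly applying J <- T_mu J (iterative policy evaluation)."""
    n = len(mu)
    J = [0.0] * n
    for _ in range(max_iter):
        J_new = [q_value(P, g, J, alpha, i, mu[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J


def rollout_policy(P, g, J_mu, alpha):
    """One-step lookahead using the base policy's cost J_mu as the cost-to-go approximation."""
    controls = range(len(P))
    return [min(controls, key=lambda u: q_value(P, g, J_mu, alpha, i, u))
            for i in range(len(J_mu))]


# Hypothetical 2-state, 2-control example: control 0 = stay, control 1 = switch.
# Being in state 0 costs 1 per stage, state 1 costs 0; switching adds 0.5.
STATE_COST = [1.0, 0.0]
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # control 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # control 1: switch to the other state
]
g = [
    [[STATE_COST[i]] * 2 for i in range(2)],        # g[0][i][j]: stay
    [[STATE_COST[i] + 0.5] * 2 for i in range(2)],  # g[1][i][j]: switch
]
ALPHA = 0.9
MU_BASE = [0, 0]  # base policy mu: always stay

if __name__ == "__main__":
    J_mu = evaluate_policy(P, g, MU_BASE, ALPHA)
    mu_tilde = rollout_policy(P, g, J_mu, ALPHA)
    J_tilde = evaluate_policy(P, g, mu_tilde, ALPHA)
    print(mu_tilde)  # prints [1, 0]
    print([round(x, 6) for x in J_mu], [round(x, 6) for x in J_tilde])  # prints [10.0, 0.0] [1.5, 0.0]
```

In this toy case the base policy never leaves the costly state 0, so \(J_{\mu}=(10,0)\). The rollout policy switches out of state 0 and stays in state 1, giving cost \((1.5,0)\). This is consistent with the standard cost improvement property of policy improvement, \(J_{\tilde\mu}\le J_{\mu}\).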