Policy Optimization Over General State And Action Spaces
2022 · Caleb Ju, Guanghui Lan
Abstract
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policy for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish a linear convergence rate to global optimality or sublinear convergence to stationarity for these methods applied to solve different classes of RL problems under exact policy evaluation.
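For contrast with the general-space setting the abstract targets, the following is a minimal sketch of policy mirror descent in the tabular case, where every state can be enumerated and its policy updated directly. It is not the paper's algorithm: it assumes a cost-minimization convention, the KL divergence as the Bregman distance (which gives the closed-form multiplicative update used below), and exact policy evaluation through a linear solve. The names `evaluate_policy`, `pmd_step`, and `tabular_pmd`, and the arguments `P`, `c`, `gamma`, and `eta`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def evaluate_policy(P, c, pi, gamma):
    """Exact Q-function of policy pi.

    P: transitions, shape (S, A, S); c: costs, shape (S, A); pi: shape (S, A).
    """
    S, _ = c.shape
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    c_pi = np.sum(pi * c, axis=1)           # expected one-step cost under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    return c + gamma * (P @ V)              # Q(s, a), shape (S, A)


def pmd_step(log_pi, Q, eta):
    """KL-prox update per state: pi_new(a|s) proportional to pi(a|s) * exp(-eta * Q(s, a))."""
    log_pi = log_pi - eta * Q
    log_pi = log_pi - log_pi.max(axis=1, keepdims=True)   # numerical stability
    return log_pi - np.log(np.exp(log_pi).sum(axis=1, keepdims=True))


def tabular_pmd(P, c, gamma=0.9, eta=1.0, iters=200):
    """Run the update over ALL states each iteration (only possible when states are enumerable)."""
    S, A = c.shape
    log_pi = np.full((S, A), -np.log(A))    # start from the uniform policy
    for _ in range(iters):
        Q = evaluate_policy(P, c, np.exp(log_pi), gamma)
        log_pi = pmd_step(log_pi, Q, eta)
    pi = np.exp(log_pi)
    return pi, evaluate_policy(P, c, pi, gamma)
```

The loop touches every entry of an `(S, A)` table, which is exactly the step that has no direct analogue when the state space is continuous; that is the gap the abstract's function-approximation approach is meant to fill.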
Related papers
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Policy Optimization For Continuous Reinforcement Learning (2023)
- A Novel Framework For Policy Mirror Descent With General Parameterization And Linear Convergence (2023)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- Taming "data-hungry" Reinforcement Learning? Stability In Continuous State-action Spaces (2024)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- Mirror Learning: A Unifying Framework Of Policy Optimisation (2022)