Anchor-changing Regularized Natural Policy Gradient For Multi-objective Reinforcement Learning
2022 · Ruida Zhou, Tao Liu, Dileep Kalathil, et al.
Abstract
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems. Theoretically, the designed algorithms based on the ARNPG framework achieve \(\tilde{O}(1/T)\) global convergence with exact gradients. Empirically, the ARNPG-guided algorithms also demonstrate superior performance compared to some existing policy gradient-based approaches in both exact-gradient and sample-based scenarios.
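To make the three criteria named in the abstract concrete, here is a minimal sketch in Python. It assumes their common textbook forms: proportional fairness as a sum of logarithms, max-min as the smallest value, and the constrained case as one objective subject to lower bounds on the others. The function names and the example value vectors are hypothetical illustrations and are not taken from the paper.

```python
import math

def proportional_fairness(values):
    """Sum of log-values; assumes every per-objective value is positive."""
    return sum(math.log(v) for v in values)

def max_min(values):
    """Score a policy by its worst-performing objective."""
    return min(values)

def constrained(values, thresholds):
    """Treat values[0] as the objective and values[1:] as constrained.

    Returns values[0] if every constrained value meets its threshold,
    and -inf otherwise (an assumed convention for infeasible policies).
    """
    feasible = all(v >= b for v, b in zip(values[1:], thresholds))
    return values[0] if feasible else -math.inf

# Hypothetical value vectors (one entry per reward function) for two policies.
policy_a = [4.0, 1.0]   # strong on the first objective, weak on the second
policy_b = [2.5, 2.5]   # balanced

for name, score in [("proportional fairness", proportional_fairness),
                    ("max-min", max_min)]:
    print(name, score(policy_a), score(policy_b))
print("constrained (second value >= 2.0)",
      constrained(policy_a, [2.0]), constrained(policy_b, [2.0]))
```

With these made-up numbers, both the proportional-fairness and max-min criteria prefer the balanced policy, and the constrained criterion rejects the unbalanced one once the threshold on the second objective exceeds its value.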
Related papers
- Symmetric (optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022) 0.00
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained MDPs Via Primal-dual Policy Gradient Algorithm (2024) 0.00
- Policy Gradient For Robust Markov Decision Processes (2024) 0.00
- Recurrent Natural Policy Gradient For POMDPs (2024) 0.00
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021) 0.00
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020) 0.00
- Joint Optimization Of Multi-objective Reinforcement Learning With Policy Gradient Based Algorithm (2021) 6.34
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021) 0.00