ACPO: A Policy Optimization Algorithm For Average MDPs With Constraints
2023 · Akhil Agnihotri, Rahul Jain, Haipeng Luo
Abstract
Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well in the average CMDP setting. In this paper, we introduce a new policy optimization algorithm with function approximation for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorit
Related papers
- Multi-objective Reward And Preference Optimization: Theory And Algorithms (2025)
- Average-reward Reinforcement Learning With Trust Region Methods (2021)
- Anytime-competitive Reinforcement Learning With Policy Prior (2023)
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained MDPs Via Primal-dual Policy Gradient Algorithm (2024)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- Performance Bounds For Policy-based Average Reward Reinforcement Learning Algorithms (2023)
- Reward Constrained Policy Optimization (2018)
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)