Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning
2023 · Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, et al.
Abstract
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror
Related papers
- Multi-agent Constrained Policy Optimisation (2021) 0.00
- The Lagrangian Method For Solving Constrained Markov Games (2025) 0.00
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022) 0.00
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025) 0.00
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022) 0.00
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022) 2.26
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026) 0.00
- Safe Multi-agent Reinforcement Learning With Convergence To Generalized Nash Equilibrium (2024) 0.00