Robust And Diverse Multi-agent Learning Via Rational Policy Gradient
2025 · Niklas Lauffer, Ameesh Shah, Micah Carroll, et al.
Abstract
Adversarial optimization algorithms that explicitly search for flaws in agents' policies have been successfully applied to finding robust and diverse policies in multi-agent settings. However, the success of adversarial optimization has been largely limited to zero-sum settings because its naive application in cooperative settings leads to a critical failure mode: agents are irrationally incentivized to self-sabotage, blocking the completion of tasks and halting further learning. To address this, we introduce Rationality-preserving Policy Optimization (RPO), a formalism for adversarial optimization that avoids self-sabotage by ensuring agents remain rational--that is, their policies are optimal with respect to some possible partner policy. To solve RPO, we develop Rational Policy Gradient (RPG), which trains agents to maximize their own reward in a modified version of the original game in which we use opponent shaping techniques to optimize the adversarial objective. RPG enables us to
Related papers
- Optimistic Multi-agent Policy Gradient (2023)
- Discovering Diverse Multi-agent Strategic Behavior Via Reward Randomization (2021)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Adversarial Style Transfer For Robust Policy Optimization In Deep Reinforcement Learning (2023)
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021)
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Halypo: Heterogeneous-agent Lyapunov Policy Optimization For Human-robot Collaboration (2026)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)