Learning Generalizable Risk-sensitive Policies To Coordinate In Decentralized Multi-agent General-sum Games
2022 · Ziyi Liu, Xian Guo, Yongchun Fang
Abstract
While various multi-agent reinforcement learning methods have been proposed for cooperative settings, few works investigate how self-interested learning agents can achieve mutual coordination in decentralized general-sum games and generalize pre-trained policies to non-cooperative opponents during execution. In this paper, we present Generalizable Risk-Sensitive Policy (GRSP). GRSP learns the distribution over an agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, GRSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter its strategies accordingly during execution. Empirically, agents trained via GRSP stably achieve mutual coordination during training and avoid being exploited by non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents both in iterated prisoner's
Related papers
- Training Generalizable Collaborative Agents Via Strategic Risk Aversion (2026)
- GCS: Graph-based Coordination Strategy For Multi-agent Reinforcement Learning (2022)
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Promoting Coordination Through Policy Regularization In Multi-agent Deep Reinforcement Learning (2019)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- A Structured Prediction Approach For Generalization In Cooperative Multi-agent Reinforcement Learning (2019)
- Parameter Sharing Deep Deterministic Policy Gradient For Cooperative Multi-agent Reinforcement Learning (2017)
- Provably Efficient Reinforcement Learning In Decentralized General-sum Markov Games (2021)