Efficient Generation Of Diverse Cooperative Agents With World Models
2025 · Yi Loo, Akshunn Trivedi, Malika Meghjani
Abstract
A major bottleneck in the training process for Zero-Shot Coordination (ZSC) agents is the generation of partner agents that are diverse in collaborative conventions. Current Cross-play Minimization (XPM) methods for population generation can be very computationally expensive and sample inefficient as the training objective requires sampling multiple types of trajectories. Each partner agent in the population is also trained from scratch, despite all of the partners in the population learning policies of the same coordination task. In this work, we propose that simulated trajectories from the dynamics model of an environment can drastically speed up the training process for XPM methods. We introduce XPM-WM, a framework for generating simulated trajectories for XPM via a learned World Model (WM). We show XPM with simulated trajectories removes the need to sample multiple trajectories. In addition, we show our proposed method can effectively generate partners with diverse conventions that
Related papers
- Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination (2025)
- Heterogeneous Multi-agent Zero-shot Coordination By Coevolution (2022)
- Maximum Entropy Population-based Training For Zero-shot Human-AI Coordination (2021)
- Tackling Cooperative Incompatibility For Zero-shot Human-AI Coordination (2023)
- KnowPC: Knowledge-driven Programmatic Reinforcement Learning For Zero-shot Coordination (2024)
- PECAN: Leveraging Policy Ensemble For Context-aware Zero-shot Human-AI Coordination (2023)
- "Other-Play" For Zero-shot Coordination (2020)
- Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In The Game Of Hanabi (2023)