Generative Evolutionary Meta-solver (GEMS): Scalable Surrogate-free Multi-agent Reinforcement Learning
2025 · Alakh Sharma, Gaurish Trivedi, Kartikey Singh Bhandari, et al.
Abstract
Scalable multi-agent reinforcement learning (MARL) remains a central challenge for AI. Existing population-based methods, such as Policy-Space Response Oracles (PSRO), require storing explicit policy populations and constructing full payoff matrices, incurring quadratic computation and linear memory costs. We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with a compact set of latent anchors and a single amortized generator. Instead of exhaustively constructing the payoff matrix, GEMS relies on unbiased Monte Carlo rollouts, multiplicative-weights meta-dynamics, and a model-free empirical-Bernstein UCB oracle to adaptively expand the policy set. Best responses are trained within the generator using an advantage-based trust-region objective, eliminating the need to store and train separate actors. We evaluated GEMS in a variety of two-player and multi-player games such as the Deceptive Messages Game, Kuhn Poker and Multi-Par
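The abstract names two standard building blocks, a multiplicative-weights update and an empirical-Bernstein upper confidence bound. The sketch below illustrates the textbook form of each so the terms are concrete; it is not taken from the paper, and GEMS may use different constants, step sizes, or estimators. The function names, the step size `eta`, the payoff range `value_range`, and the confidence level `delta` are all assumptions made for illustration.

```python
import math


def multiplicative_weights_step(weights, payoffs, eta):
    """One multiplicative-weights (Hedge) update of a mixed strategy.

    weights: current probabilities over policies (hypothetical anchors).
    payoffs: estimated payoff of each policy, e.g. from Monte Carlo rollouts.
    eta:     step size (an assumed hyperparameter).
    """
    scaled = [w * math.exp(eta * u) for w, u in zip(weights, payoffs)]
    total = sum(scaled)
    return [s / total for s in scaled]


def empirical_bernstein_ucb(samples, value_range, delta):
    """Upper confidence bound on a mean from i.i.d. bounded samples.

    Uses the common empirical-Bernstein form
        mean + sqrt(2 * var * ln(3/delta) / n) + 3 * range * ln(3/delta) / n,
    where var is the (biased) empirical variance. The paper's exact bound
    may differ; this is the generic version.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    log_term = math.log(3.0 / delta)
    bonus = math.sqrt(2.0 * var * log_term / n) + 3.0 * value_range * log_term / n
    return mean + bonus
```

In a loop of this kind, rollout returns for a candidate policy would be passed to `empirical_bernstein_ucb` to decide whether the candidate is worth adding, and `multiplicative_weights_step` would reweight the current policy set from sampled payoffs, so no full payoff matrix is needed.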
Related papers
- Evolution Of Societies Via Reinforcement Learning (2024) 0.00
- PolicyEvolve: Evolving Programmatic Policies By LLMs For Multi-player Games Via Population-based Training (2025) 0.00
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025) 0.00
- Evolutionary Population Curriculum For Scaling Multi-agent Reinforcement Learning (2020) 0.00
- Discovering Multiagent Learning Algorithms With Large Language Models (2026) 2.05
- Maximum Entropy Heterogeneous-agent Reinforcement Learning (2023) 0.00
- Combining Tree-search, Generative Models, And Nash Bargaining Concepts In Game-theoretic Reinforcement Learning (2023) 0.00
- A Generalized Training Approach For Multiagent Learning (2019) 0.00