Discovering Multiagent Learning Algorithms With Large Language Models
2026 · Zun Li, John Schultz, Daniel Hennes, et al.
Abstract
Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid theoretical ground, the design of their most effective variants often relies on human intuition to navigate a vast algorithmic design space. In this work, we propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning. First, in the domain of iterative regret minimization, we evolve the logic governing regret accumulation and policy derivation, discovering a new algorithm, Volatility-Adaptive Discounted (VAD-)CFR. VAD-CFR employs novel, non-intuitive mechanisms-includin
Related papers
- Multi-agent Reinforcement Learning As A Computational Tool For Language Evolution Research: Historical Context And Future Challenges (2020)
- Policyevolve: Evolving Programmatic Policies By Llms For Multi-player Games Via Population-based Training (2025)
- Evolution Of Societies Via Reinforcement Learning (2024)
- Re-conceptualising The Language Game Paradigm In The Framework Of Multi-agent Reinforcement Learning (2020)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)
- Game Theory And Multi-agent Reinforcement Learning: From Nash Equilibria To Evolutionary Dynamics (2024)
- Agent-pro: Learning To Evolve Via Policy-level Reflection And Optimization (2024)