Scaling Laws For A Multi-agent Reinforcement Learning Model
2022 · Oren Neumann, Claudius Gros
Abstract
The recent observation of neural power-law scaling relations has had a significant impact on the field of deep learning. As a consequence, substantial attention has been devoted to describing scaling laws, although mostly for supervised learning and only to a lesser extent for reinforcement learning frameworks. In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. On the basis of a relationship between Elo rating, playing strength and power-law scaling, we train AlphaZero agents on the games Connect Four and Pentago and analyze their performance. We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute, and as a power of compute when training optimally sized agents. We observe nearly identical scaling exponents for both games. Combining the two observed scaling laws, we obtain a power law relating optimal size to comput
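The link between Elo rating, playing strength and power-law scaling mentioned in the abstract can be illustrated with a minimal sketch. It assumes the Bradley-Terry model with the conventional base-10, 400-point Elo scale, which the abstract itself does not specify. Under that assumption, a playing strength that grows as a power of parameter count makes the Elo rating linear in the logarithm of parameter count, so the exponent can be read off a straight-line fit. The data are synthetic, and the names `elo_from_strength`, `win_probability`, `fit_line` and the exponent `ALPHA_TRUE` are hypothetical choices for illustration, not values or code from the paper.

```python
import math

def elo_from_strength(gamma, scale=400.0):
    """Elo rating for a Bradley-Terry strength gamma (assumed base-10, 400-point convention)."""
    return scale * math.log10(gamma)

def win_probability(elo_a, elo_b, scale=400.0):
    """Expected score of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data: strength = N ** ALPHA_TRUE.
# ALPHA_TRUE is an arbitrary illustrative exponent, NOT a result from the paper.
ALPHA_TRUE = 0.5
param_counts = [1e3, 1e4, 1e5, 1e6]
strengths = [n ** ALPHA_TRUE for n in param_counts]
elos = [elo_from_strength(g) for g in strengths]

# Elo is linear in log10(N) with slope 400 * alpha, so the exponent
# is recovered by dividing the fitted slope by the Elo scale.
slope, intercept = fit_line([math.log10(n) for n in param_counts], elos)
alpha_fit = slope / 400.0
print(f"fitted exponent: {alpha_fit:.3f}")
```

With measured Elo ratings in place of the synthetic ones, the same straight-line fit gives an estimate of the scaling exponent. It is only meaningful in the regime the abstract describes, where performance is not bottlenecked by available compute.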
Related papers
- Scaling Behaviors Of LLM Reinforcement Learning Post-training: An Empirical Study In Mathematical Reasoning (2025)
- Impartial Games: A Challenge For Reinforcement Learning (2022)
- The Art Of Scaling Reinforcement Learning Compute For LLMs (2025)
- ANS: Adaptive Network Scaling For Deep Rectifier Reinforcement Learning Models (2018)
- Analysis Of Hyper-parameters For Small Games: Iterations Or Epochs In Self-play? (2020)
- AlphaZero-Edu: Democratizing Access To AlphaZero (2025)
- Policy-value Alignment And Robustness In Search-based Multi-agent Learning (2023)
- Large Scale Learning Of Agent Rationality In Two-player Zero-sum Games (2019)