Meta-value Learning: A General Framework For Learning With Learning Awareness
2023 · Tim Cooijmans, Milad Aghajohari, Aaron Courville
Abstract
Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare to prior work on repeated matrix games.
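As a rough illustration of the quantity the abstract describes, the sketch below computes a truncated discounted sum of returns over successive optimization iterates. It is not the MeVa algorithm itself, which learns this quantity with a form of Q-learning. The function name `discounted_meta_return`, the `ret` and `update` callables, the scalar policy parameter, and the finite `horizon` are all assumptions made for this example.

```python
from typing import Callable


def discounted_meta_return(
    x0: float,
    ret: Callable[[float], float],
    update: Callable[[float], float],
    gamma: float = 0.9,
    horizon: int = 50,
) -> float:
    """Truncated discounted sum of returns over optimization iterates.

    x0      : initial (hypothetical, scalar) policy parameter
    ret     : maps a parameter to its return
    update  : one optimization step, mapping a parameter to the next iterate
    gamma   : meta-level discount factor
    horizon : number of iterates to sum over (truncation is an assumption)
    """
    total = 0.0
    discount = 1.0
    x = x0
    for _ in range(horizon):
        total += discount * ret(x)  # return of the current iterate
        x = update(x)               # move to the next optimization iterate
        discount *= gamma
    return total


if __name__ == "__main__":
    # Toy objective (assumed): return is -(x - 1)^2, optimized by gradient ascent.
    step_size = 0.1
    value = discounted_meta_return(
        x0=0.0,
        ret=lambda x: -(x - 1.0) ** 2,
        update=lambda x: x + step_size * (-2.0 * (x - 1.0)),
    )
    print(value)
```

In this toy setting a starting point closer to the optimum accumulates a higher discounted sum, which is the sense in which such a quantity scores parameters by their long-term prospects rather than by their immediate return.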
Related papers
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)
- Learning Meta Representations For Agents In Multi-agent Reinforcement Learning (2021)
- Finding Useful Predictions By Meta-gradient Descent To Improve Decision-making (2021)
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Unifying Gradient Estimators For Meta-reinforcement Learning Via Off-policy Evaluation (2021)
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021)
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- Qatten: A General Framework For Cooperative Multiagent Reinforcement Learning (2020)