Refining Minimax Regret For Unsupervised Environment Design
2024 · Michael Beukman, Samuel Coward, Michael Matthews, et al.
Abstract
In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent's maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there are possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, ReMiDi, that results i
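To make the stagnation issue described above concrete, here is a minimal sketch on a made-up table of returns. All numbers, function names, and the tie-break rule are assumptions for illustration; this is not ReMiDi and not the paper's BLP definition. Regret is measured against the best of the listed candidate policies rather than a true optimal policy. Levels 0 and 1 stand in for a pair of levels on which no single policy can do well at once, so some regret is unavoidable there.

```python
import numpy as np

# Hypothetical toy data: returns[i, j] = expected return of policy i on level j.
returns = np.array([
    [1.0, 0.0, 1.0],  # policy 0: commits to level 0
    [0.0, 1.0, 1.0],  # policy 1: commits to level 1
    [0.5, 0.5, 0.6],  # policy 2: hedges, weak on level 2
    [0.5, 0.5, 1.0],  # policy 3: hedges, strong on level 2
])


def regret_matrix(returns):
    """Regret of each policy on each level, relative to the best candidate."""
    best_per_level = returns.max(axis=0)
    return best_per_level - returns


def minimax_regret_policies(regret, tol=1e-9):
    """Indices of policies whose worst-case regret is minimal, and that value."""
    worst_case = regret.max(axis=1)
    mmr_value = worst_case.min()
    return np.flatnonzero(worst_case <= mmr_value + tol), mmr_value


def tie_break_on_other_levels(regret, mmr, mmr_value, tol=1e-9):
    """Illustrative tie-break (an assumption, not the paper's objective):
    among MMR policies, prefer lower worst-case regret on the levels
    where the regret bound is not already attained by all of them."""
    at_bound = (regret[mmr] >= mmr_value - tol).all(axis=0)
    other = ~at_bound
    if not other.any():
        return mmr
    worst_elsewhere = regret[mmr][:, other].max(axis=1)
    return mmr[worst_elsewhere <= worst_elsewhere.min() + tol]


regret = regret_matrix(returns)
mmr, mmr_value = minimax_regret_policies(regret)
refined = tie_break_on_other_levels(regret, mmr, mmr_value)

print("MMR policies:", mmr.tolist(), "with max regret", mmr_value)
print("After tie-break on remaining levels:", refined.tolist())
```

In this toy table, policies 2 and 3 are both minimax-regret, yet policy 2 is clearly worse on level 2. An adversary that only samples regret-maximising levels (0 and 1) gives no signal to tell them apart, which is the gap the abstract attributes to plain MMR.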