Prioritized Level Replay
2020 · Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
Abstract
Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level's future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improv
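The sampling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only states that TD-errors guide which level is sampled next. The rank-based transform, the staleness mix, and every name used here (`td_error_score`, `replay_probs`, `beta`, `rho`) are assumptions made for the example.

```python
import numpy as np


def td_error_score(rewards, values, gamma=0.999):
    """Mean absolute one-step TD error over one episode on a level.

    `values` holds one more entry than `rewards` (the bootstrap value
    of the final state). Hypothetical helper for illustration only.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    deltas = rewards + gamma * values[1:] - values[:-1]
    return float(np.mean(np.abs(deltas)))


def replay_probs(scores, last_seen, step, beta=0.1, rho=0.1):
    """Sampling distribution over training levels.

    Assumed design: levels are ranked by score (rank 1 = highest),
    weighted by (1 / rank) ** (1 / beta), then mixed with a staleness
    term so that levels with outdated scores are still revisited.
    """
    scores = np.asarray(scores, dtype=float)
    last_seen = np.asarray(last_seen, dtype=float)
    n = len(scores)

    # Rank levels by descending score.
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    weights = (1.0 / ranks) ** (1.0 / beta)
    p_score = weights / weights.sum()

    # Staleness: how long ago each level's score was last updated.
    staleness = step - last_seen
    total = staleness.sum()
    p_stale = staleness / total if total > 0 else np.full(n, 1.0 / n)

    return (1.0 - rho) * p_score + rho * p_stale
```

A level could then be drawn with `np.random.default_rng().choice(len(p), p=p)`, where `p` is the array returned by `replay_probs`. A smaller `beta` concentrates sampling on the highest-scoring levels, while `rho` controls how much weight goes to levels that have not been visited recently.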
Related papers
- Replay-guided Adversarial Environment Design (2021)
- Prioritized Generative Replay (2024)
- Large Batch Experience Replay (2021)
- Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation (2018)
- Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning (2023)
- Regret Minimization Experience Replay in Off-policy Reinforcement Learning (2021)
- Prompt Replay: Speeding Up GRPO with On-policy Reuse of High-signal Prompts (2026)
- Replay for Safety (2021)