The Phenomenon Of Policy Churn
2022 · Tom Schaul, André Barreto, John Quan, et al.
Abstract
We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states within a handful of learning updates (in a typical deep RL set-up such as DQN on Atari). We characterise the phenomenon empirically, verifying that it is not limited to specific algorithm or environment properties. A number of ablations help whittle down the plausible explanations for why churn occurs to just a handful, all related to deep learning. Finally, we hypothesise that policy churn is a beneficial but overlooked form of implicit exploration that casts \(\epsilon\)-greedy exploration in a fresh light, namely that \(\epsilon\)-noise plays a much smaller role than expected.
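The abstract measures churn as the fraction of states whose greedy action changes after a few updates, and contrasts it with the action noise injected by \(\epsilon\)-greedy exploration. The minimal sketch below makes both quantities concrete, assuming Q-values for a fixed batch of states are available as plain lists before and after some learning updates. The function names (`greedy_action`, `policy_churn`, `epsilon_greedy_deviation`) and the toy Q-values are hypothetical illustrations, not the authors' code or data, and the paper's exact metric may differ (for example in how states are sampled or ties are broken).

```python
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the largest Q-value; ties go to the lowest index."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def policy_churn(q_before: Sequence[Sequence[float]],
                 q_after: Sequence[Sequence[float]]) -> float:
    """Fraction of states whose greedy action differs between two Q snapshots.

    q_before[i] and q_after[i] are the action-value vectors for state i,
    e.g. computed with network parameters before and after a few updates.
    (Illustrative definition, assumed rather than taken from the paper.)
    """
    if len(q_before) != len(q_after):
        raise ValueError("both snapshots must cover the same states")
    if not q_before:
        raise ValueError("need at least one state")
    changed = sum(
        greedy_action(qb) != greedy_action(qa)
        for qb, qa in zip(q_before, q_after)
    )
    return changed / len(q_before)


def epsilon_greedy_deviation(epsilon: float, num_actions: int) -> float:
    """Probability that one epsilon-greedy step takes a non-greedy action.

    With probability epsilon a uniformly random action is drawn, and that
    random action still equals the greedy one with probability 1/num_actions.
    """
    return epsilon * (num_actions - 1) / num_actions


# Hypothetical toy Q-values: 4 states, 3 actions (made-up numbers).
q_before = [[1.0, 0.9, 0.1], [0.2, 0.5, 0.4], [0.3, 0.3, 0.8], [0.6, 0.1, 0.59]]
q_after = [[0.9, 1.0, 0.1], [0.2, 0.5, 0.4], [0.3, 0.3, 0.8], [0.58, 0.1, 0.6]]

print(policy_churn(q_before, q_after))                # 0.5
print(round(epsilon_greedy_deviation(0.01, 18), 4))   # 0.0094
```

In the toy data, states 0 and 3 switch their greedy action, so churn is 0.5. By contrast, \(\epsilon = 0.01\) over 18 actions (the size of the full Atari action set) makes fewer than 1% of steps non-greedy, which is the kind of gap the abstract's exploration hypothesis rests on.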
Related papers
- Improving Deep Reinforcement Learning By Reducing The Chain Effect Of Value And Policy Churn (2024)
- Understanding The Pathologies Of Approximate Policy Evaluation When Combined With Greedification In Reinforcement Learning (2020)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Exploring More When It Needs In Deep Reinforcement Learning (2021)
- Learning Self-imitating Diverse Policies (2018)
- Bad Habits: Policy Confounding And Out-of-trajectory Generalization In RL (2023)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Addressing Action Oscillations Through Learning Policy Inertia (2021)