Improving Deep Reinforcement Learning By Reducing The Chain Effect Of Value And Policy Churn
2024 · Hongyao Tang, Glen Berseth
Abstract
Deep neural networks provide Reinforcement Learning (RL) with powerful function approximators for addressing large-scale decision-making problems. However, these approximators introduce challenges due to the non-stationary nature of RL training. One source of these challenges is that output predictions can churn: after each batch update, they change in uncontrolled ways for states not included in the batch. Although such churn occurs at every step of network training, how it arises and how it affects RL remain under-explored. In this work, we start by characterizing churn from the viewpoint of Generalized Policy Iteration with function approximation, and we discover a chain effect of churn: a cycle in which the churns in value estimation and policy improvement compound and bias the learning dynamics throughout the iteration. Further, we concretize the study and focus on the learning issues caused by the chain effect in different settings, including greedy action deviation in value-b
Related papers
- The Phenomenon Of Policy Churn (2022)
- Dissecting Deep RL With High Update Ratios: Combatting Value Divergence (2024)
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020)
- The Ladder In Chaos: A Simple And Effective Improvement To General DRL Algorithms By Policy Path Trimming And Boosting (2023)
- General Policy Evaluation And Improvement By Learning To Identify Few But Crucial States (2022)
- Never Worse, Mostly Better: Stable Policy Improvement In Deep Reinforcement Learning (2019)
- Understanding The Pathologies Of Approximate Policy Evaluation When Combined With Greedification In Reinforcement Learning (2020)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)