Addressing Action Oscillations Through Learning Policy Inertia
2021 · Chen Chen, Hongyao Tang, Jianye Hao, et al.
Abstract
Deep reinforcement learning (DRL) algorithms have been shown to be effective in a wide range of challenging decision-making and control tasks. However, these methods typically suffer from severe action oscillations, particularly in discrete action settings: agents select different actions at consecutive steps even though the states differ only slightly. This issue is often neglected because a policy is usually evaluated only by its cumulative reward. Yet action oscillation strongly degrades the user experience and can even pose serious safety risks, especially in safety-critical real-world domains such as autonomous driving. To this end, we introduce the Policy Inertia Controller (PIC), a generic plug-in framework for off-the-shelf DRL algorithms that enables an adaptive, formally grounded trade-off between the optimality and the smoothness of the learned policy. We propose Nested Policy Iteration as a general training algorithm for PIC-a
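The abstract names the problem (action switching between consecutive steps) and the goal (trading optimality for smoothness) without giving formulas. The sketch below illustrates both ideas under stated assumptions; it is not the authors' PIC. It assumes oscillation is measured as the fraction of consecutive steps whose discrete action changes, and that inertia is injected by repeating the previous action with a constant probability `mu`, otherwise sampling from a fixed, state-independent base policy. The names `oscillation_rate`, `inertia_mixture`, `rollout` and `mu` are hypothetical.

```python
import numpy as np


def oscillation_rate(actions):
    """Fraction of consecutive steps at which the discrete action changes.

    Illustrative smoothness metric (an assumption, not necessarily the
    paper's): 0.0 means the action never switches, 1.0 means it switches
    at every step.
    """
    actions = np.asarray(actions)
    if actions.size < 2:
        return 0.0
    return float(np.mean(actions[1:] != actions[:-1]))


def inertia_mixture(base_probs, prev_action, mu):
    """Hypothetical inertia-augmented distribution over discrete actions.

    With probability `mu` the previous action is repeated; otherwise the
    action comes from the base policy `base_probs`. Larger `mu` favours
    smoothness, smaller `mu` stays closer to the base policy.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    base_probs = np.asarray(base_probs, dtype=float)
    repeat = np.zeros_like(base_probs)
    repeat[prev_action] = 1.0  # point mass on the previous action
    return mu * repeat + (1.0 - mu) * base_probs


def rollout(base_probs, mu, steps, seed=0):
    """Sample an action sequence from the mixture (state is ignored here)."""
    rng = np.random.default_rng(seed)
    n = len(base_probs)
    action = int(rng.choice(n, p=base_probs))
    actions = [action]
    for _ in range(steps - 1):
        probs = inertia_mixture(base_probs, action, mu)
        action = int(rng.choice(n, p=probs))
        actions.append(action)
    return actions


print(oscillation_rate([0, 1, 0, 1, 1]))  # 3 switches over 4 transitions -> 0.75
print(inertia_mixture([0.5, 0.5], prev_action=0, mu=0.6))  # [0.8 0.2]
```

With a uniform two-action base policy, the expected switching probability under this toy mixture is `(1 - mu) * 0.5`, so raising `mu` suppresses oscillation at the cost of ignoring the base policy's preferences more often. In the paper the trade-off is described as adaptive rather than a fixed constant.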
Related papers
- Enhancing Control Policy Smoothness By Aligning Actions With Predictions From Preceding States (2026)
- The Ladder In Chaos: A Simple And Effective Improvement To General DRL Algorithms By Policy Path Trimming And Boosting (2023)
- Dual Policy Iteration (2018)
- Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy (2022)
- DDPG++: Striving For Simplicity In Continuous-control Off-policy Reinforcement Learning (2020)
- Learning Self-imitating Diverse Policies (2018)
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022)
- Specialized Deep Residual Policy Safe Reinforcement Learning-based Controller For Complex And Continuous State-action Spaces (2023)