Symmetric Q-learning: Reducing Skewness Of Bellman Error In Online Reinforcement Learning
2024 · Motoki Omura, Takayuki Osa, Yusuke Mukuta, et al.
Abstract
In deep reinforcement learning, estimating the value function to evaluate the quality of states and actions is essential. The value function is often trained with the least squares method, which implicitly assumes a Gaussian error distribution. However, a recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator, which violates the normal-error assumption implicit in least squares. To address this, we propose Symmetric Q-learning, a method in which synthetic noise drawn from a zero-mean distribution is added to the target values to generate a Gaussian error distribution. We evaluate the proposed method on continuous control benchmark tasks in MuJoCo. It improves the sample efficiency of a state-of-the-art reinforcement learning method by reducing the skewness of the error distribution.
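The core idea in the abstract, adding zero-mean noise to the targets so that the error distribution loses its skew, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the Bellman errors are simulated as a centered gamma distribution, the noise is a mirrored, centered gamma with the same parameters, and the names `sample_skewness` and `zero_mean_gamma` are hypothetical helpers. The paper's actual noise distribution and how its parameters are chosen are not described in the abstract.

```python
import numpy as np


def sample_skewness(x):
    """Third standardized moment of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


def zero_mean_gamma(rng, shape, scale, size):
    """Gamma(shape, scale) sample shifted by its mean (shape * scale) so it has zero mean."""
    return rng.gamma(shape, scale, size=size) - shape * scale


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical Q-value predictions and right-skewed Bellman errors (assumption).
q_values = rng.normal(0.0, 1.0, size=n)
bellman_error = zero_mean_gamma(rng, shape=2.0, scale=1.0, size=n)
targets = q_values + bellman_error

# Zero-mean synthetic noise with the opposite skew (assumption: mirrored gamma).
noise = -zero_mean_gamma(rng, shape=2.0, scale=1.0, size=n)
noisy_targets = targets + noise

skew_before = sample_skewness(targets - q_values)
skew_after = sample_skewness(noisy_targets - q_values)

print(f"noise mean:            {noise.mean():+.4f}")
print(f"error skewness before: {skew_before:+.3f}")
print(f"error skewness after:  {skew_after:+.3f}")
```

Because the noise has zero mean, the expected target is unchanged; because its third central moment cancels that of the simulated error, the combined error is symmetric. In this toy setting the result is symmetric rather than exactly Gaussian, and the added noise increases the error variance, which is the trade-off a practical method would need to manage.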
Related papers
- Robust Losses For Learning Value Functions (2022) · 0.00
- Smoothed Action Value Functions For Learning Gaussian Policies (2018) · 0.00
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021) · 0.00
- On The Estimation Bias In Double Q-learning (2021) · 0.00
- Stabilizing Extreme Q-learning By Maclaurin Expansion (2024) · 0.00
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024) · 0.00
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021) · 7.16
- A Generalized Projected Bellman Error For Off-policy Value Estimation In Reinforcement Learning (2021) · 0.00