Stabilizing Extreme Q-learning By Maclaurin Expansion
2024 · Motoki Omura, Takayuki Osa, Yusuke Mukuta, et al.
Abstract
In offline reinforcement learning, in-sample learning methods have been widely used to prevent the performance degradation caused by evaluating actions that are out-of-distribution with respect to the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that the Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the modeled value function between the value function under the behavior policy and the soft optimal value
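The stabilizing effect described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss itself; the sketch assumes the Gumbel regression form exp(z) - z - 1 on a scaled error z, and the function names `gumbel_loss` and `maclaurin_loss` are hypothetical, chosen for illustration only. Since exp(z) - z - 1 equals the series z^2/2! + z^3/3! + ..., truncating at order 2 leaves a squared-error loss, while higher orders approach the exponential loss.

```python
import math


def gumbel_loss(z):
    """Assumed XQL-style loss on a scaled error z: exp(z) - z - 1."""
    return math.exp(z) - z - 1.0


def maclaurin_loss(z, order):
    """Maclaurin series of exp(z) - z - 1, truncated at `order` (>= 2).

    The constant and linear terms cancel, so the sum starts at k = 2.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    return sum(z ** k / math.factorial(k) for k in range(2, order + 1))


if __name__ == "__main__":
    # Compare the two losses on a small and a large scaled error.
    for z in (0.5, 10.0):
        print(z, gumbel_loss(z), maclaurin_loss(z, 2), maclaurin_loss(z, 4))
```

For a large positive error the truncated loss grows polynomially rather than exponentially, which is the sense in which the expansion limits the effect of large errors. Note that an odd truncation order can yield negative values for sufficiently negative z, so an even order is the safer choice in this sketch.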
Related papers
- Quantile Q-learning: Revisiting Offline Extreme Q-learning With Quantile Regression (2025)
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Emaq: Expected-max Q-learning Operator For Simple Yet Effective Offline And Online RL (2020)
- Symmetric Q-learning: Reducing Skewness Of Bellman Error In Online Reinforcement Learning (2024)
- ACL-QL: Adaptive Conservative Level In Q-learning For Offline Reinforcement Learning (2024)
- Projected Off-policy Q-learning (POP-QL) For Stabilizing Offline Reinforcement Learning (2023)
- UDQL: Bridging The Gap Between MSE Loss And The Optimal Value Function In Offline Reinforcement Learning (2024)