UDQL: Bridging The Gap Between MSE Loss And The Optimal Value Function In Offline Reinforcement Learning
2024 · Yu Zhang, Rui Yu, Zhipeng Yao, et al.
Abstract
The mean squared error (MSE) is commonly used to estimate the optimal value function in the vast majority of offline reinforcement learning (RL) models and has achieved outstanding performance. However, we find that its principle can lead to overestimation of the value function. In this paper, we first theoretically analyze the overestimation caused by MSE and provide a theoretical upper bound on the overestimation error. To address it, we propose a novel Bellman underestimated operator that counteracts the overestimation, and we prove its contraction property. Finally, we propose an offline RL algorithm based on the underestimated operator and a diffusion policy model. Extensive experiments on D4RL tasks show that our method can outperform state-of-the-art offline RL algorithms, which demonstrates that our theoretical analysis and underestimation approach are effective for offline RL tasks.
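To give a feel for the kind of overestimation the abstract refers to, the sketch below simulates the textbook max-operator bias: when Q-value estimates carry zero-mean noise, taking the maximum over actions is biased upward. This is an illustration only. It is not the paper's MSE analysis or its underestimated operator, neither of which is defined in the abstract. The function names, the Gaussian noise model, and the parameter values are all assumptions made for the example.

```python
import random
import statistics


def max_of_noisy_estimates(true_q, noise_std, rng):
    """Max over actions of the true Q-values plus zero-mean Gaussian noise."""
    return max(q + rng.gauss(0.0, noise_std) for q in true_q)


def estimate_bias(true_q, noise_std=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[max_a Q_hat(a)] - max_a Q(a).

    Assumed setup (not from the paper): independent Gaussian noise per action.
    """
    rng = random.Random(seed)
    samples = [max_of_noisy_estimates(true_q, noise_std, rng) for _ in range(trials)]
    return statistics.fmean(samples) - max(true_q)


if __name__ == "__main__":
    # Four actions with identical true value: any positive gap is pure estimation bias.
    bias = estimate_bias([0.0, 0.0, 0.0, 0.0])
    print(f"estimated upward bias of the max over noisy Q-values: {bias:.3f}")
```

With a single action the maximum is just the noisy estimate itself, so the bias disappears; it grows as more actions compete for the maximum. Any correction that deliberately lowers the target, as the abstract's underestimated operator is described as doing, pushes against this effect.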