Efficient Offline Policy Optimization With A Learned Model
2022 · Zichen Liu, Siyi Li, Wee Sun Lee, et al.
Abstract
MuZero Unplugged presents a promising approach for offline policy learning from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages the Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires accurate learned models and a large number of simulations, and therefore substantial compute time. This paper investigates several hypothesized settings in which MuZero Unplugged may not work well in offline RL: 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) improperly parameterized models given the offline data; 4) learning with a low compute budget. We propose a regularized one-step look-ahead approach to tackle these issues. Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimation based on a one-step rollout. Policy improvements are towards the direction that maximizes the estimated advantage with regularization of th
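The one-step look-ahead idea in the abstract can be illustrated with a minimal NumPy sketch for a discrete action space. Everything here is an assumption made for illustration, not the paper's implementation: the function names, the exponentiated-advantage target, the temperature, and in particular the regularizer (a behavior-cloning penalty toward the logged action, since the abstract's description of the regularization is cut off above). The per-action rewards and next-state values stand in for the outputs of a learned model.

```python
import numpy as np


def one_step_advantage(reward, next_value, value, discount=0.99):
    """Per-action advantage from a one-step model rollout:
    A(s, a) = r(s, a) + discount * v(s') - v(s)."""
    return reward + discount * next_value - value


def improved_policy(prior_logits, advantage, temperature=1.0):
    """Assumed improvement target: prior policy reweighted by exp(A / temperature)."""
    logits = prior_logits + advantage / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def regularized_loss(policy_logits, target, logged_action, alpha=0.1):
    """Cross-entropy toward the improved target, plus an assumed
    behavior-cloning penalty on the action stored in the offline dataset."""
    shifted = policy_logits - policy_logits.max()
    log_pi = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    improvement = -(target * log_pi).sum()
    behavior = -log_pi[logged_action]
    return improvement + alpha * behavior


# Toy state with three actions; the arrays below are hypothetical model outputs.
prior_logits = np.zeros(3)               # uniform prior policy
reward = np.array([0.0, 1.0, 0.0])       # predicted reward for each action
next_value = np.array([0.5, 0.5, 0.5])   # predicted value of each next state
value = 0.5                              # predicted value of the current state

adv = one_step_advantage(reward, next_value, value, discount=1.0)
target = improved_policy(prior_logits, adv)
loss = regularized_loss(prior_logits, target, logged_action=1)
```

The sketch evaluates each action with a single model step rather than a tree search, so the cost per state is one model call per action instead of many MCTS simulations.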
Related papers
- Online And Offline Reinforcement Learning By Planning With A Learned Model (2021) · 0.00
- SimuDICE: Offline Policy Optimization Through World Model Updates And DICE Estimation (2024) · 0.00
- An Offline Risk-aware Policy Selection Method For Bayesian Markov Decision Processes (2021) · 0.00
- MOReL: Model-based Offline Reinforcement Learning (2020) · 0.00
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021) · 6.34
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022) · 0.00
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024) · 0.00
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020) · 11.58