Goal-conditioned Offline Reinforcement Learning Through State Space Partitioning
2023 · Mianchu Wang, Yue Jin, Giovanni Montana
Abstract
Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets. This is a particularly difficult setup, especially when learning to achieve multiple different goals or outcomes under a given scenario with only sparse rewards. For offline learning of goal-conditioned policies via supervised learning, previous work has shown that an advantage weighted log-likelihood loss guarantees monotonic policy improvement. In this work we argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems. The latter is particularly severe in long-horizon tasks where finding a unique and optimal policy that goes from a state to the desired goal is challenging as there may be multiple and potentially conflicting solutions. To tackle these challenges, we propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias: given a value-based part
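As a rough illustration of the advantage weighted log-likelihood loss mentioned in the abstract, the sketch below computes a generic exponentially weighted negative log-likelihood over a batch of dataset actions. It is not the authors' implementation: the function name `advantage_weighted_nll`, the temperature `beta`, and the clipping constant `max_weight` are assumptions introduced here, and the complementary weighting scheme proposed in the paper is not shown.

```python
import math


def advantage_weighted_nll(log_probs, advantages, beta=1.0, max_weight=10.0):
    """Generic advantage-weighted negative log-likelihood (illustrative only).

    log_probs  -- log pi(a | s, g) for each dataset action under the policy
    advantages -- estimated advantage A(s, a, g) for the same transitions
    beta       -- temperature (assumed hyperparameter); smaller = sharper weights
    max_weight -- upper clip on the weights (assumed, for numerical stability)
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    if not log_probs:
        raise ValueError("batch must not be empty")

    # Exponentiated advantages up-weight actions estimated to be better
    # than average and down-weight the rest.
    weights = [min(math.exp(a / beta), max_weight) for a in advantages]

    # Weighted negative log-likelihood, averaged over the batch.
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


if __name__ == "__main__":
    # With zero advantages every weight is 1, so the loss reduces to
    # plain behaviour cloning (mean negative log-likelihood).
    print(advantage_weighted_nll([-1.0, -2.0], [0.0, 0.0]))
```

With all advantages equal to zero the loss is ordinary supervised imitation of the dataset. Non-zero advantages shift the fit toward actions that the value estimate rates more highly.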
Related papers
- State-constrained Offline Reinforcement Learning (2024)
- Projected State-action Balancing Weights For Offline Reinforcement Learning (2021)
- Goal-conditioned Reinforcement Learning From Sub-optimal Data On Metric Spaces (2024)
- A2PO: Towards Effective Offline Reinforcement Learning From An Advantage-aware Perspective (2024)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022)
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)