Reliable Conditioning Of Behavioral Cloning For Offline Reinforcement Learning
2022 · Tung Nguyen, Qinqing Zheng, Aditya Grover
Abstract
Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al., 2021; Emmons et al., 2021) have shown that, by conditioning on desired future returns, BC can perform competitively with value-based counterparts while being much simpler and more stable to train. While promising, we show that these methods can be unreliable: their performance may degrade significantly when conditioned on high, out-of-distribution (OOD) returns. This matters in practice, as we often expect the policy to outperform the offline dataset by conditioning on an OOD value. We show that this unreliability arises from both the suboptimality of training data and model architectures. We propose ConserWeightive Behavioral Cloning (CWBC), a simple and effective method for improving the reliability of conditional BC with two key components: trajectory weighting and conservat
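To make the idea of conditioning BC on desired future returns concrete, the following is a minimal sketch of how return-conditioned training pairs might be assembled, and how a target return can be checked against the returns present in the data. It is not the CWBC method from the paper. The trajectory layout (dicts with `states`, `actions`, `rewards`) and the function names `returns_to_go`, `build_conditioned_dataset`, and `is_ood_return` are assumptions made for illustration only.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at each step: the (optionally discounted) sum of future rewards."""
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def build_conditioned_dataset(trajectories):
    """Flatten trajectories into supervised pairs.

    Inputs are states with the return-to-go appended as an extra feature;
    targets are the actions taken in the data. (Hypothetical data layout.)
    """
    inputs, targets = [], []
    for traj in trajectories:
        rtg = returns_to_go(traj["rewards"])
        for state, g, action in zip(traj["states"], rtg, traj["actions"]):
            inputs.append(np.append(state, g))
            targets.append(action)
    return np.array(inputs), np.array(targets)


def is_ood_return(target_return, trajectories):
    """True if the target return exceeds every trajectory return in the dataset."""
    max_return = max(float(np.sum(t["rewards"])) for t in trajectories)
    return target_return > max_return
```

A policy fit to these pairs by ordinary supervised learning only ever sees return values that occur in the data, so a target for which `is_ood_return` is true asks the model to extrapolate. That is the regime in which the abstract reports degraded performance.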
Related papers
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? (2022)
- Improving TD3-BC: Relaxed Policy Constraint For Offline Learning And Stable Online Fine-tuning (2022)
- Adaptive Behavior Cloning Regularization For Stable Offline-to-online Reinforcement Learning (2022)
- Know Your Boundaries: The Necessity Of Explicit Behavioral Cloning In Offline RL (2022)
- B3C: A Minimalist Approach To Offline Multi-agent Reinforcement Learning (2025)
- Behavior Prior Representation Learning For Offline Reinforcement Learning (2022)
- Robust Behavior Cloning Via Global Lipschitz Regularization (2025)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)