Bayesian Reparameterization Of Reward-conditioned Reinforcement Learning With Energy-based Models
2023 · Wenhao Ding, Tong Che, Ding Zhao, et al.
Abstract
Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we will show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL: improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes' theorem. BR-RCRL removes a core obstacle that prevents vanilla RCRL from generalizing on high RTG inputs: the model's tendency to treat different RTG inputs as independent values, which we term "RTG Independence". BR-RCRL also allows us to design an accompanying adaptive inference method, which maximizes total returns while avoiding the OOD queries that yield unpredictable behaviors in vanilla RCRL methods. We s…
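The abstract says only that BR-RCRL is "inspired by Bayes' theorem". One natural reading, assumed here rather than taken from the paper, is to factor the reward-conditioned policy as p(a | s, R) = p(R | s, a) p(a | s) / p(R | s), so that every RTG query is answered through shared factors instead of a separate input value. The sketch below shows that factorization for a single state using small hand-written lookup tables in place of any learned model (the paper's title points to energy-based models, which this toy omits). It also shows a hypothetical adaptive-inference rule: choose the highest RTG bin whose marginal density p(R | s) clears a threshold, so the query stays in-distribution. All names, tables, and the threshold rule are illustrative assumptions, not the authors' method.

```python
"""Toy sketch of a Bayes-reparameterized reward-conditioned policy.

Assumptions (not from the paper): one fixed state s, two discrete actions,
four discretized return-to-go (RTG) bins, and hand-written lookup tables
standing in for learned models of p(a | s) and p(R | s, a).
"""
from typing import Dict, List

# Hypothetical behavior prior p(a | s) for the single toy state.
PRIOR: Dict[str, float] = {"left": 0.7, "right": 0.3}

# Hypothetical RTG bins and per-action likelihoods p(R | s, a).
RTG_BINS: List[int] = [0, 1, 2, 3]
LIKELIHOOD: Dict[str, List[float]] = {
    "left": [0.4, 0.4, 0.2, 0.0],
    "right": [0.1, 0.2, 0.4, 0.3],
}


def rtg_marginal(r_idx: int) -> float:
    """Marginal p(R | s) = sum over a of p(R | s, a) * p(a | s)."""
    return sum(LIKELIHOOD[a][r_idx] * PRIOR[a] for a in PRIOR)


def policy(r_idx: int) -> Dict[str, float]:
    """Posterior p(a | s, R) obtained from the two factors via Bayes' theorem."""
    z = rtg_marginal(r_idx)
    if z == 0.0:
        # The requested RTG has no support under p(R | s): an OOD query.
        raise ValueError("RTG bin has zero density under p(R | s)")
    return {a: LIKELIHOOD[a][r_idx] * PRIOR[a] / z for a in PRIOR}


def adaptive_rtg(min_density: float) -> int:
    """Hypothetical rule: highest RTG bin with p(R | s) >= min_density."""
    supported = [
        i for i in range(len(RTG_BINS)) if rtg_marginal(i) >= min_density
    ]
    if not supported:
        raise ValueError("no RTG bin meets the density threshold")
    return max(supported)


if __name__ == "__main__":
    r_idx = adaptive_rtg(min_density=0.1)
    print("chosen RTG bin:", RTG_BINS[r_idx])
    print("action probabilities:", policy(r_idx))
```

With these tables, the top bin (RTG 3) has marginal density 0.09, below the 0.1 threshold, so the rule conditions on bin 2 instead. A vanilla RCRL model queried at a low-support RTG has no such density signal to fall back on.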
Related papers
- Online Bayesian Risk-averse Reinforcement Learning (2025)
- Generalized Bayesian Deep Reinforcement Learning (2024)
- Bias Resilient Multi-step Off-policy Goal-conditioned Reinforcement Learning (2023)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- REBEL: Reward Regularization-based Approach For Robotic Reinforcement Learning From Human Feedback (2023)
- Inferential Induction: A Novel Framework For Bayesian Reinforcement Learning (2020)
- Symbol Guided Hindsight Priors For Reward Learning From Human Preferences (2022)
- Bayesian Exploration Networks (2023)