Compositional Conservatism: A Transductive Approach In Offline Reinforcement Learning
2024 · Yeda Song, Dongwook Lee, Gunhee Kim
Abstract
Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporating conservatism into the policy or the value function to safeguard against uncertainties and unknowns. In this work, we focus on achieving the same objectives of conservatism but from a different perspective. We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline RL, an approach that pursues conservatism in a compositional manner on top of the transductive reparameterization (Netanyahu et al., 2023), which decomposes the input variable (the state in our case) into an anchor and its difference from the original input. Our COCOA seeks both in-distribution anchors and differ
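To make the decomposition described in the abstract concrete, here is a minimal sketch of splitting a state into an anchor and a difference. It is an illustration only: the function names `decompose` and `recompose` are hypothetical, and picking the nearest dataset state as the anchor is an assumption standing in for the anchor-seeking procedure, which the abstract does not specify.

```python
from math import dist


def decompose(state, anchors):
    """Split `state` into (anchor, delta) such that state == anchor + delta.

    Assumption: the anchor is the dataset state closest to `state` in
    Euclidean distance. This is a stand-in, not the paper's method.
    """
    anchor = min(anchors, key=lambda a: dist(a, state))
    delta = tuple(s - a for s, a in zip(state, anchor))
    return anchor, delta


def recompose(anchor, delta):
    """Recover the original state from its anchor and difference."""
    return tuple(a + d for a, d in zip(anchor, delta))


# Toy 2-D states that play the role of an offline dataset (made-up values).
dataset_states = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
query = (1.25, 0.75)

anchor, delta = decompose(query, dataset_states)
print(anchor, delta)  # (1.0, 1.0) (0.25, -0.25)
```

In this toy setting, a downstream policy or value function would take the pair `(anchor, delta)` instead of the raw state, so that both components can be kept close to values seen in the dataset.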
Related papers
- Long-horizon Model-based Offline Reinforcement Learning Without Conservatism (2025)
- Plan Better Amid Conservatism: Offline Multi-agent Reinforcement Learning With Actor Rectification (2021)
- DOMAIN: Mildly Conservative Model-based Offline Reinforcement Learning (2023)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023)
- Optimal Conservative Offline RL With General Function Approximation Via Augmented Lagrangian (2022)