Debiased Offline Representation Learning For Fast Online Adaptation In Non-stationary Dynamics
2024 · Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, et al.
Abstract
Developing policies that can adjust to non-stationary environments is essential for real-world reinforcement learning applications. However, learning such adaptable policies in offline settings, with only a limited set of pre-collected trajectories, presents significant challenges. A key difficulty arises because the limited offline data makes it hard for the context encoder to differentiate between changes in the environment dynamics and shifts in the behavior policy, often leading to context misassociations. To address this issue, we introduce a novel approach called Debiased Offline Representation for fast online Adaptation (DORA). DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy. We present a practical implementation of DORA, leveraging tractable bounds of the information bottleneck
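The objective described in the abstract can be sketched in symbols. This is only an illustrative reading of the abstract, not the paper's stated formulation: the symbols $z$ (dynamics encoding), $e$ (environmental data), $a$ (behavior-policy actions), $\phi$ (encoder parameters), and the trade-off weight $\beta \ge 0$ are all assumed notation, as are the names of the two estimators.

```latex
% Assumed notation (not from the source):
%   z     : dynamics encoding produced by the context encoder with parameters \phi
%   e     : environmental data the encoding should be informative about
%   a     : actions taken by the behavior policy
%   \beta : non-negative trade-off weight (hypothetical)
\max_{\phi} \;\; I(z;\, e) \;-\; \beta \, I(z;\, a)

% Mutual information is generally intractable, so a practical surrogate
% would pair a lower bound on the first term with an upper bound on the second:
\hat{I}_{\mathrm{lower}}(z;\, e) \;-\; \beta \, \hat{I}_{\mathrm{upper}}(z;\, a)
\;\;\le\;\; I(z;\, e) \;-\; \beta \, I(z;\, a)
```

Under these assumptions, maximizing the left-hand side of the inequality pushes up a guaranteed lower bound on the bottleneck objective: the first term rewards encodings that identify the dynamics, and the second penalizes encodings that merely identify which behavior policy collected the data.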
Related papers
- PAnDR: Fast Adaptation To New Environments From Offline Experiences Via Decoupling Policy And Environment Representations (2022)
- DARA: Dynamics-aware Reward Augmentation In Offline Reinforcement Learning (2022)
- Minimum-delay Adaptation In Non-stationary Reinforcement Learning Via Online High-confidence Change-point Detection (2021)
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Learning A Subspace Of Policies For Online Adaptation In Reinforcement Learning (2021)
- Adaptive Replay Buffer For Offline-to-online Reinforcement Learning (2025)
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025)