PAnDR: Fast Adaptation To New Environments From Offline Experiences Via Decoupling Policy And Environment Representations
2022 · Tong Sang, Hongyao Tang, Yi Ma, et al.
Abstract
Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, the notorious weakness of DRL agents in generalizing across environments prevents their widespread application in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction with the training environments, which can be costly in practice. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In the offline training phase, the environment representation and the policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information …
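To make the contrastive step more concrete, here is a minimal sketch of an InfoNCE-style loss for environment representations. It assumes that an encoder (not shown) has already mapped experiences to embedding vectors, that `queries[i]` and `keys[i]` come from the same environment (a positive pair), and that keys from other environments act as negatives. The cosine similarity, the temperature value, and the names `cosine` and `info_nce` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: embeddings are never the zero vector.
    return dot / (norm_u * norm_v)


def info_nce(queries: List[Sequence[float]],
             keys: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    queries[i] and keys[i] are embeddings of experiences from the same
    (hypothetical) environment i; every keys[j] with j != i is a negative.
    """
    if len(queries) != len(keys):
        raise ValueError("queries and keys must have the same length")
    total = 0.0
    for i, q in enumerate(queries):
        # Similarity of this query to every key, scaled by the temperature.
        logits = [cosine(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive key as the target class.
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Two hypothetical environments: matching pairs point in similar directions.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True
```

Minimizing this loss pulls embeddings of experiences from the same environment together and pushes those from different environments apart, which is the property a contrastively learned environment representation is meant to have.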
Related papers
- Debiased Offline Representation Learning For Fast Online Adaptation In Non-stationary Dynamics (2024)
- Learning A Subspace Of Policies For Online Adaptation In Reinforcement Learning (2021)
- Policy Agnostic RL: Offline RL And Online RL Fine-tuning Of Any Class And Backbone (2024)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Federated Offline Policy Optimization With Dual Regularization (2024)
- AdaRL: What, Where, And How To Adapt In Transfer Reinforcement Learning (2021)
- Towards Robust Policy: Enhancing Offline Reinforcement Learning With Adversarial Attacks And Defenses (2024)
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025)