Learning A Subspace Of Policies For Online Adaptation In Reinforcement Learning
2021 · Jean-Baptiste Gaya, Laure Soulier, Ludovic Denoyer
Abstract
Deep Reinforcement Learning (RL) is mainly studied in a setting where the training and testing environments are similar. In many practical applications, however, these environments may differ. For instance, in control systems, the robot(s) on which a policy is learned might differ from the robot(s) on which the policy will run. Such differences can be caused by internal factors (e.g., calibration issues, system attrition, defective modules) or by external changes (e.g., weather conditions). There is a need to develop RL methods that generalize well to variations of the training conditions. In this article, we consider the simplest yet hard-to-tackle generalization setting, where the test environment is unknown at train time, forcing the agent to adapt to the system's new dynamics. This online adaptation process can be computationally expensive (e.g., fine-tuning) and cannot rely on meta-RL techniques since there is just a single train environment. To do so, we propose an approach where we
Related papers
- Policy Agnostic RL: Offline RL And Online RL Fine-tuning Of Any Class And Backbone (2024)
- Pandr: Fast Adaptation To New Environments From Offline Experiences Via Decoupling Policy And Environment Representations (2022)
- Adversarial Policies: Attacking Deep Reinforcement Learning (2019)
- Adarl: What, Where, And How To Adapt In Transfer Reinforcement Learning (2021)
- Online Reinforcement Learning In Non-stationary Context-driven Environments (2023)
- Policy Learning For Off-dynamics RL With Deficient Support (2024)
- Online Robust Policy Learning In The Presence Of Unknown Adversaries (2018)
- Offline Meta-reinforcement Learning With Online Self-supervision (2021)