Policy Learning For Off-dynamics RL With Deficient Support
2024 · Linh Le Pham Van, Hung The Tran, Sunil Gupta
Abstract
Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due to the high costs of data collection and safety concerns. As a result, a common strategy is to transfer a policy trained in a low-cost, rapid source simulator to a real-world target environment. However, this process poses challenges. Simulators, no matter how advanced, cannot perfectly replicate the intricacies of the real world, leading to dynamics discrepancies between the source and target environments. Past research posited that the source domain must encompass all possible target transitions, a condition we term full support. However, expecting full support is often unrealistic, especially in scenarios where significant dynamics discrepancies arise. In this paper, our emphasis shifts to addressing large dynamics mismatch adaptation. We move aw
Related papers
- A Conservative Approach For Few-shot Transfer In Off-dynamics Reinforcement Learning (2023) · 0.00
- Learning A Subspace Of Policies For Online Adaptation In Reinforcement Learning (2021) · 0.00
- Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy (2022) · 0.00
- Robust Adversarial Policy Optimization Under Dynamics Uncertainty (2026) · 0.00
- Concurrent Learning Of Policy And Unknown Safety Constraints In Reinforcement Learning (2024) · 0.00
- Post-convergence Sim-to-real Policy Transfer: A Principled Alternative To Cherry-picking (2025) · 0.00
- When To Trust Your Simulator: Dynamics-aware Hybrid Offline-and-online Reinforcement Learning (2022) · 2.26
- Overcoming The Sim-to-real Gap: Leveraging Simulation To Learn To Explore For Real-world RL (2024) · 5.84