Post-convergence Sim-to-real Policy Transfer: A Principled Alternative To Cherry-picking
2025 · Dylan Khor, Bowen Weng
Abstract
Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. RL training typically maximizes a predefined reward (or minimizes a corresponding cost/loss) by iteratively optimizing policies within a simulator. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level with noisy oscillations. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversaria
Related papers
- Overcoming The Sim-to-real Gap: Leveraging Simulation To Learn To Explore For Real-world RL (2024)
- How To Pick The Domain Randomization Parameters For Sim-to-real Transfer Of Reinforcement Learning Policies? (2019)
- An Advantage Based Policy Transfer Algorithm For Reinforcement Learning With Measures Of Transferability (2023)
- Reward-conditioned Policies (2019)
- Transfer Learning Across Simulated Robots With Different Sensors (2019)
- Learning Self-imitating Diverse Policies (2018)
- Understanding Domain Randomization For Sim-to-real Transfer (2021)
- Diversity For Contingency: Learning Diverse Behaviors For Efficient Adaptation And Transfer (2023)