EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
2016 · Aravind Rajeswaran, Sarvjeet Ghotra, Balaraman Ravindran, et al.
Abstract
Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods, in which the real-world target domain is approximated by a simulated source domain, offer a way to tackle these challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from the target domain and approximate Bayesian methods, to progressively make it a better approximation. Thus, l
Related papers
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022) · 9.23
- Preventing Imitation Learning With Adversarial Policy Ensembles (2020) · 0.00
- SEERL: Sample Efficient Ensemble Reinforcement Learning (2020) · 2.26
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024) · 6.34
- Sample Efficient Reinforcement Learning Via Model-ensemble Exploration And Exploitation (2021) · 0.00
- Adversarial Style Transfer For Robust Policy Optimization In Deep Reinforcement Learning (2023) · 0.00
- Robust Opponent Modeling Via Adversarial Ensemble Reinforcement Learning In Asymmetric Imperfect-information Games (2019) · 0.00
- Policy Optimization With Model-based Explorations (2018) · 5.84