Gradient-free Policy Architecture Search And Adaptation
2017 · Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell
Abstract
We develop a method for policy architecture search and adaptation via gradient-free optimization that can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward, we obtain a model that learns with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive the aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, yielding a lower cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment.
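To make the adaptation stage concrete, here is a minimal sketch of gradient-free policy adaptation. The abstract does not name the optimizer, so this uses a simple hill-climbing evolution strategy as a stand-in, not the authors' method. The reward function, the parameter vector, and every name below are hypothetical, and the architecture-search stage is not covered.

```python
import random


def target_reward(params):
    """Toy stand-in for reward in the target environment (hypothetical).

    Returns 0.0 at a hidden optimum and is negative everywhere else.
    """
    optimum = [0.5, -0.3, 0.8]
    return -sum((p - o) ** 2 for p, o in zip(params, optimum))


def adapt_policy(source_params, reward_fn, iterations=200,
                 population=8, sigma=0.1, seed=0):
    """Adapt parameters using reward values only; no gradients are computed."""
    rng = random.Random(seed)
    best = list(source_params)
    best_reward = reward_fn(best)
    for _ in range(iterations):
        for _ in range(population):
            # Perturb the current best parameters with Gaussian noise.
            candidate = [p + rng.gauss(0.0, sigma) for p in best]
            candidate_reward = reward_fn(candidate)
            # Keep a candidate only if it strictly improves the reward.
            if candidate_reward > best_reward:
                best, best_reward = candidate, candidate_reward
    return best, best_reward


# Parameters assumed to come from a policy learned in the source domain.
source_params = [0.0, 0.0, 0.0]
adapted, reward = adapt_policy(source_params, target_reward)
```

Because a candidate is accepted only when it improves the reward, the adapted policy never scores worse than the source policy on this toy objective. A real driving agent would replace `target_reward` with rollouts in the target environment.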
Related papers
- Gradient-aware Model-based Policy Search (2019)
- Policy Gradient From Demonstration And Curiosity (2020)
- Safe Driving Via Expert Guided Policy Optimization (2021)
- Non-parametric Stochastic Policy Gradient With Strategic Retreat For Non-stationary Environment (2022)
- Smoothing Policies And Safe Policy Gradients (2019)
- Directed Policy Gradient For Safe Reinforcement Learning With Human Advice (2018)
- Hierarchical Policy-gradient Reinforcement Learning For Multi-agent Shepherding Control Of Non-cohesive Targets (2025)
- SoftTreeMax: Policy Gradient With Tree Search (2022)