Learning Gaussian Policies From Corrective Human Feedback
2019 · Daan Wout, Jan Scholten, Carlos Celemin, et al.
Abstract
Learning from human feedback is a viable alternative to control design that requires neither modelling nor control expertise. In particular, learning from corrective advice has advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state of the art in this field, COACH, has proven effective for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature-space engineering for higher-order systems. We introduce Gaussian Process Coach (GPC), which avoids feature-space engineering by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) query feedback on the samples of maximal utility and 2) adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state of the art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control…
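The abstract describes GPC only at a high level. The sketch below illustrates the general mechanism under our own assumptions: a zero-mean Gaussian-process regression policy over a scalar state with a squared-exponential kernel, COACH-style binary corrections h in {-1, +1} turned into a target action a + e·h, a correction magnitude e scaled by the policy's predictive standard deviation (standing in for the uncertainty-dependent learning rate), and a simple variance threshold for deciding when to ask the teacher for feedback. The class `GPPolicy`, the kernel hyperparameters, `base_error`, the scaling rule and the `wants_feedback` threshold are hypothetical choices for illustration; they are not the authors' GPC update equations.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between row-stacked states A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


class GPPolicy:
    """Hypothetical GP policy: scalar state -> scalar action, zero prior mean."""

    def __init__(self, noise_var=1e-2, length_scale=0.5, signal_var=1.0):
        self.noise_var = noise_var
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.X = np.empty((0, 1))  # states at which corrections were received
        self.y = np.empty(0)       # corrected target actions

    def predict(self, state):
        """Return (posterior mean action, posterior std) at a single state."""
        x = np.atleast_2d(state)
        if len(self.y) == 0:
            return 0.0, float(np.sqrt(self.signal_var))
        K = rbf_kernel(self.X, self.X, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(self.y))
        k_star = rbf_kernel(self.X, x, self.length_scale, self.signal_var)[:, 0]
        L = np.linalg.cholesky(K)  # K = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, self.y))
        v = np.linalg.solve(L, k_star)
        mean = float(k_star @ alpha)
        var = max(self.signal_var - float(v @ v), 0.0)
        return mean, float(np.sqrt(var))

    def wants_feedback(self, state, threshold=0.5):
        """Hypothetical query rule: ask the teacher only where the policy is uncertain."""
        return self.predict(state)[1] > threshold

    def correct(self, state, h, base_error=1.0):
        """Apply corrective advice h in {-1, +1} at `state` and return the new target.

        The correction magnitude is scaled by the normalized predictive std, so
        advice in well-explored states changes the action less (assumed rule).
        """
        mean, std = self.predict(state)
        step = base_error * std / np.sqrt(self.signal_var)
        target = mean + step * h
        self.X = np.vstack([self.X, np.atleast_2d(state)])
        self.y = np.append(self.y, target)
        return target


if __name__ == "__main__":
    policy = GPPolicy()
    print(policy.correct(0.0, +1))     # first correction: full step from the prior
    print(policy.correct(0.0, +1))     # repeated advice: much smaller step
    print(policy.wants_feedback(5.0))  # unvisited state is still uncertain
```

Because the correction step shrinks as the predictive standard deviation at a state drops, repeated advice at a well-visited state moves the action less than the first correction did, while unvisited states keep a large standard deviation and are the ones flagged for querying. A practical version would also need a sparse GP or pruning of stored corrections, since exact GP inference with a Cholesky factorization scales cubically with the number of stored samples.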
Related papers
- Autoregressive Policies For Continuous Control Deep Reinforcement Learning (2019)
- Convergence Of A Human-in-the-loop Policy-gradient Algorithm With Eligibility Trace Under Reward, Policy, And Advantage Feedback (2021)
- Deep Reinforcement Learning With Feedback-based Exploration (2019)
- Proximal Policy Optimization With Continuous Bounded Action Space Via The Beta Distribution (2021)
- Pref-guide: Continual Policy Learning From Real-time Human Feedback Via Preference-based Learning (2025)
- Revisiting Gaussian Mixture Critics In Off-policy Reinforcement Learning: A Sample-based Approach (2022)
- Robust And Adaptive Temporal-difference Learning Using An Ensemble Of Gaussian Processes (2021)
- Smoothing Policies And Safe Policy Gradients (2019)