Policy Learning Using Weak Supervision
2020 · Jingkang Wang, Hongyi Guo, Zhaowei Zhu, et al.
Abstract
Most existing policy learning solutions require the learning agents to receive high-quality supervision signals such as well-designed rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). These quality supervisions are usually infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervisions to perform policy learning efficiently. To handle this problem, we treat the "weak supervision" as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a "correlated agreement" with the peer agent's policy (instead of simple agreements). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy rewards, BC with weak demonstrations, and standard policy co-training show that our method leads to substantial perfo
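One way to read the "correlated agreement" idea in the behavioral cloning setting is sketched below. This is an illustration based only on the abstract's description, not the authors' implementation: the function names, the weight `xi`, the use of cross-entropy, and the resampling scheme are all assumptions. The loss rewards agreement with the weak label on the same sample and subtracts a term computed on independently drawn state and action pairs, so a policy gains nothing from matching the weak supervisor's overall label frequencies regardless of the state.

```python
import math
import random


def cross_entropy(probs, action):
    """Negative log-probability that the policy assigns to `action`."""
    return -math.log(max(probs[action], 1e-12))


def correlated_agreement_loss(policy, states, weak_actions, xi=0.5, rng=None):
    """Hypothetical correlated-agreement loss for weak demonstrations.

    policy       -- callable mapping a state to a list of action probabilities
    states       -- list of states
    weak_actions -- list of (possibly wrong) action labels, one per state
    xi           -- assumed weight on the mismatched-pair penalty
    """
    rng = rng or random.Random(0)
    n = len(states)

    # Simple agreement: policy output versus the weak label on the same sample.
    agree = sum(cross_entropy(policy(s), a)
                for s, a in zip(states, weak_actions)) / n

    # Penalty: state and action are drawn independently of each other,
    # so this term measures agreement that does not depend on the state.
    mismatch = 0.0
    for _ in range(n):
        s = states[rng.randrange(n)]
        a = weak_actions[rng.randrange(n)]
        mismatch += cross_entropy(policy(s), a)
    mismatch /= n

    return agree - xi * mismatch


# Usage: a state-dependent policy scored against three weak labels.
policy = lambda s: [0.9, 0.1] if s == 0 else [0.2, 0.8]
loss = correlated_agreement_loss(policy, [0, 1, 1], [0, 1, 0], xi=0.5)
```

With `xi=0` the sketch reduces to ordinary cross-entropy behavioral cloning; larger values penalize agreement that a state-blind policy could also achieve.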
Related papers
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Reward-conditioned Policies (2019)
- On-policy Robot Imitation Learning From A Converging Supervisor (2019)
- Policy Agnostic RL: Offline RL And Online RL Fine-tuning Of Any Class And Backbone (2024)
- Policy Learning For Off-dynamics RL With Deficient Support (2024)
- Online Learning Of Deceptive Policies Under Intermittent Observation (2025)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- Preventing Imitation Learning With Adversarial Policy Ensembles (2020)