Sample-efficient Unsupervised Policy Cloning From Ensemble Self-supervised Labeled Videos
2024 · Xin Liu, Yaran Chen, Haoran Li
Abstract
Current advanced policy learning methods can develop expert-level strategies when provided with enough information. However, their requirements, including task-specific rewards, action-labeled expert trajectories, and large numbers of environment interactions, can be expensive or even unavailable in many scenarios. In contrast, humans can efficiently acquire skills within a few rounds of trial and error by imitating easily accessible internet videos, without any other supervision. In this paper, we try to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos (UPESV), a novel framework that efficiently learns policies from action-free videos without rewards or any other expert supervision. UPESV trains a video labeling model to infer the expert actions in expert videos through several organically combined self-supervised tasks. Each task performs its duties, and they together enabl
Related papers
- Preventing Imitation Learning With Adversarial Policy Ensembles (2020)
- Learning To Act Without Actions (2023)
- Policy Learning Using Weak Supervision (2020)
- Unsupervised Learning Of Efficient Exploration: Pre-training Adaptive Policies Via Self-imposed Goals (2026)
- Self-supervised Adversarial Imitation Learning (2023)
- Learning Self-imitating Diverse Policies (2018)
- Efficient Reinforcement Learning From Demonstration Using Local Ensemble And Reparameterization With Split And Merge Of Expert Policies (2022)
- Good Better Best: Self-motivated Imitation Learning For Noisy Demonstrations (2023)