Learning More Skills Through Optimistic Exploration
2021 · Dj Strouse, Kate Baumli, David Warde-Farley, et al.
Abstract
Unsupervised skill learning objectives (Gregor et al., 2016; Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is that the agent will explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications. This leads to low intrinsic reward for the agent, effectively penalizing the very exploration needed to maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves traini…
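To make the pessimism argument concrete, the sketch below assumes the DIAYN-style skill reward log q(z|s) - log p(z) (Eysenbach et al., 2018) with a uniform prior over a small discrete set of skills. An unsure discriminator (a near-uniform q(z|s) at a novel state) yields a reward of about zero, while a confident one at a familiar state yields a positive reward. Because the abstract is cut off before it describes the estimator, the disagreement bonus shown second is an assumption: it is one standard way to estimate information gain about the discriminator, namely the entropy of an ensemble's mean prediction minus the members' mean entropy. The function names (`skill_reward`, `disagreement_bonus`) are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def skill_reward(q_z_given_s: Sequence[float], z: int, num_skills: int) -> float:
    """DIAYN-style intrinsic reward log q(z|s) - log p(z).

    Assumes a uniform skill prior p(z) = 1 / num_skills and q(z|s) > 0.
    """
    return math.log(q_z_given_s[z]) - math.log(1.0 / num_skills)


def disagreement_bonus(ensemble: List[Sequence[float]]) -> float:
    """Hypothetical exploration bonus: H[mean prediction] - mean H[member prediction].

    This is the mutual information between the skill label and the ensemble
    member index. It is zero when all members agree (even if they all agree on a
    uniform prediction) and positive when they disagree, so it tracks
    epistemic rather than aleatoric uncertainty.
    """
    k = len(ensemble)
    n = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / k for i in range(n)]
    return entropy(mean) - sum(entropy(member) for member in ensemble) / k


if __name__ == "__main__":
    num_skills = 4
    familiar = [0.97, 0.01, 0.01, 0.01]  # trained discriminator, confident in skill 0
    novel = [0.25, 0.25, 0.25, 0.25]     # unseen state, discriminator has no idea

    print("reward at familiar state:", skill_reward(familiar, 0, num_skills))
    print("reward at novel state:   ", skill_reward(novel, 0, num_skills))

    # Ensemble members trained on the same data agree on familiar states...
    print("bonus, members agree:   ", disagreement_bonus([familiar] * 3))
    # ...but extrapolate differently on a novel state.
    disagree = [
        [0.85, 0.05, 0.05, 0.05],
        [0.05, 0.85, 0.05, 0.05],
        [0.05, 0.05, 0.85, 0.05],
        [0.05, 0.05, 0.05, 0.85],
    ]
    print("bonus, members disagree:", disagreement_bonus(disagree))
```

With four skills, the confident discriminator gives a reward of about 1.36 (that is, log 3.88), while the uniform one gives exactly zero. A disagreement-style bonus is one way to add reward back at precisely those novel states.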
Related papers
- Skild: Unsupervised Skill Discovery Guided By Factor Interactions (2024)
- Diversity Is All You Need: Learning Skills Without A Reward Function (2018)
- Unsupervised Learning Of Efficient Exploration: Pre-training Adaptive Policies Via Self-imposed Goals (2026)
- Focused Skill Discovery: Learning To Control Specific State Variables While Minimizing Side Effects (2025)
- Skills: Adaptive Skill Sequencing For Efficient Temporally-extended Exploration (2022)
- Never Give Up: Learning Directed Exploration Strategies (2020)
- Exploitation Is All You Need... For Exploration (2025)
- Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025)