Bootstrapping Task Spaces For Self-improvement
2025 · Minqi Jiang, Andrei Lupu, Yoram Bachrach
Abstract
Progress in many task domains emerges from repeated revisions to previous solution attempts. Training agents that can reliably self-improve over such sequences at inference-time is a natural target for reinforcement learning (RL), yet the naive approach assumes a fixed maximum iteration depth, which can be both costly and arbitrary. We present Exploratory Iteration (ExIt), a family of autocurriculum RL methods that directly exploits the recurrent structure of self-improvement tasks to train LLMs to perform multi-step self-improvement at inference-time while only training on the most informative single-step iterations. ExIt grows a task space by selectively sampling the most informative intermediate, partial histories encountered during an episode for continued iteration, treating these starting points as new self-iteration task instances to train a self-improvement policy. ExIt can further pair with explicit exploration mechanisms to sustain greater task diversity. Across several domai
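The sampling loop the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `TaskInstance`, `TaskBuffer`, and `exit_style_step` are hypothetical, and "most informative" is approximated here by the variance of returns across a small group of sampled revisions, which is an assumed proxy rather than a definition taken from the text above.

```python
import random
from dataclasses import dataclass, field
from statistics import pvariance


@dataclass
class TaskInstance:
    """A starting point for one self-improvement step (hypothetical structure)."""
    history: list        # prior solution attempts, oldest first
    score: float = 1.0   # assumed informativeness proxy; optimistic initial value


@dataclass
class TaskBuffer:
    """Bounded pool of starting points that grows as episodes are explored."""
    capacity: int = 64
    items: list = field(default_factory=list)

    def add(self, task):
        self.items.append(task)
        # When over capacity, evict the lowest-scoring starting point.
        if len(self.items) > self.capacity:
            self.items.remove(min(self.items, key=lambda t: t.score))

    def sample(self, rng):
        # Sample in proportion to the score; the epsilon keeps weights positive.
        weights = [t.score + 1e-6 for t in self.items]
        return rng.choices(self.items, weights=weights, k=1)[0]


def exit_style_step(buffer, improve, reward, rng, group_size=4):
    """Run one single-step iteration from a sampled partial history.

    `improve(history, rng)` proposes a revised solution and `reward(solution)`
    scores it; both are caller-supplied stand-ins for the policy and the task.
    """
    task = buffer.sample(rng)
    candidates = [improve(task.history, rng) for _ in range(group_size)]
    returns = [reward(c) for c in candidates]
    # Assumption: a starting point is informative if its revisions disagree.
    task.score = pvariance(returns)
    # The extended history becomes a new task instance for later iterations.
    best = candidates[returns.index(max(returns))]
    buffer.add(TaskInstance(history=task.history + [best], score=task.score))
    return task, returns
```

In a real training setup the `returns` would feed a policy-gradient update on the sampled single step. The sketch only shows how the task space grows from partial histories without fixing a maximum iteration depth in advance.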
Related papers
- Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025) 0.00
- Learning Self-imitating Diverse Policies (2018) 0.00
- Skills: Adaptive Skill Sequencing For Efficient Temporally-extended Exploration (2022) 0.00
- Unsupervised Learning Of Efficient Exploration: Pre-training Adaptive Policies Via Self-imposed Goals (2026) 0.00
- Exploration Via Elliptical Episodic Bonuses (2022) 3.58
- Generating Automatic Curricula Via Self-supervised Active Domain Randomization (2020) 0.00
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024) 0.00
- Growing Action Spaces (2019) 0.00