Optimistic Active Exploration Of Dynamical Systems
2023 · Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, et al.
Abstract
Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge by developing OPAX, an algorithm for active exploration. OPAX uses well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dynamics. It maximizes the information gain between the unknown dynamics and state observations, and does so optimistically with respect to the set of plausible dynamics. We show how the resulting optimization problem can be reduced to an optimal control problem that can be solved at each episode using standard approaches. We analyze our algorithm for general models, and, in the case of Gaussian process dynamics, we give a first-of-its-kind sample complexity bound and show that the epistemic uncertainty converges to zero.
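To make the idea of an uncertainty-driven exploration objective concrete, here is a minimal sketch. It is not the authors' implementation. It assumes Gaussian observation noise and independent state dimensions, in which case the information gained from one observation is 0.5 * sum_j log(1 + sigma_j^2 / noise^2), where sigma_j is the model's epistemic standard deviation. The function names, the array shapes, and the toy numbers are all hypothetical, and the sketch omits the optimistic choice over plausible dynamics that the abstract describes: it only scores candidate trajectories whose epistemic uncertainties are already given.

```python
import numpy as np


def info_gain_reward(epistemic_std, noise_std):
    """Per-step exploration reward under a Gaussian noise assumption.

    epistemic_std: array of shape (..., state_dim) holding the model's
        epistemic standard deviation at each visited state-action pair
        (hypothetical input; a real model would have to supply it).
    noise_std: scalar standard deviation of the observation noise.
    Returns an array of shape (...) with 0.5 * sum_j log(1 + ratio_j).
    """
    ratio = np.square(epistemic_std) / noise_std**2
    return 0.5 * np.sum(np.log1p(ratio), axis=-1)


def pick_most_informative(candidate_stds, noise_std):
    """Pick the candidate trajectory with the largest summed reward.

    candidate_stds: shape (num_candidates, horizon, state_dim).
    Returns (index of the best candidate, per-candidate returns).
    """
    rewards = info_gain_reward(candidate_stds, noise_std)  # (num_candidates, horizon)
    returns = rewards.sum(axis=1)
    return int(np.argmax(returns)), returns


# Toy data: three candidate trajectories, horizon 2, state dimension 2.
candidate_stds = np.array([
    [[0.1, 0.1], [0.1, 0.1]],  # region the model already knows fairly well
    [[1.0, 0.5], [0.8, 0.2]],  # region with large epistemic uncertainty
    [[0.0, 0.0], [0.0, 0.0]],  # no epistemic uncertainty left
])
best, returns = pick_most_informative(candidate_stds, noise_std=0.1)
print(best, returns)
```

In this toy setting the second candidate is selected because it passes through the most uncertain region, and a trajectory with zero epistemic uncertainty earns zero reward, which matches the intuition that exploration stops paying off once the model is certain.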
Related papers
- Efficient Exploration In Continuous-time Model-based Reinforcement Learning (2023)
- Provably Efficient Exploration In Policy Optimization (2019)
- Optimal Exploration For Model-based RL In Nonlinear Systems (2023)
- Task-optimal Exploration In Linear Dynamical Systems (2021)
- An Optimal Policy For Learning Controllable Dynamics By Exploration (2025)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach (2023)
- Proximal Policy Optimization With Adaptive Exploration (2024)
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)