An Optimal Policy For Learning Controllable Dynamics By Exploration
2025 · Peter N. Loxley
Abstract
Controllable Markov chains describe the dynamics of sequential decision-making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the general form of an optimal policy for learning controllable dynamics in an unknown environment by exploring over a limited time horizon. This policy is simple to implement and efficient to compute, and allows an agent to "learn by exploring" as it maximizes its information gain in a greedy fashion by selecting controls from a constraint set that changes over time during exploration. We give a simple parameterization for the set of controls, and present an algorithm for finding an optimal policy. The policy takes this form because certain types of states restrict control of the dynamics, such as transient states, absorbing states, and non-backtracking states. We show why the occurrence of these states makes a non-stationary policy essential for achieving optimal exploration. Si
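The greedy step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or parameterization. It assumes each row of the unknown transition model has a Dirichlet belief with integer parameters, scores a control by the expected information gain from one observed transition, and restricts the choice to a time-dependent constraint set. The names `expected_info_gain`, `greedy_control`, `alpha`, and `allowed` are hypothetical.

```python
import math


def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_info_gain(alpha):
    """Expected information gain (in nats) about one transition row from
    observing a single next state.

    Assumption: the belief over the row is Dirichlet with positive integer
    parameters `alpha`. The gain is the mutual information between the next
    state and the row: predictive entropy minus expected entropy. For integer
    parameters the expected entropy reduces to harmonic numbers.
    """
    total = sum(alpha)
    predictive_entropy = -sum((a / total) * math.log(a / total) for a in alpha)
    expected_entropy = harmonic(total) - sum(
        (a / total) * harmonic(a) for a in alpha
    )
    return predictive_entropy - expected_entropy


def greedy_control(state, t, alpha, allowed):
    """Pick the control with the largest expected gain among the controls
    permitted in `state` at time `t`.

    `alpha[state][u]` holds the Dirichlet parameters for control u, and
    `allowed(state, t)` is a hypothetical stand-in for a constraint set
    that changes over time.
    """
    candidates = allowed(state, t)
    return max(candidates, key=lambda u: expected_info_gain(alpha[state][u]))


# Illustrative beliefs: control 0 is unexplored, control 1 has five
# observations of the first next state.
alpha = {0: {0: [1, 1], 1: [5, 1]}}


def allowed(state, t):
    # Hypothetical schedule: both controls early on, only control 1 later.
    return [0, 1] if t < 3 else [1]


early_choice = greedy_control(0, 0, alpha, allowed)
late_choice = greedy_control(0, 5, alpha, allowed)
```

Because `allowed` depends on `t`, the same state and the same beliefs can give different controls at different times, which is the sense in which a policy of this kind is non-stationary.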
Related papers
- Learning Controllable Dynamics Through Informative Exploration (2025)
- Optimal Exploration For Model-based RL In Nonlinear Systems (2023)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach (2023)
- Task-optimal Exploration In Linear Dynamical Systems (2021)
- Conservative Exploration In Reinforcement Learning (2020)
- Active Exploration Via Experiment Design In Markov Chains (2022)
- Optimistic Active Exploration Of Dynamical Systems (2023)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)