Adaptive Discretization For Episodic Reinforcement Learning In Metric Spaces
2019 · Sean R. Sinclair, Siddhartha Banerjee, Christina Lee Yu
Abstract
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel \(Q\)-learning policy with adaptive data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal \(Q\)-function and the joint space, without sacrificing the worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require an optimal discretization as input, access to a simulation oracle, or both. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance compared to both heuristics and \(Q\)-learning with
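To make the central idea concrete, here is a minimal Python sketch of adaptive partitioning for a one-step problem on the unit square of (state, action) pairs. It is an illustration under stated assumptions, not the authors' algorithm: the names `Cell`, `AdaptivePartition`, and `run_demo`, the toy reward, the splitting threshold `(1 / width) ** 2`, the bonus `1 / sqrt(t)`, and the learning rate `(H + 1) / (H + t)` are all choices made here for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass(eq=False)  # compare cells by identity so list.remove() is unambiguous
class Cell:
    s_lo: float    # lower corner in the state dimension
    a_lo: float    # lower corner in the action dimension
    width: float   # side length of the square cell
    q: float       # optimistic Q estimate for the whole cell
    visits: int = 0

    def contains_state(self, s: float) -> bool:
        return self.s_lo <= s <= self.s_lo + self.width

    def split(self) -> list:
        # Four children of half the width inherit the parent's estimate and count.
        half = self.width / 2
        return [
            Cell(self.s_lo + i * half, self.a_lo + j * half, half, self.q, self.visits)
            for i in (0, 1)
            for j in (0, 1)
        ]


class AdaptivePartition:
    def __init__(self, horizon: float = 1.0):
        self.horizon = horizon
        # Start with one coarse cell covering [0, 1] x [0, 1], initialised optimistically.
        self.cells = [Cell(0.0, 0.0, 1.0, q=horizon)]

    def select(self, s: float):
        # Among cells whose state range contains s, pick the highest estimate.
        relevant = [c for c in self.cells if c.contains_state(s)]
        cell = max(relevant, key=lambda c: c.q)
        return cell, cell.a_lo + cell.width / 2  # play the cell's centre action

    def update(self, cell: Cell, target: float) -> None:
        cell.visits += 1
        t = cell.visits
        lr = (self.horizon + 1) / (self.horizon + t)  # assumed learning-rate schedule
        bonus = 1 / math.sqrt(t)                      # assumed exploration bonus
        cell.q = (1 - lr) * cell.q + lr * (target + bonus)
        # Refine only where data accumulates: split once visits outgrow the cell size.
        if cell.visits >= (1 / cell.width) ** 2:
            self.cells.remove(cell)
            self.cells.extend(cell.split())


def run_demo(episodes: int = 500, seed: int = 0) -> AdaptivePartition:
    rng = random.Random(seed)
    partition = AdaptivePartition(horizon=1.0)
    for _ in range(episodes):
        s = rng.random()
        cell, a = partition.select(s)
        reward = 1 - abs(a - 0.7)  # hypothetical reward, best near a = 0.7
        partition.update(cell, reward)
    return partition


if __name__ == "__main__":
    p = run_demo()
    print(len(p.cells), "cells; smallest width:", min(c.width for c in p.cells))
```

Because a cell is split only after it has been selected often enough, and selection favors cells with high estimates, the partition stays coarse wherever the estimates are low or the data is sparse.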
Related papers
- How To Discretize Continuous State-action Spaces In Q-learning: A Symbolic Control Approach (2024) · 3.58
- Logarithmic Regret For Episodic Continuous-time Linear-quadratic Reinforcement Learning Over A Finite-time Horizon (2020) · 7.81
- Discretizing Continuous Action Space With Unimodal Probability Distributions For On-policy Reinforcement Learning (2024) · 0.00
- Episodic Reinforcement Learning With Expanded State-reward Space (2024) · 0.00
- Continuous Episodic Control (2022) · 2.26
- A Kernel-based Approach To Non-stationary Reinforcement Learning In Metric Spaces (2020) · 0.00
- Provably Efficient And Agile Randomized Q-learning (2025) · 0.00
- Continuous-time Reinforcement Learning: Ellipticity Enables Model-free Value Function Approximation (2026) · 0.00