Q-learning In Continuous Time
2022 · Yanwei Jia, Xun Yu Zhou
Abstract
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms inter
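The Gibbs measure generated from a q-function, as mentioned in the abstract, can be illustrated with a small sketch. It assumes a finite grid of actions (a simplification made here for illustration only, since the paper works with continuous action spaces), a hypothetical quadratic q-function, and a hypothetical `temperature` parameter standing in for the entropy-regularization weight. None of these names come from the paper.

```python
import math


def gibbs_policy(q_values, temperature):
    """Return probabilities proportional to exp(q / temperature) on a finite action grid."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setup: 21 actions in [-1, 1] and a q-function peaked at a = 0.5.
actions = [i / 10 for i in range(-10, 11)]
q_values = [-(a - 0.5) ** 2 for a in actions]
probs = gibbs_policy(q_values, temperature=0.1)
```

In this sketch the normalizing constant is a finite sum, so it is always computable. On a continuous action space the corresponding integral may have no closed form, which is the case distinction the abstract draws between its algorithms.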
Related papers
- Continuous-time Q-learning For Mean-field Control Problems (2023)
- Continuous-time Risk-sensitive Reinforcement Learning Via Quadratic Variation Penalty (2024)
- Reward-directed Score-based Diffusion Models Via Q-learning (2024)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Continuous-time Reinforcement Learning: Ellipticity Enables Model-free Value Function Approximation (2026)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)
- How To Discretize Continuous State-action Spaces In Q-learning: A Symbolic Control Approach (2024)
- Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning (2022)