Temporal Difference Learning With Continuous Time And State In The Stochastic Setting
2022 · Ziad Kobeissi, Francis Bach
Abstract
We consider the problem of continuous-time policy evaluation. This consists of learning, from observations, the value function associated with uncontrolled continuous-time stochastic dynamics and a reward function. We propose two original variants of the well-known TD(0) method using vanishing time steps. One is model-free and the other is model-based. For both methods, we prove theoretical convergence rates that we subsequently verify through numerical simulations. Alternatively, these methods can be interpreted as novel reinforcement learning approaches for approximating solutions of linear PDEs (partial differential equations) or linear BSDEs (backward stochastic differential equations).
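The abstract does not spell out the two proposed variants, so the sketch below is not the authors' method. It only illustrates the generic starting point: plain semi-gradient TD(0) applied to a time-discretized diffusion with a small step `h`. Everything in it is an assumption made for illustration: the Ornstein-Uhlenbeck dynamics dX = -kappa X dt + sigma dW, the reward rate r(x) = x^2, the discount rate `rho`, the feature map (1, x^2), the Euler-Maruyama simulator, and all parameter values. This toy problem was chosen because its value function V(x) = a x^2 + c solves the linear PDE rho V = r + L V in closed form, which gives a reference to compare learned coefficients against.

```python
import math
import random


def features(x):
    # Hypothetical feature map: V(x) is approximated by theta[0] + theta[1] * x**2.
    return (1.0, x * x)


def value(theta, x):
    phi = features(x)
    return theta[0] * phi[0] + theta[1] * phi[1]


def td0_step(theta, x, x_next, reward_rate, h, rho, lr):
    """One semi-gradient TD(0) update over a time step of length h."""
    # Discretized TD error: reward accrued over h plus the discounted bootstrap.
    delta = reward_rate * h + math.exp(-rho * h) * value(theta, x_next) - value(theta, x)
    phi = features(x)
    new_theta = tuple(t + lr * delta * p for t, p in zip(theta, phi))
    return new_theta, delta


def closed_form(kappa, sigma, rho):
    """Coefficients (c, a) of V(x) = a*x**2 + c for the assumed toy problem."""
    # Matching powers of x in rho*V = x**2 - kappa*x*V' + (sigma**2 / 2)*V''.
    a = 1.0 / (rho + 2.0 * kappa)
    c = sigma ** 2 * a / rho
    return (c, a)


def run(n_steps=200_000, h=0.01, kappa=1.0, sigma=1.0, rho=1.0, lr=0.05, seed=0):
    """Simulate one trajectory with Euler-Maruyama and apply TD(0) along it."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    x = 0.0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x_next = x - kappa * x * h + sigma * math.sqrt(h) * noise
        theta, _ = td0_step(theta, x, x_next, x * x, h, rho, lr)
        x = x_next
    return theta


if __name__ == "__main__":
    print("learned (c, a):", run())
    print("closed form (c, a):", closed_form(1.0, 1.0, 1.0))
```

With a fixed step `h` and a constant learning rate, the learned coefficients should only be expected to land near the closed-form ones, up to discretization bias and stochastic noise. How to let the time step vanish while keeping the updates well scaled is exactly the question the paper addresses, and this sketch makes no attempt to reproduce its answer.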
Related papers
- Policy Evaluation And Temporal-difference Learning In Continuous Time And Space: A Martingale Approach (2021)
- Preferential Temporal Difference Learning (2021)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Prediction And Control In Continual Reinforcement Learning (2023)
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Approximate Temporal Difference Learning Is A Gradient Descent For Reversible Policies (2018)
- Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning (2020)