Robust And Adaptive Temporal-difference Learning Using An Ensemble Of Gaussian Processes
2021 · Qin Lu, Georgios B. Giannakis
Abstract
Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed OS-GPTD, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well as the long-term reward prediction error, relative to their counterparts from a fixed value functi
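The generative view described in the abstract can be illustrated with a small sketch. This is not the authors' OS-GPTD implementation: it assumes an RBF kernel approximated by random Fourier features, a discount factor `gamma`, and Gaussian reward noise, so that the reward is modeled as r_t = (phi(s_t) - gamma * phi(s_{t+1}))^T theta + noise and theta is tracked by recursive Bayesian linear regression. The names `make_random_features` and `OnlineTDRegressor` are hypothetical, and the ensemble of GPs mentioned in the title is not covered.

```python
import math
import random


def make_random_features(state_dim, num_features, length_scale, seed=0):
    """Return phi(s): random Fourier features for an (assumed) RBF kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the spectral density of the RBF kernel.
    freqs = [[rng.gauss(0.0, 1.0 / length_scale) for _ in range(state_dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(state):
        return [scale * math.cos(sum(w * x for w, x in zip(row, state)) + b)
                for row, b in zip(freqs, phases)]

    return phi


class OnlineTDRegressor:
    """Recursive Bayesian regression on r = (phi(s) - gamma*phi(s'))^T theta + noise."""

    def __init__(self, phi, num_features, gamma=0.9, noise_var=0.01, prior_var=1.0):
        self.phi = phi
        self.gamma = gamma          # assumed discount factor
        self.noise_var = noise_var  # assumed Gaussian reward-noise variance
        self.mean = [0.0] * num_features
        self.cov = [[prior_var if i == j else 0.0 for j in range(num_features)]
                    for i in range(num_features)]

    def _td_features(self, state, next_state):
        f, g = self.phi(state), self.phi(next_state)
        return [a - self.gamma * b for a, b in zip(f, g)]

    def predict_reward(self, state, next_state):
        h = self._td_features(state, next_state)
        return sum(a * m for a, m in zip(h, self.mean))

    def value(self, state):
        """Posterior-mean estimate of the value function at `state`."""
        return sum(a * m for a, m in zip(self.phi(state), self.mean))

    def update(self, state, reward, next_state):
        """Rank-one posterior update from one (state, reward, next state) triple."""
        h = self._td_features(state, next_state)
        cov_h = [sum(c * a for c, a in zip(row, h)) for row in self.cov]
        denom = sum(a * c for a, c in zip(h, cov_h)) + self.noise_var
        error = reward - sum(a * m for a, m in zip(h, self.mean))
        gain = [c / denom for c in cov_h]
        self.mean = [m + k * error for m, k in zip(self.mean, gain)]
        for i, k in enumerate(gain):
            row = self.cov[i]
            for j, c in enumerate(cov_h):
                row[j] -= k * c
```

A possible usage: build `phi = make_random_features(1, 20, 1.0)`, create `OnlineTDRegressor(phi, 20)`, call `update(s, r, s_next)` for each observed transition, and query `value(s)` for the current estimate. Each update costs on the order of the squared number of features and does not grow with the number of samples, which is the sense in which a random-feature approach of this kind scales online.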
Related papers
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Approximate Temporal Difference Learning Is A Gradient Descent For Reversible Policies (2018)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Loss Dynamics Of Temporal Difference Reinforcement Learning (2023)
- Regularized Gradient Temporal-difference Learning (2026)
- Preferential Temporal Difference Learning (2021)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)