When Is Realizability Sufficient For Off-policy Reinforcement Learning?
2022 · Andrea Zanette
Abstract
Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to successfully operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman complet
Related papers
- Offline Reinforcement Learning With Realizability And Single-policy Concentrability (2022)
- Offline Reinforcement Learning Under Value And Density-ratio Realizability: The Power Of Gaps (2022)
- What Are The Statistical Limits Of Offline RL With Linear Function Approximation? (2020)
- Minimax Optimal And Computationally Efficient Algorithms For Distributionally Robust Offline Reinforcement Learning (2024)
- The Optimal Approximation Factors In Misspecified Off-policy Value Function Estimation (2023)
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)
- Representations For Stable Off-policy Reinforcement Learning (2020)
- Computationally Efficient RL Under Linear Bellman Completeness For Deterministic Dynamics (2024)