Offline Reinforcement Learning: Role Of State Aggregation And Trajectory Data
2024 · Zeyu Jia, Alexander Rakhlin, Ayush Sekhari, et al.
Abstract
We revisit the problem of offline reinforcement learning with value function realizability but without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et al. (2022) left open the question of whether a bounded concentrability coefficient, together with trajectory-based offline data, admits a polynomial sample complexity. In this work, we provide a negative answer to this question for the task of offline policy evaluation. Beyond addressing this question, we provide a rather complete picture of offline policy evaluation with only value function realizability. Our primary findings are threefold: 1) the sample complexity of offline policy evaluation is governed by the concentrability coefficient in an aggregated Markov Transition Model jointly determined by the function class and the offline data distribution, rather than by the one in the original MDP, which unifies and generalizes the ideas of Xie and Jiang (2021) and Foster et al. (2022); 2) the concentrability coeffici…
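The abstract does not spell out how the aggregated model is built, so the sketch below is only a rough, hypothetical illustration of finding 1). It assumes the aggregation is a fixed partition of state-action pairs (the paper's construction depends on both the function class and the data distribution; here the partition is simply given) and uses the usual single-policy concentrability coefficient, the largest ratio of the target policy's occupancy to the data distribution. The helper names `concentrability` and `aggregate` and all numbers are made up for illustration. Under these assumptions, merging cells can never raise the coefficient, because a ratio of sums never exceeds the largest individual ratio.

```python
from collections import defaultdict


def concentrability(d_pi, mu):
    """Single-policy concentrability: max over x of d_pi(x) / mu(x).

    Both arguments map a (state, action) key to a probability. Returns
    infinity if d_pi puts mass on a point where mu puts none.
    """
    worst = 0.0
    for x, p in d_pi.items():
        if p <= 0:
            continue  # points the target policy never visits do not matter
        q = mu.get(x, 0.0)
        if q <= 0:
            return float("inf")  # offline data never covers a visited point
        worst = max(worst, p / q)
    return worst


def aggregate(dist, partition):
    """Push a distribution forward through a (hypothetical) partition that
    maps each (state, action) key to the id of its aggregated cell."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[partition[x]] += p
    return dict(out)


# Toy target-policy occupancy and offline data distribution (made-up numbers,
# chosen as powers of two so the floating-point arithmetic is exact).
d_pi = {("s1", "a"): 0.5, ("s2", "a"): 0.125, ("s3", "a"): 0.125, ("s4", "a"): 0.25}
mu = {("s1", "a"): 0.125, ("s2", "a"): 0.5, ("s3", "a"): 0.125, ("s4", "a"): 0.25}

# Hypothetical aggregation: s1 and s2 share one cell, s3 and s4 another.
partition = {("s1", "a"): "cell_A", ("s2", "a"): "cell_A",
             ("s3", "a"): "cell_B", ("s4", "a"): "cell_B"}

original = concentrability(d_pi, mu)
aggregated = concentrability(aggregate(d_pi, partition), aggregate(mu, partition))
print(original, aggregated)  # 4.0 1.0
```

In this toy case the coefficient drops from 4.0 in the original model to 1.0 after aggregation, since the mismatch between `s1` and `s2` cancels inside `cell_A`. This does not reproduce the paper's construction; it only shows why a coefficient measured in an aggregated model can be much smaller than one measured in the original MDP.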
Related papers
- Trajectory Data Suffices For Statistically Efficient Learning In Offline RL With Linear \(q^\pi\)-realizability And Concentrability (2024)
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)
- Offline Reinforcement Learning With Realizability And Single-policy Concentrability (2022)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)
- Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting (2023)
- A Complete Characterization Of Linear Estimators For Offline Policy Evaluation (2022)
- Projected State-action Balancing Weights For Offline Reinforcement Learning (2021)
- Offline Reinforcement Learning Under Value And Density-ratio Realizability: The Power Of Gaps (2022)