Exploiting Action Impact Regularity And Exogenous State Variables For Offline Reinforcement Learning
2021 · Vincent Liu, James R. Wright, Martha White
Abstract
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real world
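The AIR property described in the abstract can be illustrated with a toy trading example. This is a minimal sketch under stated assumptions, not the authors' algorithm: the names `State`, `endogenous_step`, and `replay` are hypothetical, the price is assumed to be purely exogenous (unaffected by the agent's trades), and the endogenous dynamics (shares and cash) are assumed to be known exactly. Under those assumptions, a logged price sequence can be replayed to evaluate any policy, whichever policy collected the data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class State:
    price: float  # exogenous component: assumed unaffected by the agent's actions
    shares: int   # endogenous component: determined by the agent's actions
    cash: float   # endogenous component: determined by the agent's actions


def endogenous_step(state: State, action: int, next_price: float) -> Tuple[State, float]:
    """Apply the (assumed known) endogenous dynamics.

    action is -1 (sell one share), 0 (hold), or +1 (buy one share),
    executed at the current price. The next price comes from logged data.
    """
    shares = state.shares + action
    cash = state.cash - action * state.price
    next_state = State(price=next_price, shares=shares, cash=cash)
    # Reward: change in portfolio value between consecutive steps.
    old_value = state.cash + state.shares * state.price
    new_value = next_state.cash + next_state.shares * next_price
    return next_state, new_value - old_value


def replay(prices: Sequence[float], policy: Callable[[State], int],
           cash: float = 100.0) -> Tuple[State, float]:
    """Roll a policy forward along a logged exogenous price sequence."""
    state = State(price=prices[0], shares=0, cash=cash)
    total_reward = 0.0
    for next_price in prices[1:]:
        action = policy(state)
        state, reward = endogenous_step(state, action, next_price)
        total_reward += reward
    return state, total_reward


if __name__ == "__main__":
    logged_prices = [10.0, 12.0, 11.0]  # made-up illustrative data
    final_state, total = replay(logged_prices, lambda s: 1 if s.shares == 0 else 0)
    print(final_state, total)
```

The point of the sketch is that the logged prices never depend on the chosen actions, so the same batch supports counterfactual rollouts for different policies. In a domain where actions do move the exogenous part (for example, very large trades), this replay would be biased.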
Related papers
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)
- Mutual Information Regularized Offline Reinforcement Learning (2022)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)
- Adaptive Advantage-guided Policy Regularization For Offline Reinforcement Learning (2024)
- A2PO: Towards Effective Offline Reinforcement Learning From An Advantage-aware Perspective (2024)
- An Investigation Of Offline Reinforcement Learning In Factorisable Action Spaces (2024)