GEC: A Unified Framework For Interactive Decision Making In MDP, POMDP, And Beyond
2022 · Han Zhong, Wei Xiong, Sirui Zheng, et al.
Abstract
We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample-efficient learning, we propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tr
Related papers
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022) 5.24
- Optimal Decision-making In Mixed-agent Partially Observable Stochastic Environments Via Reinforcement Learning (2019) 0.00
- Embed To Control Partially Observed Systems: Representation Learning With Provable Sample Efficiency (2022) 0.00
- Posterior Sampling For Competitive RL: Function Approximation And Partial Observation (2023) 0.00
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019) 0.00
- Double Reinforcement Learning For Efficient Off-policy Evaluation In Markov Decision Processes (2019) 0.00
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations (2023) 0.00
- On The Complexity Of Multi-agent Decision Making: From Learning In Games To Partial Monitoring (2023) 0.00