Minimax Weight And Q-function Learning For Off-policy Evaluation
2019 · Masatoshi Uehara, Jiawei Huang, Nan Jiang
Abstract
We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that directly estimates importance ratios over the state-action distributions, removing the reliance on knowledge of the behavior policy that prior work (Liu et al., 2018) requires. (2) Another new estimator, MQL, obtained by swapping the roles of importance weights and value functions in MWL. MQL has an intuitive interpretation of minimizing average Bellman errors and can be combined with MWL in a doubly robust manner. (3) Several additional results that offer further insights into these methods, including the sample complexity analyses of MWL and MQL, their asymptotic optimality in the tabular setting, how the learned importance weights depend on the choice of the discriminator class, and how our methods provide a unified view of some old and new algorithms in RL.
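To make the idea of estimating state-action importance ratios concrete, the following is a minimal sketch of a population-level, tabular version of the weight-learning problem. It is not the authors' implementation. It assumes a small random MDP with known dynamics, a fully tabular discriminator class, and the normalized discounted return as the target; all names (`mwl_tabular`, `P`, `R`, `pi`, `mu`, `d0`) are hypothetical. Under these assumptions the minimax problem reduces to a linear system whose solution is the ratio between the target policy's discounted occupancy and the data distribution, and the reward reweighted by that ratio matches the value computed from the Q-function.

```python
import numpy as np

def mwl_tabular(P, R, pi, mu, d0, gamma):
    """Population-level tabular sketch (assumed setup, not the paper's code).

    P:  (S, A, S) transition probabilities
    R:  (S, A)    expected rewards
    pi: (S, A)    target policy
    mu: (S, A)    data distribution over state-action pairs (full support)
    d0: (S,)      initial state distribution
    """
    S, A = R.shape
    n = S * A
    # State-action transition matrix under pi:
    # P_pi[(s,a),(s',a')] = P[s,a,s'] * pi[s',a']
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(n, n)
    nu0 = (d0[:, None] * pi).reshape(n)   # initial state-action distribution
    mu_flat = mu.reshape(n)
    r_flat = R.reshape(n)

    # With a tabular discriminator, the loss vanishes for every f iff
    # (I - gamma * P_pi^T) diag(mu) w = (1 - gamma) * nu0.
    lhs = (np.eye(n) - gamma * P_pi.T) * mu_flat[None, :]
    w = np.linalg.solve(lhs, (1.0 - gamma) * nu0)

    # Value estimate from the weights: E_mu[w(s,a) * r(s,a)].
    value_from_w = np.sum(mu_flat * w * r_flat)

    # Reference value from the Q-function: (1 - gamma) * E_{d0, pi}[Q(s,a)].
    q = np.linalg.solve(np.eye(n) - gamma * P_pi, r_flat)
    value_from_q = (1.0 - gamma) * nu0 @ q
    return w.reshape(S, A), value_from_w, value_from_q

# Hypothetical random MDP for illustration only.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
pi = rng.random((S, A)); pi /= pi.sum(axis=1, keepdims=True)
mu = rng.random((S, A)); mu /= mu.sum()
d0 = rng.random(S); d0 /= d0.sum()

w, v_w, v_q = mwl_tabular(P, R, pi, mu, d0, gamma)
print(v_w, v_q)
```

Note that the behavior policy never appears in the sketch: only the data distribution `mu` over state-action pairs is used, which is the point made in contribution (1). With sampled data and restricted function classes, the linear solve would be replaced by the minimax optimization the abstract describes.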
Related papers
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- Policy Optimization Via Adv2: Adversarial Learning On Advantage Functions (2023)
- Fitted Q Evaluation Without Bellman Completeness Via Stationary Weighting (2025)
- Off-policy Fitted Q-evaluation With Differentiable Function Approximators: Z-estimation And Inference Theory (2022)
- The Optimal Approximation Factors In Misspecified Off-policy Value Function Estimation (2023)
- A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation (2023)
- Utilizing Maximum Mean Discrepancy Barycenter For Propagating The Uncertainty Of Value Functions In Reinforcement Learning (2024)
- Weighted QMIX: Expanding Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)