Strategic Decision-making In The Presence Of Information Asymmetry: Provably Efficient RL With Algorithmic Instruments
2022 · Mengxin Yu, Zhuoran Yang, Jianqing Fan
Abstract
We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's policy in the context of general function approximation. Our algorithm is based on the critical observa
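The abstract pairs instrumental variable (IV) regression with the pessimism principle. The toy script that follows is only a generic illustration of those two ideas. It is not the PLAN algorithm and not a strategic MDP. It assumes a hypothetical linear model in which an unobserved private type `u` affects both an observation `x` and a reward `y`, and a scalar instrument `z` shifts `x` while being independent of `u`. All function names (`simulate`, `iv_slope_and_se`, `pessimistic_slope`) and the confidence width `c = 2.0` are made up for this sketch.

```python
import math
import random
import statistics


def simulate(n, true_beta, seed=0):
    """Hypothetical confounded dataset: a hidden 'private type' u drives both
    the observation x and the reward y, while the instrument z shifts x but
    is drawn independently of u."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved private type (confounder)
        z = rng.gauss(0.0, 1.0)  # instrument, independent of u
        x = z + u + rng.gauss(0.0, 0.1)  # observation depends on z and u
        y = true_beta * x + 2.0 * u + rng.gauss(0.0, 0.1)  # reward also depends on u
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(xs, ys):
    """Naive regression of y on x; biased here because u is omitted."""
    return cov(xs, ys) / cov(xs, xs)


def iv_slope_and_se(zs, xs, ys):
    """Scalar two-stage least squares: beta = cov(z, y) / cov(z, x),
    returned together with a homoskedastic standard error."""
    czx = cov(zs, xs)
    beta = cov(zs, ys) / czx
    alpha = statistics.fmean(ys) - beta * statistics.fmean(xs)
    resid = [y - alpha - beta * x for x, y in zip(xs, ys)]
    n = len(xs)
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se = math.sqrt(sigma2 * cov(zs, zs) / ((n - 1) * czx ** 2))
    return beta, se


def pessimistic_slope(zs, xs, ys, c=2.0):
    """Lower-confidence estimate: IV point estimate minus c standard errors
    (c is an arbitrary, hypothetical width)."""
    beta, se = iv_slope_and_se(zs, xs, ys)
    return beta - c * se


if __name__ == "__main__":
    zs, xs, ys = simulate(20_000, true_beta=1.5)
    print("OLS slope:        ", round(ols_slope(xs, ys), 3))
    print("IV slope, se:     ", [round(v, 3) for v in iv_slope_and_se(zs, xs, ys)])
    print("pessimistic slope:", round(pessimistic_slope(zs, xs, ys), 3))
```

Running it should show the naive slope drifting well above the true value 1.5, because `u` raises both `x` and `y`, while the IV estimate stays close to 1.5. The pessimistic value simply sits a couple of standard errors below the IV point estimate, which is the flavor of lower-confidence-bound reasoning that the pessimism principle relies on.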
Related papers
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022)
- Is Pessimism Provably Efficient For Offline RL? (2020)
- Offline Reinforcement Learning With Instrumental Variables In Confounded Markov Decision Processes (2022)
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- Optimistic Policy Learning Under Pessimistic Adversaries With Regret And Violation Guarantees (2026)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning (2026)
- Reinforcement Learning With Human Feedback: Learning Dynamic Choices Via Pessimism (2023)