Stackelberg Batch Policy Learning
2023 · Wenzhuo Zhou, Annie Qu
Abstract
Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynam
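The leader-follower update described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's StackelbergLearner: it is deterministic, has no pessimism or value-function class, and the scalar objectives f and g, the step sizes, and the function name `simulate` are all assumptions chosen for illustration. It only shows how a leader that follows the total derivative of its objective (accounting for the follower's response through the implicit function theorem) reaches a different point than a leader that follows its individual gradient.

```python
# Toy two-player game (hypothetical, for illustration only):
#   leader   minimizes f(x, y) = (x - 1)**2 + y**2   over x
#   follower minimizes g(x, y) = 0.5 * (y - x)**2    over y
# The follower's best response is y*(x) = x, so the Stackelberg
# solution minimizes (x - 1)**2 + x**2, which gives x = y = 0.5.

def simulate(use_total_derivative, steps=500, lr_leader=0.05, lr_follower=0.5):
    x, y = 0.0, 0.0
    for _ in range(steps):
        # Partial derivatives of the leader objective f.
        df_dx = 2.0 * (x - 1.0)
        df_dy = 2.0 * y

        # Derivatives of the follower objective g.
        dg_dy = y - x      # follower's individual gradient
        d2g_dyy = 1.0      # second derivative of g in y
        d2g_dyx = -1.0     # mixed derivative of g in (y, x)

        if use_total_derivative:
            # Implicit function theorem: dy*/dx = -(d2g_dyy)^-1 * d2g_dyx.
            dy_dx = -d2g_dyx / d2g_dyy
            leader_grad = df_dx + df_dy * dy_dx
        else:
            # Individual gradient: ignores how the follower reacts to x.
            leader_grad = df_dx

        # Simultaneous gradient steps for both players.
        x, y = x - lr_leader * leader_grad, y - lr_follower * dg_dy
    return x, y


if __name__ == "__main__":
    print("total derivative:   ", simulate(True))
    print("individual gradient:", simulate(False))
```

With these assumed objectives, the total-derivative leader settles near x = y = 0.5, the Stackelberg solution of the toy game, while the individual-gradient leader settles near x = y = 1, where the leader's objective value is higher (1.0 versus 0.5).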
Related papers
- Model-free Reinforcement Learning For Stochastic Stackelberg Security Games (2020)
- Iterative Batch Reinforcement Learning Via Safe Diversified Model-based Policy Search (2024)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Oracles & Followers: Stackelberg Equilibria In Deep Multi-agent Reinforcement Learning (2022)
- Actions Speak What You Want: Provably Sample-efficient Reinforcement Learning Of The Quantal Stackelberg Equilibrium From Strategic Feedbacks (2023)
- Batch Policy Learning In Average Reward Markov Decision Processes (2020)
- Exponential Lower Bounds For Batch Reinforcement Learning: Batch RL Can Be Exponentially Harder Than Online RL (2020)
- Sample-efficient Learning Of Stackelberg Equilibria In General-sum Games (2021)