Posterior Sampling For Competitive RL: Function Approximation And Partial Observation
2023 · Shuang Qiu, Ziyu Dai, Han Zhong, et al.
Abstract
This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capturing the exploration-exploitation trade-off in MGs. Based on self-play GEC, we propose a model-based self-play posterior sampling method to control both players to learn Nash equilibrium, which can successfully handle the partial observability of states. Furthermore, we identify a set of partially observable MG models fitting MG learning with the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for learning adversarial MG with potential partial observability. We further provide low regret bounds for proposed algorithms
Related papers
- Prior-dependent Analysis Of Posterior Sampling Reinforcement Learning With Function Approximation (2024)
- Posterior Sampling With Delayed Feedback For Reinforcement Learning With Linear Function Approximation (2023)
- Sample-efficient Reinforcement Learning Of Partially Observable Markov Games (2022)
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)
- Online Sub-sampling For Reinforcement Learning With General Function Approximation (2021)
- Towards General Function Approximation In Zero-sum Markov Games (2021)
- On Reward-free RL With Kernel And Neural Function Approximations: Single-agent MDP And Markov Game (2021)
- GEC: A Unified Framework For Interactive Decision Making In MDP, POMDP, And Beyond (2022)