Actor-critic Policy Optimization In Partially Observable Multiagent Environments
2018 · Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, et al.
Abstract
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates simi
Related papers
- Multi-agent Off-policy Actor-critic Reinforcement Learning For Partially Observable Environments (2024)
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021)
- Natural Policy Gradient And Actor Critic Methods For Constrained Multi-task Reinforcement Learning (2024)
- Actor-critic Algorithms For Constrained Multi-agent Reinforcement Learning (2019)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017)
- Policy Optimization Over General State And Action Spaces (2022)
- Robust And Diverse Multi-agent Learning Via Rational Policy Gradient (2025)