Actor-dual-critic Dynamics For Zero-sum And Identical-interest Stochastic Games
2026 · Ahmed Said Donmez, Yuksel Arslantas, Muhammed O. Sayin
Abstract
We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-type actor-critic architecture, where agents update their strategies (actors) using feedback from two distinct critics: a fast critic that intuitively responds to observed payoffs under limited information, and a slow critic that deliberatively approximates the solution to the underlying dynamic programming problem. Crucially, the learning process relies on non-equilibrium adaptation through smoothed best responses to observed payoffs. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in both settings. Empirical results further validate the robustness and effectiveness of the proposed ap
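To make the fast-critic / slow-critic / smoothed-best-response structure described in the abstract concrete, here is a minimal single-agent sketch. It is not the authors' algorithm: the tabular arrays `q` (fast critic) and `v` (slow critic), the softmax form of the smoothed best response, the update targets, and the parameter names `alpha`, `beta`, `gamma`, and `tau` are all assumptions chosen for illustration.

```python
import numpy as np


def smoothed_best_response(q_row, tau):
    """Softmax (logit) response to local payoff estimates.

    Assumption: the smoothing is a softmax with temperature tau > 0.
    """
    z = (q_row - np.max(q_row)) / tau  # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def dual_critic_step(q, v, s, a, r, s_next, alpha, beta, gamma, tau):
    """One hypothetical update for a single agent, using only its own payoff.

    q: (n_states, n_actions) fast critic, updated with the larger step alpha.
    v: (n_states,) slow critic, updated with the smaller step beta.
    """
    # Fast critic: track the observed payoff plus the discounted slow estimate.
    q[s, a] += alpha * (r + gamma * v[s_next] - q[s, a])
    # Actor: smoothed best response to the fast critic at the current state.
    pi = smoothed_best_response(q[s], tau)
    # Slow critic: move toward the value of the smoothed response.
    v[s] += beta * (pi @ q[s] - v[s])
    return pi
```

In this sketch each agent would call `dual_critic_step` on its own arrays with its own observed payoff `r`, without seeing the other agents' actions or any model of the game; choosing `beta` much smaller than `alpha` is what separates the slow critic from the fast one.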
Related papers
- Stackelberg Actor-critic: Game-theoretic Reinforcement Learning Algorithms (2021)
- A Finite-sample Analysis Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2023)
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024)
- Convergence Of Decentralized Actor-critic Algorithm In General-sum Markov Games (2024)
- Convergence Of Heterogeneous Learning Dynamics In Zero-sum Stochastic Games (2023)
- Independent And Decentralized Learning In Markov Potential Games (2022)
- Best-response Dynamics And Fictitious Play In Identical-interest And Zero-sum Stochastic Games (2021)
- Actor-critic Algorithms For Constrained Multi-agent Reinforcement Learning (2019)