Taming Equilibrium Bias In Risk-sensitive Multi-agent Reinforcement Learning
2024 · Yingjie Fei, Ruitu Xu
Abstract
We study risk-sensitive multi-agent reinforcement learning under general-sum Markov games, where agents optimize the entropic risk measure of rewards with possibly diverse risk preferences. We show that using the regret naively adapted from existing literature as a performance metric could induce policies with equilibrium bias that favor the most risk-sensitive agents and overlook the other agents. To address such deficiency of the naive regret, we propose a novel notion of regret, which we call risk-balanced regret, and show through a lower bound that it overcomes the issue of equilibrium bias. Furthermore, we develop a self-play algorithm for learning Nash, correlated, and coarse correlated equilibria in risk-sensitive Markov games. We prove that the proposed algorithm attains near-optimal regret guarantees with respect to the risk-balanced regret.
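For readers unfamiliar with the entropic risk measure named in the abstract, the following is a minimal sketch of its standard textbook form, (1/β) · log E[exp(β · R)], for a discrete reward distribution. The symbol β for an agent's risk parameter and the function name `entropic_risk` are assumptions made for illustration; the abstract itself does not fix any notation, and this is not the paper's algorithm.

```python
import math


def entropic_risk(rewards, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * R)] of a discrete reward R.

    `beta` is an assumed name for the risk parameter: beta > 0 is
    risk-seeking, beta < 0 is risk-averse, and beta == 0 is treated as the
    risk-neutral limit (the plain expectation).
    """
    if beta == 0:
        return sum(p * r for p, r in zip(probs, rewards))
    # Shift by the largest exponent (log-sum-exp) for numerical stability.
    m = max(beta * r for r in rewards)
    s = sum(p * math.exp(beta * r - m) for p, r in zip(probs, rewards))
    return (m + math.log(s)) / beta


# Two hypothetical agents evaluating the same fair lottery over rewards 0 and 1.
lottery_rewards = [0.0, 1.0]
lottery_probs = [0.5, 0.5]
risk_averse_value = entropic_risk(lottery_rewards, lottery_probs, beta=-2.0)
risk_seeking_value = entropic_risk(lottery_rewards, lottery_probs, beta=2.0)
```

The two agents assign different values to the same lottery: the risk-averse value falls below the mean of 0.5 and the risk-seeking value rises above it. This illustrates why agents with diverse risk preferences are not directly comparable on a single unnormalized scale.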
Related papers
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026)
- Risk-sensitive Multi-agent Reinforcement Learning In Network Aggregative Markov Games (2024)
- A Black-box Approach For Non-stationary Multi-agent Reinforcement Learning (2023)
- Risk-sensitive Bayesian Games For Multi-agent Reinforcement Learning Under Policy Uncertainty (2022)
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)
- Optimism As Risk-seeking In Multi-agent Reinforcement Learning (2025)
- Exponential Bellman Equation And Improved Regret Bounds For Risk-sensitive Reinforcement Learning (2021)