Optimism As Risk-seeking In Multi-agent Reinforcement Learning
2025 · Runyu Zhang, Na Li, Asuman Ozdaglar, et al.
Abstract
Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL) have largely emphasized the risk-averse setting, prioritizing robustness to uncertainty. In cooperative MARL, however, such conservatism often leads to suboptimal equilibria, and a parallel line of work has shown that optimism can promote cooperation. Existing optimistic methods, though effective in practice, are typically heuristic and lack theoretical grounding. Building on the dual representation for convex risk measures, we propose a principled framework that interprets risk-seeking objectives as optimism. We introduce optimistic value functions, which formalize optimism as divergence-penalized risk-seeking evaluations. Building on this foundation, we derive a policy-gradient theorem for optimistic value functions, including explicit formulas for
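The idea of a divergence-penalized risk-seeking evaluation can be illustrated with a small numerical sketch. It assumes the entropic risk measure with a KL-divergence penalty, a standard textbook instance of the dual representation; the abstract excerpt does not say which divergence the paper uses, so this choice, the function names, and the parameter `beta` are all illustrative assumptions rather than the authors' method. For a discrete return distribution, the penalized supremum over reweighted distributions Q of E_Q[X] - (1/beta) KL(Q || P) has the closed form (1/beta) log E_P[exp(beta X)], attained by the exponentially tilted distribution.

```python
import math

# Assumption: entropic (KL-penalized) risk-seeking evaluation, used here only
# as one well-known example of a divergence-penalized optimistic value.

def optimistic_value(returns, probs, beta):
    """Closed form (1/beta) * log E_P[exp(beta * X)], computed stably."""
    m = max(returns)  # shift by the max to avoid overflow in exp
    s = sum(p * math.exp(beta * (x - m)) for x, p in zip(returns, probs))
    return m + math.log(s) / beta

def tilted_distribution(returns, probs, beta):
    """Maximizer of the penalized objective: Q*(x) proportional to P(x) exp(beta x)."""
    m = max(returns)
    weights = [p * math.exp(beta * (x - m)) for x, p in zip(returns, probs)]
    z = sum(weights)
    return [w / z for w in weights]

def penalized_objective(returns, probs, q, beta):
    """E_Q[X] - (1/beta) * KL(Q || P) for a candidate distribution q."""
    mean_q = sum(qi * x for qi, x in zip(q, returns))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, probs) if qi > 0)
    return mean_q - kl / beta

if __name__ == "__main__":
    returns = [0.0, 1.0, 10.0]   # hypothetical returns
    probs = [0.5, 0.3, 0.2]      # hypothetical nominal probabilities
    beta = 0.5                   # hypothetical optimism level
    q_star = tilted_distribution(returns, probs, beta)
    print("optimistic value:", optimistic_value(returns, probs, beta))
    print("dual objective at Q*:", penalized_objective(returns, probs, q_star, beta))
    print("risk-neutral mean:", penalized_objective(returns, probs, probs, beta))
```

In this sketch the first two printed values coincide, and both exceed the risk-neutral mean: the evaluation shifts weight toward high-return outcomes, while the KL penalty limits how far it can move from the nominal distribution.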
Related papers
- Toward Risk-based Optimistic Exploration For Cooperative Multi-agent Reinforcement Learning (2023)
- Risk-sensitive Multi-agent Reinforcement Learning In Network Aggregative Markov Games (2024)
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)
- Taming Equilibrium Bias In Risk-sensitive Multi-agent Reinforcement Learning (2024)
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- Optimistic Multi-agent Policy Gradient (2023)
- Optimistic Policy Learning Under Pessimistic Adversaries With Regret And Violation Guarantees (2026)
- Epistemic Risk-sensitive Reinforcement Learning (2019)