
Learning to Contest: Decentralized Robust Fairness in Cooperative MARL via Cross-Attention

Abstract

Fair cooperative multi-agent reinforcement learning (MARL) teams that maximize an egalitarian welfare are exploitable: a single self-interested agent free-rides on the surplus that fair agents forgo to raise the worst-off, and the known remedy is a centralized need-based allocator. We show that a decentralized defense becomes possible once contention is graded: when a contested resource still delivers a fraction $1-c$, a worst-off cooperator that contests a free-rider strictly improves on yielding, so leverage exists for every $c < 1$. We introduce CAN, a permutation-equivariant cross-attention policy over agents' observed behaviour that infers how many free-riders are present and responds proportionally -- turn-taking when none, contesting just enough when some. Trained against an adversarial league, CAN keeps best-response exploitability near the centralized oracle ($\rho \approx 1.2\text{--}1.5$ vs. $\rho = N$ unprotected) at essentially no efficiency cost, whereas the fair-MARL learners (GGF, FEN, SOTO) each collapse to an exploitable or wasteful extreme. Giving those objectives CAN's identical adversarial training does not rescue them, so the objective -- not adversarial training alone -- is what makes hardening possible. Against a committed (non-adaptive) defector, every learned defense including ours provides deterrence rather than immunity, weakening as the leverage $(1-c)/2$ vanishes. Across further environments and team sizes the same principle sets the scope: robustness holds exactly as far as the game's contest leverage reaches, and we map that boundary rather than claim to remove it.
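
One reading that is consistent with the leverage term $(1-c)/2$ quoted in the abstract can be written out as a short worked inequality. This is a sketch under assumptions that the abstract does not state: a unit-value resource, exactly two contestants who split the surviving fraction $1-c$ equally, and a payoff of zero from that resource for an agent that yields. The symbols $u_{\text{contest}}$ and $u_{\text{yield}}$ are introduced here for illustration only.

```latex
% Assumed payoffs (illustrative, not taken from the paper):
%   contested resource delivers 1-c in total, split equally between two agents;
%   an agent that yields receives nothing from the resource.
\[
u_{\text{contest}} = \frac{1-c}{2}, \qquad
u_{\text{yield}} = 0, \qquad
u_{\text{contest}} - u_{\text{yield}} = \frac{1-c}{2} > 0 \iff c < 1 .
\]
% Worked values: c = 0.5 gives a gain of 0.25; c = 0.9 gives 0.05.
```

Under these assumptions the gain from contesting is positive for every $c < 1$ and shrinks to zero as $c \to 1$, which matches the abstract's statement that deterrence weakens as the leverage vanishes.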
