Learning to Contest: Decentralized Robust Fairness in Cooperative MARL via Cross-Attention

Abstract

Fair cooperative multi-agent reinforcement learning (MARL) teams that maximize an egalitarian welfare are exploitable: a single self-interested agent free-rides on the surplus that fair agents forgo to raise the worst-off, and the known remedy is a centralized need-based allocator. We show that a decentralized defense becomes possible once contention is graded: when a contested resource still delivers a fraction $1-c$, a worst-off cooperator that contests a free-rider strictly improves on yielding, so leverage exists for every $c < 1$. We introduce CAN, a permutation-equivariant cross-attention policy over agents' observed behaviour that infers how many free-riders are present and responds proportionally: turn-taking when none, contesting just enough when some. Trained against an adversarial league, CAN keeps best-response exploitability near the centralized oracle ($\rho \approx 1.2\text{--}1.5$ vs. $\rho = N$ unprotected) at essentially no efficiency cost, whereas the fair-MARL learners (GGF, FEN, SOTO) each collapse to an exploitable or wasteful extreme. Giving those objectives CAN's identical adversarial training does not rescue them, so the objective, not adversarial training alone, is what makes hardening possible. Against a committed (non-adaptive) defector, every learned defense including ours provides deterrence rather than immunity, weakening as the leverage $(1-c)/2$ vanishes. Across further environments and team sizes the same principle sets the scope: robustness holds exactly as far as the game's contest leverage reaches, and we map that boundary rather than claim to remove it.
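The leverage term $(1-c)/2$ can be illustrated with a toy calculation. The abstract does not spell out the payoff model, so the sketch below assumes one: a unit resource, a yielding agent that receives nothing, and two contestants that split the surviving fraction $1-c$ equally. The function names `contest_payoff` and `leverage` are hypothetical and chosen only for this illustration.

```python
def contest_payoff(c: float, contestants: int = 2) -> float:
    """Per-agent share when `contestants` agents contest a unit resource.

    Assumption (not from the paper): contention destroys a fraction c and
    the remaining 1 - c is split equally among the contestants.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return (1.0 - c) / contestants


def leverage(c: float) -> float:
    """Gain from contesting a free-rider instead of yielding.

    Assumption: an agent that yields receives nothing from the resource.
    """
    yield_payoff = 0.0
    return contest_payoff(c) - yield_payoff


# Leverage shrinks linearly in c and reaches zero only at c = 1.
for c in (0.0, 0.5, 0.9, 1.0):
    print(f"c={c:.1f}  leverage={leverage(c):.3f}")
```

Under these assumptions contesting beats yielding for every $c < 1$, and the advantage vanishes as $c \to 1$, which matches the regime in which the abstract says deterrence weakens.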
