Abstract
We study two reproducible failure modes of deep multi-agent reinforcement learning (MARL) in continuous-time pricing markets: (i) tacit cartel formation between competing deep deterministic policy gradient (DDPG) agents, and (ii) actor--critic instability at high event rates. We instantiate both inside a single continuous-time MARL (CT-MARL) benchmark (Poisson-clocked price updates, observation latency , interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index , and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48\%, and adding latency drives it to a minimum of . The fix has clearly documented costs: it is partial ( remains supra-Bertrand), it is non-monotone in , and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at and corrupts the phase-diagram cell at . We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-shock non-recovery.