Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields For Multi-agent Reinforcement Learning
2024 · Satchit Chatterji, Erman Acar
Abstract
Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) comprehensive evaluation across symmetric and asymmetrically shie
Related papers
- An Abstraction-based Method To Check Multi-agent Deep Reinforcement-learning Behaviors (2021)
- Safe Multi-agent Reinforcement Learning With Convergence To Generalized Nash Equilibrium (2024)
- MARSHAL: Incentivizing Multi-agent Reasoning Via Self-play With Strategic LLMs (2025)
- Multi-agent Constrained Policy Optimisation (2021)
- DeepSafeMPC: Deep Learning-based Model Predictive Control For Safe Multi-agent Reinforcement Learning (2024)
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- SUB-PLAY: Adversarial Policies Against Partially Observed Multi-agent Reinforcement Learning Systems (2024)
- Safe Reinforcement Learning In Black-box Environments Via Adaptive Shielding (2024)