Equilibrium Policy Generalization: A Reinforcement Learning Framework For Cross-graph Zero-shot Generalization In Pursuit-evasion Games
2025 · Runyu Lu, Peng Zhang, Ruochuan Shi, et al.
Abstract
Equilibrium learning in adversarial games is a widely studied topic in game theory and reinforcement learning (RL). Pursuit-evasion games (PEGs), an important class of real-world games arising in robotics and security, require exponential time to solve exactly. When the underlying graph structure varies, even state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impairs real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework applies to both the pursuer and evader sides, in both no-exit and multi-exit scenarios. To our knowledge, these two generalizability properties are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the e
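To make the cost of exact solving concrete, the following is a minimal sketch of backward induction on a toy pursuit-evasion game. It is not the paper's algorithm or game model: it assumes one pursuer, one evader, alternating moves with the pursuer first, full observability, and no exits, and the name capture_times is hypothetical. Even in this simplified setting the solver enumerates every joint position, and that joint state space grows multiplicatively with each additional agent. The result is also tied to one graph, so it must be recomputed whenever the graph changes.

```python
import math

def capture_times(adj):
    """Exact capture times for a toy turn-based pursuit-evasion game.

    adj maps each node to a list of its neighbours (undirected graph).
    Returns a dict (pursuer, evader) -> number of pursuer moves needed to
    force capture when the pursuer moves first, or math.inf if the evader
    can escape forever. Assumed rules, not taken from the paper.
    """
    nodes = list(adj)
    # Each player may stay put or cross one edge.
    moves = {v: [v] + list(adj[v]) for v in nodes}
    # v_p: value with the pursuer to move; v_e: value with the evader to move.
    v_p = {(p, e): (0 if p == e else math.inf) for p in nodes for e in nodes}
    v_e = dict(v_p)
    changed = True
    while changed:  # iterate the min-max backup until nothing changes
        changed = False
        for p in nodes:
            for e in nodes:
                if p == e:
                    continue  # already captured, value stays 0
                new_e = max(v_p[(p, e2)] for e2 in moves[e])      # evader delays
                new_p = 1 + min(v_e[(p2, e)] for p2 in moves[p])  # pursuer hurries
                if new_e != v_e[(p, e)] or new_p != v_p[(p, e)]:
                    v_e[(p, e)], v_p[(p, e)] = new_e, new_p
                    changed = True
    return v_p

# Path 0-1-2: the evader is cornered. 4-cycle: the evader can circle forever.
path = {0: [1], 1: [0, 2], 2: [1]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(capture_times(path)[(0, 2)])
print(capture_times(cycle)[(0, 2)])
```

The two toy graphs give different outcomes from the same start distance, which is a small illustration of why a solution computed on one graph does not transfer to another.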
Related papers
- A Generative Machine Learning Approach To Policy Optimization In Pursuit-evasion Games (2020)
- Efficient Use Of Heuristics For Accelerating Xcs-based Policy Learning In Markov Games (2020)
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)
- Explore Reinforced: Equilibrium Approximation With Reinforcement Learning (2024)
- Pipeline PSRO: A Scalable Approach For Finding Approximate Nash Equilibria In Large Games (2020)
- Evolution-guided Policy Gradient In Reinforcement Learning (2018)
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022)
- Strategic Communication Under Threat: Learning Information Trade-offs In Pursuit-evasion Games (2025)