Adaptive Opponent Policy Detection In Multi-agent MDPs: Real-time Strategy Switch Identification Using Running Error Estimation
2024 · Mohidul Haque Mridul, Mohammad Foysal Khan, Redwan Ahmed Rizvee, et al.
Abstract
In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Poli
Related papers
- Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts (2021)
- Metric Policy Representations For Opponent Modeling (2021)
- SUB-PLAY: Adversarial Policies Against Partially Observed Multi-agent Reinforcement Learning Systems (2024)
- Efficient Policy Learning For Non-stationary MDPs Under Adversarial Manipulation (2019)
- Robust And Diverse Multi-agent Learning Via Rational Policy Gradient (2025)
- Learning To Model Opponent Learning (2020)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- Online Robust Policy Learning In The Presence Of Unknown Adversaries (2018)