Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout
2023 · Carmel Fiscko, Soummya Kar, Bruno Sinopoli
Abstract
This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout, and the computation of policies for the post-dropout system based on control and sampling of the pre-dropout system. The central planner's objective is to find an optimal policy that maximizes the value of the expected system given a priori knowledge of the agents' dropout probabilities. For MDPs with a certain transition independence and reward separability structure, we assume that removing agents from the system forms a new MDP composed of the remaining agents with new state and action spaces, transition dynamics that marginalize the removed agents, and rewards that are independent of the removed agents. We first show that under these assumptions, the value of the expected post-dropout system can be represented by a single MDP; this "robust MDP" eliminates the need to evaluate all \(2^N\) realizations of the system, where \(N\) denotes the number of agents. More significantly, in a model-free c
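The claim that a single object can stand in for all \(2^N\) dropout realizations can be illustrated with a minimal sketch. It is not the paper's robust-MDP construction. It assumes, beyond what the abstract states, that agents drop out independently, that the reward is an additive sum of per-agent terms, and that the state-action pair is held fixed. The function names and numbers are hypothetical. Under those assumptions, linearity of expectation collapses the exponential enumeration into one weighted sum.

```python
from itertools import product


def expected_reward_enumerated(rewards, dropout_probs):
    """Average the post-dropout reward over all 2^N dropout realizations.

    Assumes independent dropout and additive per-agent rewards
    (illustrative assumptions, not taken from the paper).
    """
    n = len(rewards)
    total = 0.0
    for stays in product([0, 1], repeat=n):  # 1 = agent remains, 0 = agent dropped
        prob = 1.0
        for keep, p in zip(stays, dropout_probs):
            prob *= (1.0 - p) if keep else p
        # Reward of this realization: only the remaining agents contribute.
        total += prob * sum(r for keep, r in zip(stays, rewards) if keep)
    return total


def expected_reward_closed_form(rewards, dropout_probs):
    """Same expectation in O(N): weight each agent's reward by its survival probability."""
    return sum((1.0 - p) * r for r, p in zip(rewards, dropout_probs))


# Hypothetical per-agent rewards and dropout probabilities for N = 3 agents.
rewards = [1.0, 2.0, 4.0]
dropout_probs = [0.5, 0.25, 0.0]

enumerated = expected_reward_enumerated(rewards, dropout_probs)
closed = expected_reward_closed_form(rewards, dropout_probs)
print(enumerated, closed)
```

The enumeration visits \(2^3 = 8\) realizations while the closed form touches each agent once. The sketch covers only a one-step reward; extending the same idea to values and policies is what the abstract attributes to the robust MDP.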