Abstract

We study reinforcement learning (RL) for a network of agents whose states and actions interact locally. The objective is to find localized policies that maximize the (discounted) global reward. A fundamental challenge in this setting is that the size of the state-action space grows exponentially with the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an \(O(\rho^{\kappa})\)-approximation of a stationary point of the objective for some \(\rho\in(0,1)\), with complexity that scales with the local state-action space size of the largest \(\kappa\)-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics, and traffic.
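The scaling claim in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not the SAC algorithm itself: it assumes a hypothetical ring network of 20 agents, each with 4 local state-action pairs, and compares the size of the joint state-action space with the size of the local space over the largest \(\kappa\)-hop neighborhood. The names `k_hop_neighborhood` and `ring_graph` and all numeric values are assumptions made for the example.

```python
from collections import deque


def k_hop_neighborhood(adj, source, kappa):
    """Return the set of nodes within kappa hops of source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def ring_graph(n):
    """Hypothetical example topology: agent i is linked to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


n_agents = 20          # assumption: network size
per_agent_pairs = 4    # assumption: 2 local states x 2 local actions per agent
kappa = 2              # neighborhood radius

adj = ring_graph(n_agents)
largest = max(len(k_hop_neighborhood(adj, i, kappa)) for i in adj)

global_size = per_agent_pairs ** n_agents   # joint state-action space
local_size = per_agent_pairs ** largest     # largest kappa-hop neighborhood

print(f"largest {kappa}-hop neighborhood: {largest} agents")
print(f"global state-action pairs: {global_size}")
print(f"local state-action pairs:  {local_size}")
```

On this ring, every 2-hop neighborhood contains 5 agents, so the local space has \(4^5 = 1024\) state-action pairs while the joint space has \(4^{20}\). Increasing \(\kappa\) enlarges the local space, which is the trade-off against the \(O(\rho^{\kappa})\) approximation error stated in the abstract.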

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 23
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 10.35
  • arxiv key: qu2019scalable

Related papers