Abstract

This paper deals with distributed policy optimization in reinforcement learning (RL), which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered: multi-agent RL and parallel RL, where frequent information exchanges between the learners and the controller are required. For many practical distributed systems, however, the overhead caused by these frequent exchanges is considerable and becomes the bottleneck of the overall performance. To address this challenge, a novel policy gradient approach is developed for solving distributed RL. The approach adaptively skips policy gradient communication during iterations, and can reduce the communication overhead without degrading learning performance. It is established analytically that: i) the novel algorithm has a convergence rate identical to that of the plain-vanilla policy gradient; while ii) if the distributed l

Tags

  • Policy Gradient
  • Multi-Agent
