Using Reinforcement Learning To Herd A Robotic Swarm To A Target Distribution
2020 · Zahi M. Kakish, Karthik Elamvazhuthi, Spring Berman
Abstract
In this paper, we present a reinforcement learning approach to designing a control policy for a "leader" agent that herds a swarm of "follower" agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for
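The abstract names SARSA and Q-Learning but does not show how they differ. The sketch below contrasts the two tabular Temporal-Difference updates. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `alpha` and `gamma` defaults, and the `encode_state` discretization (binning each vertex's population fraction and appending the leader's vertex) are hypothetical choices made for this example.

```python
import numpy as np


def encode_state(fractions, leader_vertex, bins):
    """Map (follower distribution, leader location) to one table index.

    Hypothetical encoding: each vertex's population fraction is binned
    into `bins` levels, then combined with the leader's vertex.
    """
    fractions = np.asarray(fractions, dtype=float)
    levels = np.minimum((fractions * bins).astype(int), bins - 1)
    index = 0
    for level in levels:
        index = index * bins + int(level)
    return index * len(fractions) + leader_vertex


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))
```

With 4 vertices and `bins=4`, this encoding would need a table of `4**4 * 4` rows, one per (binned distribution, leader vertex) pair. The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum value in the next state, while SARSA uses the value of the action the policy actually selects there.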
Related papers
- Hierarchical Policy-gradient Reinforcement Learning For Multi-agent Shepherding Control Of Non-cohesive Targets (2025) · 0.00
- Guided Deep Reinforcement Learning For Swarm Systems (2017) · 0.00
- From Pheromones To Policies: Reinforcement Learning For Engineered Biological Swarms (2025) · 0.00
- A Scalable Reinforcement Learning Approach For Attack Allocation In Swarm To Swarm Engagement Problems (2022) · 0.00
- Learning To Simulate Self-driven Particles System With Coordinated Policy Optimization (2021) · 0.00
- Inverse Reinforcement Learning In Swarm Systems (2016) · 2.26
- Deep Reinforcement Learning For Swarm Systems (2018) · 0.00
- Scalable Reinforcement Learning Policies For Multi-agent Control (2020) · 10.21