Abstract
In this paper, we present a motion planning algorithm that guides agents, termed player agents, optimally through multi-agent 3D urban air environments. The method integrates a sampling-based path planner, model-free optimal control, and a cognitive hierarchy model that predicts the motion of other agents. Each player constructs a path through the environment and dynamically re-plans it as the obstacle space evolves, based on its own online observations and those of cooperating players. The cognitive hierarchy model predicts the behavior of each agent in the environment, while a Gaussian process classification method estimates an unknown agent's level of rationality in real time by observing that agent's kinodynamic distance. Once another agent's motion planning strategy is inferred, the player agents construct a predicted obstacle space from that agent's expected motion to avoid potential collisions. Each player then traverses its planned path using a Q-learning controller. We validate the proposed method in numerical experiments in 3D urban air environments with four and with ten agents, and demonstrate that the approach reduces the distance agents travel to reach their goals, mitigates the risk of collisions, and prevents deadlocks.