When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding

Switches, Planning, Reinforcement learning, Decision making, Training, Task analysis, Learning systems

Abstract

Multi-agent pathfinding (MAPF) is a problem that involves finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting, where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field-of-view. Moreover, we assume that the agents do not communicate and do not share knowledge on their goals, intended actions, etc. The task is to construct a policy that maps the agent’s observations to actions. Our contribution is multifold. First, we propose two novel policies for solving partially observable MAPF (PO-MAPF): one based on heuristic search and another one based on reinforcement learning (RL). Next, we introduce a mixed policy that is based on switching between the two. We suggest three different switch scenarios: the heuristic, the deterministic, and the learnable one. A thorough empirical evaluation of all the proposed policies in a variety of setups shows that the mixing policy demonstrates the best performance is able to generalize well to the unseen maps and problem instances, and, additionally, outperforms the state-of-the-art counterparts (Primal2 and PICO). The source-code is available at https://github.com/AIRI-Institute/when-to-switch.

Full text