Deep Combined Reinforcement Learning Planner for Multi-Robot System

Qingyang Tan (
Dunbang He (
Shuangqi Luo (

1. Research Goal

In this project, we want to build a planner for multi-robot system, using local and global information and dividing policy decision process into two phases. We want to leverage the power of reinforcement learning, and train the policy in a simulation scenario, instead of using real-word training data.

There exist successful decentralized methods [1, 2, 3] based on reinforcement learning to navigate multi-robot system to avoid collision. Decentralized method does not require frequent communication and only uses local observation; However, with improvement of communication and positioning technology, e.g. 5G network and wifi-based indoor positioning system, we can more easily acquire global system status. Meanwhile, centralized system can guarantee safety, completeness, and approximate optimality solution [1]. Thus, in this project, we want to build a system using global information and computation in a simulated environment. To reduce complexity, we also assume the system has no communication latency and has a clear global sensor measurement.

We plan to formulate global information as a map with each agent’s location. This formulation can fit most of large systems no matter how many agents the system have. Also, we can use sophisticated convolutional neural network (CNN) to process this image-like input, to better understand the running environment.

We cannot deny that fully-centralized method has some drawbacks compared to fully distributed methods. For example, fully-centralized method computation rely on one central server. The central server may cannot process huge-amount of information, or the system may break down once the central server stopped. Thus, in this project, we want to divide the planner into two phases, i.e. global decision period with global information on central computer and local decision period with each agent’s own observation on its processor. Previous work including [4] shows that this idea could work. We want to combine deep reinforcement learning in this project.

In sum, our main goal of this research project is to build a multi-robot planner, combining local and global observation, deviding decision into two phases, and using deep reinforcement learning. We want to build our system on top of previous work [1], and demonstrate its planning result for multi-robot system on different scenarios.

At the end of the semester, we want to compare with state-of-art methods in two possible directions:

  1. Handling more complicated system with more robots involved;
  2. Generating a faster global plan for each robot reaching its own goal.

2. Motivation

While humans and animals most of time navigate their lives based on local information, which is obtained through their natural perceptions, the underlying planning process may not yield the optimal result. For example, traffic throughput rarely reaches its theoretical maximum because people tend to change their speed based on local observations and thus causing traffic waves that lead to traffic congestion, as shown in the following video.

Therefore, we want to investigate, in a multi-robot system, how the overall performance of navigation measured in time could benefit from the extra input of global information.

3. Related work

Narain et al. [4] present a hybrid representation for large dense crowds with discrete agents using a dual representation both as discrete agents and as a single continuous system. The scalable crowd simulation approach makes it possible to simulate very large, dense crowds at near-interactive rates on desktop.

Fan et al. [1] develop a fully decentralized multi-robot collision avoidance framework, where each robot makes navigation decisions independently without any communication with others. A sensor-level collision avoidance policy that maps raw sensor measurements to an agent’s steering commands in terms of the movement velocity is learned via reinforcement learning. The learned policy is combined with traditional control approaches to further improve policy’s robustness and effectiveness.

Khan et al. [6] cast the multi-agent reinforcement learning problem as a distributed optimization problem, assuming that for multi-agent settings, policies of individual agents in a given population live close to each other in parameter space and can be approximated by a single policy. Yoon et al. [7] extend the framework of centralized training with decentralized execution to include additional optimization of the inter-agent communication. Sadhu et al. [8] propose consensus-based multi-agent Q-learning to address the bottleneck of the optimal equilibrium selection among multiple types. Kim et al. [9] propose a new training method named message-dropout, yielding efficient learning for multi-agent deep reinforcement learning with information exchange with large input dimensions. Mirowski et al. [10] presents an end-to-end deep reinforcement learning approach that can be applied to real world visual navigation through city-scale environments.

4. Plan


  1. Early March: Review of more previous work related to this project
  2. Mid March - Late March: Reproduce some related work, e.g. [1];
    Create simulator for this project
  3. Early Apri - Mid April: Develop our system; train on simulator
  4. Late April - Early May: Test our final system; write report


[1] Tingxiang Fan, Pinxin Long, Wenxi Liu, Jia Pan. Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios. arXiv, 2018
[2] Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan. Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. International Conference on Robotics and Automation (ICRA), 2018.
[3] Pinxin Long, Wenxi Liu, Jia Pan. Deep-Learned Collision Avoidance Policy for Distributed Multi-Agent Navigation. IEEE Robotics and Automation Letters, 2(2), 656-663, 2017.
[4] Rahul Narain, Abhinav Golas, Sean Curtis, Ming C. Lin. Aggregate Dynamics for Dense Crowd Simulation. SIGGRAPH Asia 2009
[5] Rios-Torres, Jackeline, and Andreas A. Malikopoulos. “A survey on the coordination of connected and automated vehicles at intersections and merging at highway on-ramps.” IEEE Transactions on Intelligent Transportation Systems 18.5 (2017): 1066-1077.
[6] Khan, Arbaaz, et al. “Scalable Centralized Deep Multi-Agent Reinforcement Learning via Policy Gradients.” arXiv preprint arXiv:1805.08776 (2018).
[7] Yoon, Hyung-Jin, et al. “Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems.” AIAA Scitech 2019 Forum. 2019.
[8] Sadhu, Arup Kumar, et al. “Multi-robot cooperative planning by consensus Q-learning.” 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017.
[9] Kim, Woojun, Myungsik Cho, and Youngchul Sung. “Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning.” arXiv preprint arXiv:1902.06527 (2019).
[10] Mirowski, Piotr, et al. “Learning to navigate in cities without a map.” Advances in Neural Information Processing Systems. 2018.

Qingyang Tan
Ph.D student

My research interests include computer graphics, computer vision and machine learning.