Publication Detail

A Deep On-Policy Learning Agent for Traffic Signal Control of Multiple Intersections

UCD-ITS-RP-20-65

Conference Paper

Suggested Citation:
Yen, Chia-Cheng, Dipak Ghosal, Michael Zhang, Chen-Nee Chuah (2020) A Deep On-Policy Learning Agent for Traffic Signal Control of Multiple Intersections. 2020 IEEE 23rd Conference on Intelligent Transportation Systems.

Reinforcement Learning (RL) is being rapidly adopted in many complex environments because it can leverage neural networks to learn good strategies. In traffic signal control (TSC), existing work has focused on off-policy learning (Q-learning) with neural networks, while on-policy learning (SARSA) with neural networks has received limited study. In this work, we propose a deep dueling on-policy learning method (2DSARSA) for coordinated TSC across a network of intersections, with the goal of maximizing the network throughput and minimizing the average end-to-end delay. To describe the states of the environment, we propose traffic flow maps (TFMs) that capture head-of-the-line (HOL) sojourn times for traffic lanes and HOL differences for adjacent intersections. We introduce a reward function defined by the power metric, which is the ratio of the network throughput to the average end-to-end delay; because throughput appears in the numerator and delay in the denominator, the reward rises when throughput increases and when delay decreases, so a single objective captures both goals. We show that the proposed 2DSARSA architecture has a significantly better learning performance compared to other RL architectures including Deep Q-Network (DQN) and Deep SARSA (DSARSA).
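
As a rough illustration of two ideas in the abstract, the sketch below computes a power-metric reward (throughput divided by average end-to-end delay) and contrasts the textbook on-policy (SARSA) and off-policy (Q-learning) bootstrap targets. It is not the paper's implementation: the function names, the `eps` guard against zero delay, the discount factor `gamma`, and the example numbers are all assumptions made for illustration, and the dueling network and TFM state encoding are omitted.

```python
def power_reward(throughput, avg_delay, eps=1e-9):
    """Power metric: network throughput divided by average end-to-end delay.

    `eps` is a hypothetical guard against division by zero when no delay
    has been observed yet; it is not taken from the paper.
    """
    return throughput / max(avg_delay, eps)


def sarsa_target(reward, q_next, next_action, gamma=0.99):
    """On-policy (SARSA) target: bootstrap from the action actually chosen
    by the current policy in the next state."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, q_next, gamma=0.99):
    """Off-policy (Q-learning) target: bootstrap from the greedy action in
    the next state, regardless of which action the policy takes."""
    return reward + gamma * max(q_next)


# Hypothetical numbers: 120 vehicles served, 40 s average end-to-end delay.
r = power_reward(120.0, 40.0)

# Hypothetical Q-value estimates for two signal phases in the next state.
q_next = [2.0, 5.0]
on_policy = sarsa_target(r, q_next, next_action=0, gamma=0.5)
off_policy = q_learning_target(r, q_next, gamma=0.5)
```

The two targets differ only when the action the policy actually takes in the next state (here, phase 0) is not the greedy one, which is the distinction between SARSA and Q-learning that the abstract refers to.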