Technical Papers
Nov 21, 2022

Multiagent Soft Actor–Critic for Traffic Light Timing

Publication: Journal of Transportation Engineering, Part A: Systems
Volume 149, Issue 2

Abstract

Deep reinforcement learning has strong perception and decision-making capabilities, can effectively handle continuous, high-dimensional state–action spaces, and has become the mainstream method for traffic light timing. However, because of structural defects or differing strategy mechanisms, most deep reinforcement learning models suffer from problems such as failure to converge, divergence, or poor exploration ability. This paper therefore proposes a multiagent Soft Actor–Critic (SAC) method for traffic light timing. Multiagent SAC adds an entropy term that measures the randomness of the policy to the objective function of traditional reinforcement learning and maximizes the sum of the expected reward and this entropy term, improving the model’s exploration ability. The system model can learn multiple optimal timing schemes, avoiding repeated selection of the same timing scheme, which would cause it to fall into a local optimum or fail to converge. At the same time, it discards low-reward strategies to reduce data storage and sampling complexity, accelerate training, and improve system stability. Comparative experiments show that the multiagent SAC timing method can overcome these problems of existing deep reinforcement learning approaches and improve the efficiency of vehicles passing through intersections in different traffic scenarios.
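
For context, the entropy-regularized objective that SAC maximizes, written in the standard form from the soft actor–critic literature (notation is ours, not reproduced from this paper), is

$$ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\bigl[ r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \bigr], $$

where $r(s_t, a_t)$ is the reward, $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy at state $s_t$, and the temperature $\alpha$ weights exploration (entropy) against the expected reward.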

Practical Applications

This paper studies a timing method for traffic lights at multiple intersections. The experimental results show that the proposed method effectively improves the throughput of each intersection and reduces vehicle waiting times and queue lengths. Comparison with related algorithms demonstrates that the proposed method addresses problems that are pervasive in existing algorithms. In practical application, traffic state information is obtained through interaction with the real traffic environment, and the timing scheme of the traffic lights is dynamically adjusted according to that information, thereby alleviating traffic congestion at the intersections.
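
As a rough illustration of the closed-loop application described above, the following minimal Python sketch shows one decision step in which each intersection agent observes its local traffic state and retimes its light. All names and the toy phase-selection rule are illustrative placeholders for the trained SAC policies and sensing interfaces, not code from the paper.

```python
# Illustrative sketch only: in the paper, select_phase would be each agent's
# trained stochastic SAC policy and observe would come from real detectors.
import random

N_PHASES = 4          # e.g., NS-through, NS-left, EW-through, EW-left
N_INTERSECTIONS = 2   # number of cooperating intersection agents

def observe(intersection_id):
    """Placeholder sensing: per-phase queue lengths at one intersection."""
    return [random.randint(0, 20) for _ in range(N_PHASES)]

def select_phase(state):
    """Placeholder policy: give green to the phase with the longest queue."""
    return max(range(N_PHASES), key=lambda p: state[p])

def control_step():
    """One decision step: every agent reads its local state and retimes its light."""
    for i in range(N_INTERSECTIONS):
        state = observe(i)
        phase = select_phase(state)
        print(f"intersection {i}: queues={state} -> green for phase {phase}")

if __name__ == "__main__":
    control_step()
```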


Data Availability Statement

The data that support the findings of this study are available from the corresponding author (Lan Wu) upon reasonable request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61973103, Zhengzhou Science and Technology Bureau Natural Science Project under Grant 21ZZXTCX01, and Outstanding Youth Project of Natural Science Foundation of Henan Province under Grant 222300420039.


Information & Authors

Information

Published In

Journal of Transportation Engineering, Part A: Systems
Volume 149, Issue 2, February 2023

History

Received: Jan 14, 2022
Accepted: Aug 8, 2022
Published online: Nov 21, 2022
Published in print: Feb 1, 2023
Discussion open until: Apr 21, 2023


Authors

Affiliations

Lan Wu
Professor, Dept. of Electrical Engineering, Henan Univ. of Technology, Zhengzhou 450001, PR China (corresponding author). Email: [email protected]
Yuanming Wu [email protected]
Master’s Student, Dept. of Electrical Engineering, Henan Univ. of Technology, Zhengzhou 450001, PR China. Email: [email protected]
Lecturer, Dept. of International Education, Zhengzhou Railway Vocational & Technical College, Zhengzhou 451460, PR China. Email: [email protected]
Yafang Tian [email protected]
Associate Professor, Dept. of International Education, Zhengzhou Railway Vocational & Technical College, Zhengzhou 451460, PR China. Email: [email protected]


