Abstract

With the growing number of intelligent transportation system (ITS) sensors and their network-wide deployment across the nation’s roadway facilities, current research and practice should concentrate on more proactive safety strategies. In recent years, real-time traffic data collected from ITS sensors have been used to develop crash prediction models. Real-time crash prediction models can be used to identify hazardous traffic conditions that might lead to a crash. This study examines how data mining techniques that account for imbalanced data could improve the predictive capability of real-time crash prediction models. The term imbalanced data refers to a data set in which observations are not equally distributed among the classes (here, noncrash cases outnumber crash cases). To decrease the within-class variation of the imbalanced data, the data were split into two traffic-state data sets: free-flow speed (FFS) and congestion. Three models, namely logistic regression as the baseline, random forest (RF) with random undersampling, and Adaptive Boosting (AdaBoost), were estimated with each data set. The results were compared with those of models estimated using the complete data set. Model comparisons indicated that all three models achieved significantly better predictive results with the congested and FFS data sets than with the data set containing all crashes. Although in some cases the results of the undersampled RF model were slightly better than those of AdaBoost, both models outperformed the logistic regression model. The results of this study demonstrated that using models designed to deal with imbalanced data and lowering the variation of imbalanced data could substantially improve crash prediction accuracy. The findings could help traffic agencies implement and deploy crash prediction models for real-time applications and develop crash prevention strategies accordingly.
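
The random undersampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the study's code is confidential); the helper name `random_undersample`, the toy speed and occupancy records, and the 1:1 target ratio are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(records, labels, seed=0):
    """Randomly drop records so every class matches the minority class size.

    Hypothetical helper for illustration; the 1:1 balance is an assumption,
    not a setting reported by the study.
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)

    # The smallest class sets the number of records kept per class.
    n_min = min(len(recs) for recs in by_class.values())

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        kept = rng.sample(recs, n_min)  # sampling without replacement
        out_records.extend(kept)
        out_labels.extend([lab] * n_min)
    return out_records, out_labels


# Toy data (invented): 95 noncrash (0) and 5 crash (1) observations,
# each a (speed, occupancy) pair.
records = [(60.0 + i % 7, 0.10) for i in range(95)] + [(35.0 + i, 0.40) for i in range(5)]
labels = [0] * 95 + [1] * 5

balanced_records, balanced_labels = random_undersample(records, labels)
counts = Counter(balanced_labels)
print(counts)
```

A balanced sample of this kind could then be passed to a classifier such as a random forest. Whether the study resampled once or separately for each tree is not stated in the abstract.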

Data Availability Statement

All data, models, and code generated or used during the study are confidential. They are part of a project with the Arizona DOT and therefore cannot be shared.

Acknowledgments

The authors would like to thank the Arizona DOT for funding and data support. Special thanks go to Vahid Goftar and Brent Cain for their support of innovative research. The authors wish to extend their thanks to Mr. Adrian Cottam for valuable comments and proofreading.

Published In

Journal of Transportation Engineering, Part A: Systems
Volume 147, Issue 3, March 2021

History

Received: Oct 25, 2019
Accepted: Oct 27, 2020
Published online: Dec 29, 2020
Published in print: Mar 1, 2021
Discussion open until: May 29, 2021

Authors

Affiliations

Research Assistant, Dept. of Civil and Architectural Engineering and Mechanics, Univ. of Arizona, Tucson, AZ 85721 (corresponding author). ORCID: https://orcid.org/0000-0001-6679-7428.
Graduate Research Assistant, Dept. of Civil and Architectural Engineering and Mechanics, Univ. of Arizona, Tucson, AZ 85721. ORCID: https://orcid.org/0000-0002-8707-6408.
Xiao Qin, Ph.D., P.E., Professor, Dept. of Civil and Environmental Engineering, Univ. of Wisconsin-Milwaukee, Milwaukee, WI 53211.
Associate Professor, Dept. of Civil and Architectural Engineering and Mechanics, Univ. of Arizona, Tucson, AZ 85721. ORCID: https://orcid.org/0000-0002-0456-7915.
Assistant Professor of Project and Operations Management, Dept. of Management, College of Business, Bryant Univ., Smithfield, RI 02917. ORCID: https://orcid.org/0000-0002-7691-7240.
