Evaluating the Nonlinear Correlation between Vertical Curve Features and Crash Frequency on Highways Using Random Forests
Publication: Journal of Transportation Engineering, Part A: Systems
Volume 146, Issue 10
Abstract
Vertical curve features on interstate highways greatly affect traffic operations and vehicle performance and may therefore influence the occurrence of traffic crashes. Most studies to date have considered only linear relationships. Although some researchers have considered nonlinearity, the distribution they assumed in advance may not match the true data distribution. The primary objective of this study is therefore to develop a nonparametric approach, based on the random forest (RF) algorithm, for evaluating the nonlinear correlation between vertical curve features and crash frequency on interstate highways. Elevation data along interstate centerlines were extracted from Google Earth for two interstates in Washington State, and 5-year crash data were collected to estimate RF models for crash count prediction. A random effect negative binomial (RENB) model was employed as the baseline for evaluating predictive performance. Analysis of variable importance shows that the proposed RF models captured the nonlinear correlation between crash count and annual average daily traffic (AADT), the elevation and grade of road segments, median lane width, left shoulder width, ratio of horizontal curve, the standard deviation of grade in 1- and 2-mi road segments, the standard deviation of elevation in 1- and 2-mi road segments, and lane width. Other variables, e.g., right shoulder width and the number of lanes on the highway, were also important in the proposed RF models. By better capturing the nonlinearity, the proposed RF model outperformed the baseline model on the predictive performance measures. The findings can inform improvements in highway geometric design and support countermeasures to reduce crash counts on interstate highways.
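The abstract lists the standard deviation of grade within 1- and 2-mi road segments among the predictors but does not say how it was computed. The following is only a minimal sketch of one plausible derivation from an elevation profile. The function names, the fixed sampling spacing, the non-overlapping windows, and the use of the population standard deviation are all assumptions for illustration, not the authors' procedure.

```python
from statistics import pstdev

FEET_PER_MILE = 5280.0


def grades_percent(elevations_ft, spacing_ft):
    """Percent grade between consecutive, evenly spaced elevation samples."""
    return [
        100.0 * (nxt - cur) / spacing_ft
        for cur, nxt in zip(elevations_ft, elevations_ft[1:])
    ]


def segment_grade_std(elevations_ft, spacing_ft, segment_miles):
    """Standard deviation of grade in consecutive, non-overlapping segments.

    Assumption: segments are fixed-length windows along the centerline and
    the population standard deviation is used; a trailing partial window is
    kept as its own (shorter) segment.
    """
    grades = grades_percent(elevations_ft, spacing_ft)
    per_segment = max(1, round(segment_miles * FEET_PER_MILE / spacing_ft))
    return [
        pstdev(grades[start:start + per_segment])
        for start in range(0, len(grades), per_segment)
    ]


# Hypothetical profile sampled every 0.1 mi (528 ft):
# one mile of steady 2% climb, then one mile of rolling +1% / -1% grade.
profile_ft = [1000.0]
for _ in range(10):
    profile_ft.append(profile_ft[-1] + 10.56)
for step in range(10):
    profile_ft.append(profile_ft[-1] + (5.28 if step % 2 == 0 else -5.28))

print(segment_grade_std(profile_ft, 528.0, 1))  # one value per 1-mi segment
print(segment_grade_std(profile_ft, 528.0, 2))  # one value per 2-mi segment
```

In this sketch a steady climb yields a standard deviation near zero while rolling terrain yields a larger one, which is the kind of within-segment variability such a feature is meant to capture. The same windowing could be applied to raw elevation for the standard-deviation-of-elevation feature.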
Data Availability Statement
Some or all data, models, or code that support the findings of this study, including the crash data and the road elevation data, are available from the corresponding author upon reasonable request.
Acknowledgments
This research was supported in part by the Center for Safety Equity in Transportation (CSET), a US Department of Transportation Tier 1 University Transportation Center, under project #1905. The authors also thank Dr. Yinsong Wang, Xianzhe Chen, and Ben Wright for help with data extraction and reduction and Chris Gottsacker for language editing.
Copyright
© 2020 American Society of Civil Engineers.
History
Received: Feb 7, 2019
Accepted: Apr 7, 2020
Published online: Jul 24, 2020
Published in print: Oct 1, 2020
Discussion open until: Dec 24, 2020