Technical Papers
Oct 3, 2023

Comparative Analysis of Machine Learning and Survival Analysis Combinations for Water Main Failure Prediction

Publication: Journal of Infrastructure Systems
Volume 29, Issue 4

Abstract

Water main failure prediction models are a common tool for supporting utility asset management plans. This paper focuses on the development and evaluation of models that combine survival analysis with machine learning algorithms. To understand the potential advantages of these models, five representative algorithms are applied to data from four different utilities. The Cox proportional hazards (Cox-PH) algorithm, a common survival analysis technique, provides a baseline measure of model performance. The other algorithms investigated are extreme gradient boosting (XGBoost), random survival forest (RSF), neural multitask logistic regression (NMTLR), and XGBoost survival embeddings (XGBSE). XGBoost is a pure machine learning algorithm, whereas the remaining three embed survival analysis within a machine learning framework. Our findings suggest that the models that embed survival analysis within machine learning may predict water main failures more accurately than traditional survival analysis or machine learning models. All models are developed using data sets with high degrees of censoring and varying characteristics; thus, these approaches may be beneficial for a wide range of utilities, including those with limited break data. In particular, models based on the RSF algorithm are found to perform consistently well for the data sets investigated herein.
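The Cox-PH baseline estimates covariate effects by maximizing a partial likelihood in which censored mains (mains with no break recorded during the observation window) appear only in the risk sets of other mains' breaks. This is how heavily censored data sets can still be used for fitting. The sketch below is a minimal, standard-library illustration under stated assumptions: right-censored data, one row per main, and the Breslow treatment of tied break times. The function name and toy data are hypothetical, and this is not the authors' implementation, which drew on openly available sources.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(
    beta: Sequence[float],
    covariates: Sequence[Sequence[float]],
    times: Sequence[float],
    events: Sequence[bool],
) -> float:
    """Cox partial log-likelihood for right-censored data (hypothetical helper).

    Tied break times use the Breslow approximation: every main whose recorded
    time is >= t_i stays in the risk set of the break observed at t_i.
    """
    # Linear predictor eta_j = beta . x_j for each main j.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, broke in enumerate(events):
        if not broke:
            continue  # censored mains add no numerator term of their own
        # Risk set: mains still under observation (not yet broken or censored) at t_i.
        risk = [eta[j] for j in range(len(times)) if times[j] >= times[i]]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(risk)
        log_denominator = top + math.log(sum(math.exp(e - top) for e in risk))
        loglik += eta[i] - log_denominator
    return loglik


# Hypothetical toy data: four mains, one covariate (e.g. a material flag).
times = [1.0, 2.0, 3.0, 4.0]         # years to first break or to censoring
events = [True, False, True, False]  # False = no break observed (censored)
covariates = [[1.0], [0.0], [1.0], [0.0]]

for b in (0.0, 0.5, 1.0):
    print(b, cox_partial_log_likelihood([b], covariates, times, events))
```

Maximizing this function over `beta` (for example, with Newton-Raphson) yields the Cox-PH coefficients. The baseline hazard cancels out of the partial likelihood, so it is not needed to rank mains by relative risk `exp(beta . x)`.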

Practical Applications

To support decisions on the repair or replacement of water mains, many utilities develop models that predict which water mains are most likely to break. Survival analysis (a branch of statistics) and machine learning have both been applied frequently in previous studies of water main failure prediction. However, each type of approach has limitations when applied on its own. More recently, several studies have examined whether combinations of survival analysis and machine learning can overcome these limitations. In this study, three algorithms that combine machine learning and survival analysis are compared with a common survival analysis method and a pure machine learning algorithm. Two of the selected algorithms have not previously been applied to water main failure prediction, and this study demonstrates their capabilities in this field. The algorithms are applied to data sets from four different utilities. For the data sets included in this study, the combinations of survival analysis and machine learning outperform both the traditional survival analysis algorithm and the machine learning algorithm.
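Because these models are used to rank mains by their likelihood of breaking, their accuracy on censored data is often scored by how well the predicted ranking matches the observed break order. The abstract does not name the accuracy measure behind this comparison. One widely used option is Harrell's concordance index (C-index): the fraction of comparable pairs of mains in which the main that broke earlier was assigned the higher risk score. The sketch below is a minimal, standard-library version under stated assumptions. A higher score means a higher break risk, a pair is comparable only when the earlier time is an observed break, and pairs with identical times are skipped for simplicity. The function name and data are hypothetical and are not the study's evaluation code.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C for right-censored data (hypothetical helper).

    Higher risk_scores mean a higher predicted break risk.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the main with the earlier recorded time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if that earlier time is an observed break.
        # Simplification: pairs with identical times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier break was ranked as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions earn half credit
    if comparable == 0:
        raise ValueError("no comparable pairs of mains")
    return concordant / comparable


# Hypothetical example: four mains, the third censored (no break observed).
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, False, True]
print(harrell_c_index(times, events, [0.9, 0.8, 0.3, 0.1]))  # → 1.0
print(harrell_c_index(times, events, [0.1, 0.3, 0.8, 0.9]))  # → 0.0
```

A value of 0.5 corresponds to a random ranking and 1.0 to a perfect ranking. Libraries such as scikit-survival and lifelines provide tested implementations that also handle tied times.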

Data Availability Statement

All data, models, and code generated and applied for this study are proprietary or confidential in nature. The authors are unable to provide the data, models, or code; however, upon request, the corresponding author can direct readers to the openly available sources for the algorithms implemented in this study.

Acknowledgments

Support for this work was provided by Mitacs and Global Quality Corp. (GQC) through a Mitacs Accelerate Grant (No. IT20809). We thank Jacob Specht and Melissa Conner at GQC for their support in developing and applying the models. We are also grateful to each of the participating utilities for their willingness to provide their data and to collaborate with our research team.

Published In

Journal of Infrastructure Systems
Volume 29, Issue 4, December 2023

History

Received: Mar 30, 2022
Accepted: Apr 22, 2023
Published online: Oct 3, 2023
Published in print: Dec 1, 2023
Discussion open until: Mar 3, 2024

Authors

Affiliations

Graduate Research Assistant, Dept. of Civil Engineering, Univ. of British Columbia, Vancouver, BC, Canada V6T 1Z4 (corresponding author). ORCID: https://orcid.org/0000-0002-5610-6993.
Barbara J. Lence, M.ASCE
Professor, Dept. of Civil Engineering, Univ. of British Columbia, Vancouver, BC, Canada V6T 1Z4.
Sudhir Kshirsagar, M.ASCE
President, Global Quality Corp., 3071 Higgins Pl., Palo Alto, CA 94303. ORCID: https://orcid.org/0000-0002-3545-8092
