Case Studies
Feb 28, 2022

Performance Evaluation of Pipe Break Machine Learning Models Using Datasets from Multiple Utilities

Publication: Journal of Infrastructure Systems
Volume 28, Issue 2

Abstract

Water pipeline infrastructure is critical for the delivery of lifeline services; however, these aging systems are experiencing increasing breakage rates. To assist utilities in identifying the most vulnerable assets, sustained research efforts have gone into developing machine learning models that accurately predict future failures. The performance of these methods depends heavily on the quantity of reliable data, yet most utilities have only limited records of historical pipe breaks. To overcome this limitation in data availability, this article presents a case study exploring the performance of machine learning methods for predicting future failures when system information from multiple utilities is combined. Six utilities are considered, for which predictive models are trained and evaluated in three scenarios: (1) using data from only a single reference system, (2) using data from all systems combined, and (3) using a bootstrapped sample of multiple systems drawn to match the pipe material distribution of the reference system. Empirical results suggest that variance-controlling algorithms, such as random forests, are less sensitive to the availability of data, and that introducing information from third-party sources leads to only marginal changes in performance. Overall, the number of break records from the reference system itself has the largest influence on accuracy, suggesting that utilities must keep reliable historical break data to maximize the power of predictive modeling for their asset management programs.
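
The third scenario, a bootstrapped sample matched to the reference system's pipe material distribution, can be illustrated with a minimal sketch. The abstract does not describe the authors' actual resampling procedure, so everything below is an assumption: the function name `material_matched_bootstrap`, the `"material"` and `"utility"` keys, and the choice to allocate draws per material in proportion to the reference counts are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def material_matched_bootstrap(reference, pool, n_samples, seed=0):
    """Resample `pool` with replacement so that material shares follow `reference`.

    Each pipe is a dict with a hypothetical "material" key. This is an
    illustrative stratified bootstrap, not the procedure used in the article.
    """
    rng = random.Random(seed)

    # Material counts in the reference system define the target proportions.
    ref_counts = Counter(pipe["material"] for pipe in reference)
    total = sum(ref_counts.values())

    # Group the pooled multi-utility records by material.
    by_material = defaultdict(list)
    for pipe in pool:
        by_material[pipe["material"]].append(pipe)

    sample = []
    for material, count in ref_counts.items():
        candidates = by_material.get(material, [])
        if not candidates:
            continue  # material absent from the pool, so it cannot be matched
        k = round(n_samples * count / total)
        # Draw with replacement, as in a standard bootstrap.
        sample.extend(rng.choices(candidates, k=k))
    return sample


# Toy data: the reference system is 75% cast iron (CI) and 25% PVC.
reference = [{"material": "CI"}] * 3 + [{"material": "PVC"}]
pool = [
    {"material": "CI", "utility": "B"},
    {"material": "PVC", "utility": "C"},
    {"material": "DI", "utility": "D"},  # not present in the reference system
]
sample = material_matched_bootstrap(reference, pool, n_samples=8)
counts = Counter(pipe["material"] for pipe in sample)
```

In this toy case the resampled set keeps the reference system's 3:1 ratio of cast iron to PVC and contains no ductile iron pipes, because that material does not occur in the reference system.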

Data Availability Statement

Some or all data, models, or code generated or used during the study are proprietary or confidential in nature and may only be provided with restrictions (e.g., anonymized data). The confidential datasets used in this article are listed as follows:
Pipeline Dataset for utilities A, B, C, D, E, F.
Historical Pipe Break Records for utilities A, B, C, D, E, F.

Acknowledgments

This work was funded by Xylem Inc., which is developing professional services related to the work described in this article. The independence of this work was reviewed and approved in accordance with Xylem Inc.’s policy on objectivity in research. The opinions and views expressed are those of the researchers and do not necessarily reflect those of the sponsors.

Information & Authors

Information

Published In

Journal of Infrastructure Systems
Volume 28, Issue 2, June 2022

History

Received: Feb 20, 2021
Accepted: Jan 6, 2022
Published online: Feb 28, 2022
Published in print: Jun 1, 2022
Discussion open until: Jul 28, 2022

Authors

Affiliations

Thomas Ying-Jeh Chen, Ph.D.
Research Data Scientist, Xylem Inc., 8920 MD-108, Columbia, MD 21045 (corresponding author). Email: [email protected]
Greta Vladeanu, Ph.D.
Research Analyst, Xylem Inc., 8920 MD-108, Columbia, MD 21045.
Sepideh Yazdekhasti, Ph.D.
Decision Science Manager, Xylem Inc., 8920 MD-108, Columbia, MD 21045.
Craig Michael Daly, P.E.
Chief Engineer, Xylem Inc., 8920 MD-108, Columbia, MD 21045.
