Technical Papers
Dec 7, 2021

Data Mining Algorithms for Water Main Condition Prediction—Comparative Analysis

Publication: Journal of Water Resources Planning and Management
Volume 148, Issue 2

Abstract

Accurate prediction of water main condition is critical for effective rehabilitation planning, and advances in machine learning can improve such predictions. This paper compares the capabilities of several data mining techniques in predicting the condition of water mains: generalized linear model, deep learning, decision tree, random forest, XGBoost, AdaBoost, and support vector machines. Water main condition is represented numerically on a scale from 0 to 10, with 10 indicating the best condition. Models are first constructed using a portion of the City of Waterloo, Canada, database. A genetic algorithm and cross-validation are then employed to tune the hyperparameters. Several performance metrics and statistical tests are used to compare the developed models on a new set of data not previously used. The XGBoost model yielded the most promising results, with a mean relative error of 1.29%. An extensive sensitivity analysis is conducted to obtain deeper insights into the most critical attributes for condition prediction. The developed model may help city managers develop optimal rehabilitation and renewal plans, considering the current and expected condition of their pipe inventory.
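The abstract reports model accuracy as a mean relative error on the 0 to 10 condition scale but does not define the metric. As a minimal sketch, assuming the common definition mean(|actual - predicted| / |actual|) expressed in percent, it could be computed as follows. The function name, the handling of zero-valued conditions, and the sample scores are all illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean relative error in percent.

    Assumed definition: mean(|actual - predicted| / |actual|) * 100.
    Records whose actual condition is 0 are skipped to avoid division by
    zero (an assumption of this sketch, not something the abstract states).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Relative error for each pipe with a nonzero observed condition.
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no records with a nonzero actual condition")
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical condition scores on the 0-10 scale (illustrative values only).
actual = [8.0, 5.0, 10.0, 4.0]
predicted = [7.6, 5.0, 9.5, 4.2]
mre = mean_relative_error(actual, predicted)
print(f"Mean relative error: {mre:.2f}%")
```

For these made-up scores the result is about 3.75%. Because the metric divides by the observed condition, the same absolute error weighs more heavily on pipes in poor condition (low scores) than on pipes near 10.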

Data Availability Statement

Some or all of the data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request. The water main condition and attribute data used in developing the models are available upon reasonable request. In addition, the code for the machine learning algorithms and statistical tests is available upon reasonable request.

Published In

Journal of Water Resources Planning and Management
Volume 148, Issue 2, February 2022

History

Received: May 24, 2021
Accepted: Oct 20, 2021
Published online: Dec 7, 2021
Published in print: Feb 1, 2022
Discussion open until: May 7, 2022

Authors

Affiliations

Postdoctoral Fellow, Dept. of Building, Civil and Environmental Engineering, Univ. of Alberta, Edmonton, AB, Canada T6G 1H9 (corresponding author). ORCID: https://orcid.org/0000-0002-7363-6646.
Ahmed Bouferguene
Professor, Dept. of Building, Civil and Environmental Engineering, Univ. of Alberta, Edmonton, AB, Canada T6G 1H9.

Cited by

  • Efficacy of Tree-Based Models for Pipe Failure Prediction and Condition Assessment: A Comprehensive Review, Journal of Water Resources Planning and Management, 10.1061/JWRMD5.WRENG-6334, 150, 7, (2024).
  • English Teaching Achievement Prediction by Big Data Analysis under Deep Intervention, Journal of Electrical and Computer Engineering, 10.1155/2023/9542465, 2023, (1-11), (2023).
  • Hybrid Differential Evolution-Based Regression Tree Model for Predicting Downstream Dam Hazard Potential, Sustainability, 10.3390/su14053013, 14, 5, (3013), (2022).
