Technical Papers
Jun 9, 2021

Handling Incomplete and Missing Data in Corrosion Pit Measurement Database Using Imputation Methods: Model Development Using Artificial Neural Network

Publication: Journal of Pipeline Systems Engineering and Practice
Volume 12, Issue 3

Abstract

Data scarcity and missing values are prime challenges in developing a corrosion prediction model. In this paper, eight imputation techniques are explored using the National Bureau of Standards (NBS) corrosion database. The eight imputation techniques are mean, median, linear regression (LR), K-nearest neighbor (KNN), iterative robust model-based imputation (IRMI), multiple imputations of incomplete multivariate data (AMELIA), sequential imputation for missing values (IMPSEQ), and principal component analysis (PCA). The utility of each imputation technique is checked by training a neural network (NN) on the data set it imputes. Among the techniques, KNN and IMPSEQ performed best, achieving low error and a high coefficient of determination (R²). Results were compared with a baseline accuracy, obtained by training the NN model on the original corrosion data set with the missing values excluded. The NN performance increased from the baseline accuracy (81%) when the model was trained on the KNN-imputed (85%) and IMPSEQ-imputed (91%) data sets.
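To make the comparison of imputation techniques concrete, the following is a minimal sketch, not the authors' implementation. It contrasts two of the eight techniques named above (mean and KNN) on a small synthetic table; the table, the masked positions, the distance definition, and all function names are assumptions made for illustration. It also scores the imputations directly against known hidden values with RMSE, which is a simplification of the paper's approach of training a neural network on each imputed data set.

```python
import math

# Hypothetical toy table (NOT the NBS data): columns are x, y = 2x, z = 12 - x.
truth = [[float(i), 2.0 * i, 12.0 - i] for i in range(12)]

# Hide six known values so the imputation error can be scored afterwards.
masked = {(0, 1), (4, 1), (8, 1), (2, 2), (6, 2), (10, 2)}
data = [[None if (i, j) in masked else v for j, v in enumerate(row)]
        for i, row in enumerate(truth)]


def mean_impute(rows):
    """Replace each missing value with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each missing value with the column mean over the k nearest donors.

    Assumed distance: root mean squared difference over the columns that
    both rows have observed. Donors must have the target column observed.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for i2, donor in enumerate(rows):
                if i2 == i or donor[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, donor)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, donor[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


def rmse(filled):
    """Root mean squared error over the masked positions only."""
    squared = [(filled[i][j] - truth[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


rmse_mean = rmse(mean_impute(data))
rmse_knn = rmse(knn_impute(data, k=2))
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"KNN imputation RMSE:  {rmse_knn:.3f}")
```

On this strongly correlated toy table, KNN recovers the hidden values more closely than the column mean because neighboring rows carry information about the missing entry. This illustrates the mechanism only and says nothing about the relative performance reported for the NBS database.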

Data Availability Statement

Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the financial support through the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05584) under the Discovery Grant programs, MITACS Accelerate, and the BC Oil and Gas Research and Innovation Society (BC OGRIS).

Published In

Journal of Pipeline Systems Engineering and Practice
Volume 12, Issue 3, August 2021

History

Received: Dec 12, 2020
Accepted: Mar 17, 2021
Published online: Jun 9, 2021
Published in print: Aug 1, 2021
Discussion open until: Nov 9, 2021

Authors

Affiliations

Doctoral Student, School of Engineering, Univ. of British Columbia, Okanagan Campus, 3333 University Way, Kelowna, BC, Canada V1V 1V7. ORCID: https://orcid.org/0000-0002-5196-2811. Email: [email protected]
Professor, School of Engineering, Univ. of British Columbia, Okanagan Campus, 3333 University Way, Kelowna, BC, Canada V1V 1V7 (corresponding author). ORCID: https://orcid.org/0000-0001-5353-5250. Email: [email protected]
