How Does Missing Data Imputation Affect the Forecasting of Urban Water Demand?
Publication: Journal of Water Resources Planning and Management
Volume 148, Issue 11
Abstract
Drinking water demand forecasting has become fundamental to the efficient management of water distribution systems. With the growth of accessible data and available computational power, the scientific community has been tackling the forecasting problem, often opting for a data-driven approach with considerable success. However, the best-performing methodologies, such as deep learning, rely on the quantity and quality of the available data. In practice, demand data are usually affected by missing values. This study analyzes the role of missing data imputation within a short-term forecasting process. A set of conventional imputation algorithms was considered and applied to three test cases. Afterward, the forecasting process was performed using three state-of-the-art deep neural network models. The results showed that good-quality imputation can significantly affect the forecasting results. In particular, the results highlighted significant variation in the accuracy of the forecasting models that used past observations as inputs. In contrast, a forecasting model that used only static variables as input was not affected by the imputation process and may be a good choice whenever good-quality imputation is not possible.
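As a rough illustration of what imputation means for a demand series, the sketch below fills an artificial gap in a synthetic hourly series with two generic conventional techniques, linear interpolation and a same-hour mean, and compares their errors on the gap. It is not the data, the algorithms, or the evaluation of the study: the series, the gap position, and all function names are assumptions made for this example.

```python
import math


def synthetic_demand(n_hours):
    """Hypothetical hourly demand with a pure 24-hour cycle (arbitrary units)."""
    return [100.0 + 30.0 * math.sin(2 * math.pi * (t % 24) / 24)
            for t in range(n_hours)]


def linear_interpolate(series):
    """Fill None gaps by linear interpolation between the nearest observed neighbours."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i - 1                      # last observed index before the gap
        j = i
        while j < n and filled[j] is None:
            j += 1                         # j ends at the first observed index after the gap
        left = filled[start] if start >= 0 else None
        right = filled[j] if j < n else None
        if left is None and right is None:
            raise ValueError("series has no observed values")
        for k in range(i, j):
            if left is None:               # gap at the very start: carry backward
                filled[k] = right
            elif right is None:            # gap at the very end: carry forward
                filled[k] = left
            else:
                w = (k - start) / (j - start)
                filled[k] = left + w * (right - left)
        i = j
    return filled


def same_hour_mean(series, period=24):
    """Fill None gaps with the mean of observed values at the same hour of day.

    Assumes every hour of day has at least one observed value.
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(series):
        if v is not None:
            sums[t % period] += v
            counts[t % period] += 1
    return [v if v is not None else sums[t % period] / counts[t % period]
            for t, v in enumerate(series)]


def mae(truth, estimate, indices):
    """Mean absolute error restricted to the given indices."""
    return sum(abs(truth[t] - estimate[t]) for t in indices) / len(indices)


# One week of synthetic data with an assumed 12-hour outage.
truth = synthetic_demand(7 * 24)
gap = list(range(60, 72))
observed = [None if t in gap else v for t, v in enumerate(truth)]

lin = linear_interpolate(observed)
sea = same_hour_mean(observed)
mae_lin = mae(truth, lin, gap)
mae_sea = mae(truth, sea, gap)

print(f"MAE on gap, linear interpolation: {mae_lin:.3f}")
print(f"MAE on gap, same-hour mean:       {mae_sea:.3f}")
```

Because the synthetic series is perfectly periodic, the same-hour mean reconstructs the gap almost exactly while linear interpolation misses the daily trough; real demand data are noisier, so the ranking of methods there is an empirical question.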
Data Availability Statement
The data used during the study were provided by a third party. Direct requests for these materials may be made to the provider as indicated in the Acknowledgments.
Acknowledgments
The authors would like to thank Novareti S.P.A. for providing the data for this study and the anonymous reviewers for their valuable contributions. This study was partially funded by the project “TESES-Urb—Techno-economic methodologies to investigate sustainable energy scenarios at urban level” of the Free University of Bozen-Bolzano.
Copyright
© 2022 American Society of Civil Engineers.
History
Received: Mar 1, 2022
Accepted: Jul 15, 2022
Published online: Sep 13, 2022
Published in print: Nov 1, 2022
Discussion open until: Feb 13, 2023
Cited by
- Azar Niknam, Hasan Khademi Zare, Hassan Hosseininasab, Ali Mostafaeipour, A hybrid approach combining the multi-dimensional time series k-means algorithm and long short-term memory networks to predict the monthly water demand according to the uncertainty in the dataset, Earth Science Informatics, 10.1007/s12145-023-00976-y, (2023).
- Ariele Zanfei, Bruno Melo Brentan, Andrea Menapace, Maurizio Righetti, A short-term water demand forecasting model using multivariate long short-term memory with meteorological data, Journal of Hydroinformatics, 10.2166/hydro.2022.055, 24, 5, (1053-1065), (2022).