Technical Papers
Mar 31, 2022

Supervised Stacking Ensemble Machine Learning Approach for Enhancing Prediction of Total Suspended Solids Concentration in Urban Watersheds

Publication: Journal of Environmental Engineering
Volume 148, Issue 6

Abstract

The potential for stacking ensemble modeling to enhance the performance and generalizability of machine learning (ML) models for the estimation of total suspended solids (TSS) concentration was assessed by comparing the results with ensemble boosting, bagging, and single ML models. Seven stacking ensemble models (M1 to M7) were created using combinations of base learners, including single, bagging, and boosting models. Adaptive Boosting (AdB) was used as the aggregation method in M1 to M6. These six models showed coefficient of determination (R²) values ranging from 0.87 to 0.95, root mean square error (RMSE) values ranging from 50 to 90 mg/L, and mean absolute error (MAE) values ranging from 11 to 86 mg/L, with best R², RMSE, and MAE values of 0.95, 50 mg/L, and 12 mg/L, respectively. To further improve the predictions, we tested several aggregation methods, including AdB, Random Forest (RF), Variable Weighting kNN (VW-kNN), Regression Tree (RT), and Support Vector Regression (SVR), using the structure of the highest-performing model, M6. This led to a new best-fit model (M7), which uses RF as the aggregation method and achieved R², RMSE, and MAE values of 0.98, 32 mg/L, and 11 mg/L, respectively.
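The stacking structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the study used Orange (Version 3.24), whereas this sketch uses scikit-learn's StackingRegressor. The base-learner mix, the hyperparameters, and the synthetic data from make_regression are all assumptions chosen for illustration, and they do not reproduce the composition of M6 or M7 or the NSQD inputs. The helper names build_stack and evaluate are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def build_stack(random_state=0):
    """Stack single, bagging, and boosting learners under an RF aggregator.

    The learner selection here is an assumption, not the M7 configuration.
    """
    base_learners = [
        # Single models
        ("rt", DecisionTreeRegressor(max_depth=6, random_state=random_state)),
        ("knn", KNeighborsRegressor(n_neighbors=5, weights="distance")),
        # Bagging model (bags decision trees by default)
        ("bag", BaggingRegressor(n_estimators=30, random_state=random_state)),
        # Boosting model
        ("adb", AdaBoostRegressor(n_estimators=50, random_state=random_state)),
    ]
    # Aggregation (meta) learner trained on out-of-fold base predictions
    aggregator = RandomForestRegressor(n_estimators=100,
                                       random_state=random_state)
    return StackingRegressor(estimators=base_learners,
                             final_estimator=aggregator, cv=5)


def evaluate(model, X_test, y_test):
    """Return the three goodness-of-fit metrics named in the abstract."""
    pred = model.predict(X_test)
    return {
        "R2": r2_score(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
    }


# Synthetic stand-in for the predictors and TSS target (assumption).
X, y = make_regression(n_samples=400, n_features=6, noise=10.0,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

stack = build_stack().fit(X_train, y_train)
scores = evaluate(stack, X_test, y_test)
print(scores)
```

Swapping the final_estimator argument for another regressor (for example AdaBoostRegressor or SVR) mirrors the comparison of aggregation methods described above. The scores printed for the synthetic data have no bearing on the values reported in the study.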

Data Availability Statement

Some or all data, models, or code generated or used during the study are available in a repository or online in accordance with funder data retention policies (https://bmpdatabase.org/national-stormwater-quality-database). The publicly accessible National Stormwater Quality Database (NSQD) was used to generate the input data set. This database is the result of an effort by the Environmental Protection Agency (EPA) to collect stormwater quality data in the United States (https://www.bmpdatabase.org/nsqd.html). The open-source, Python-based software Orange (Version 3.24) was used (https://scikit-learn.org, https://orangedatamining.com).

Acknowledgments

This work was supported by the South Dakota Board of Regents Competitive Research Grant (CRG) and Civil and Environmental Engineering, South Dakota School of Mines and Technology.

Information & Authors

Information

Published In

Journal of Environmental Engineering
Volume 148, Issue 6, June 2022

History

Received: May 5, 2021
Accepted: Jan 15, 2022
Published online: Mar 31, 2022
Published in print: Jun 1, 2022
Discussion open until: Aug 31, 2022

Authors

Affiliations

Mohammadreza Moeini, S.M.ASCE
Master’s Student, Dept. of Civil and Environmental Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701.
Ali Shojaeizadeh, A.M.ASCE
Research Scientist, Dept. of Civil and Environmental Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701.
Mengistu Geza, A.M.ASCE
Assistant Professor, Dept. of Civil and Environmental Engineering, South Dakota School of Mines and Technology, Rapid City, SD 57701 (corresponding author).
