Correcting Systematic Underprediction of Biochemical Oxygen Demand in Support Vector Regression
This article has been corrected.
Publication: Journal of Environmental Engineering
Volume 143, Issue 9
Abstract
Biochemical oxygen demand (BOD) is missing or inaccurate in many water quality data sets because of difficulties in diluting highly polluted water samples. Machine learning algorithms, particularly support vector regression (SVR), are useful for building regression models to fill gaps in these data sets. However, SVR can underpredict extreme-high values when they are few in number and underrepresented in the training data. This paper evaluates two methods, bootstrapping and data expansion, that mitigate the problem by increasing the proportion of extreme-high BOD values in the data set before training the gap-filling model. Both methods were tested on the water quality data of Yuen Long Creek, Hong Kong, for the years 2000–2014. Both methods were effective in mitigating systematic underprediction and reducing residual errors when the proportion of extreme-high values in the data set was increased from 3% to 30–40%. Both methods were useful for gap filling in BOD time series because extreme-high values are often the ones missing or inaccurate when highly polluted samples are diluted.
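The rebalancing idea in the abstract can be illustrated with a minimal sketch. It resamples extreme-high records with replacement until they reach a target share of the training set. This is not the authors' procedure: the function name `oversample_extremes`, the `(features, bod)` record layout, and the cutoff `threshold` that defines "extreme-high" are all hypothetical, and the abstract does not specify how either bootstrapping or data expansion was actually implemented.

```python
import math
import random


def oversample_extremes(records, threshold, target_fraction, seed=0):
    """Append resampled extreme-high records until they reach target_fraction.

    records: list of (features, bod) tuples (hypothetical layout).
    threshold: assumed cutoff; a record is extreme-high when bod > threshold.
    target_fraction: desired share of extreme-high records, e.g. 0.3.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be in (0, 1)")
    extremes = [r for r in records if r[1] > threshold]
    if not extremes:
        raise ValueError("no extreme-high records to resample")
    n, n_ext = len(records), len(extremes)
    # Find an integer k with (n_ext + k) / (n + k) >= target_fraction.
    k = max(0, math.ceil((target_fraction * n - n_ext) / (1.0 - target_fraction)))
    rng = random.Random(seed)
    # Draw k extreme-high records with replacement (bootstrap-style resampling).
    extra = [rng.choice(extremes) for _ in range(k)]
    return records + extra


# Synthetic example: 3 of 100 records (3%) are extreme-high.
records = [((i,), 5.0) for i in range(97)] + [((i,), 80.0) for i in range(97, 100)]
balanced = oversample_extremes(records, threshold=50.0, target_fraction=0.3)
share = sum(1 for r in balanced if r[1] > 50.0) / len(balanced)
```

The balanced list would then be passed to whatever regression model is being trained. Evaluation should still use the original, unresampled data so that duplicated records do not inflate the reported accuracy.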
Copyright
©2017 American Society of Civil Engineers.
History
Received: Dec 7, 2016
Accepted: Feb 7, 2017
Published online: May 8, 2017
Published in print: Sep 1, 2017
Discussion open until: Oct 8, 2017