Two Strategies for Avoiding Overfitting Long-Term Forecasting Models: Downsampling Predictor Fields and Shrinking Coefficients
Publication: Journal of Hydrologic Engineering
Volume 28, Issue 4
Abstract
Long-term hydrological forecasting based on sea surface temperature (SST) fields faces the large p, small n problem, i.e., too many potential predictors and a limited number of samples. Because predictor selection itself adds to model complexity and can lead to overfitting, this study uses two strategies to build models for long-term streamflow forecasting. The first strategy is to downsample the SST field and optimize its spatial resolution; the second is to shrink model coefficients through L1 regularization. Models are built on downsampled SST fields at different spatial resolutions, and the model based on a proper spatial resolution always performs better than the model based on the raw SST field. This result suggests treating the spatial resolution of the predictor field as a hyperparameter, similar to the hyperparameters that control the complexity of many machine learning models. For the second strategy, three L1-norm regularization models are explored: (1) the least absolute shrinkage and selection operator (LASSO), (2) relaxed LASSO, and (3) a two-step approach combining LASSO and ordinary least squares regression. The relaxed LASSO model always performs better than the ordinary LASSO, indicating that relaxed LASSO is the better shrinkage approach. Furthermore, the skill of the presented models is compared with that of stepwise regression; the lower skill of stepwise regression on SST fields with a high spatial resolution suggests that, given the limited number of samples, predictors should not be selected from high-resolution fields.
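The two strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, block averaging is only one possible way to downsample a field, and the relaxed LASSO is written in the simplified form that blends the LASSO solution with an ordinary least squares refit on the selected predictors. The function names, the penalty `lam`, and the blending weight `gamma` are assumptions made for illustration.

```python
import numpy as np

def downsample_field(field, factor):
    """Block-average a (time, lat, lon) field by an integer factor in lat and lon."""
    t, ny, nx = field.shape
    ny_c, nx_c = ny // factor, nx // factor
    # Trim edges that do not fill a complete block, then average each block.
    trimmed = field[:, :ny_c * factor, :nx_c * factor]
    blocks = trimmed.reshape(t, ny_c, factor, nx_c, factor)
    return blocks.mean(axis=(2, 4))

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the L1 penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X and the vector y are already centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # remove predictor j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

def relaxed_lasso(X, y, lam, gamma):
    """Blend LASSO coefficients with an OLS refit on the LASSO active set.

    gamma = 1 gives the ordinary LASSO; gamma = 0 gives the two-step
    LASSO-then-OLS fit; values in between shrink partially.
    """
    beta_lasso = lasso_cd(X, y, lam)
    active = np.flatnonzero(beta_lasso)
    beta_ols = np.zeros_like(beta_lasso)
    if active.size:
        beta_ols[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return gamma * beta_lasso + (1.0 - gamma) * beta_ols

# Synthetic stand-in for an SST field: 40 samples on an 8 x 16 grid.
rng = np.random.default_rng(0)
sst = rng.normal(size=(40, 8, 16))
# Hypothetical target driven by the mean of one 4 x 4 region plus noise.
flow = sst[:, :4, :4].mean(axis=(1, 2)) + 0.1 * rng.normal(size=40)

# Strategy 1: coarsen the field; `factor` plays the role of a hyperparameter.
coarse = downsample_field(sst, factor=4)    # shape (40, 2, 4)
X = coarse.reshape(40, -1)
X = X - X.mean(axis=0)
y = flow - flow.mean()

# Strategy 2: shrink coefficients with the three L1-based variants.
beta_lasso = relaxed_lasso(X, y, lam=0.02, gamma=1.0)
beta_relaxed = relaxed_lasso(X, y, lam=0.02, gamma=0.5)
beta_two_step = relaxed_lasso(X, y, lam=0.02, gamma=0.0)
print(coarse.shape, np.count_nonzero(beta_lasso))
```

In practice `factor`, `lam`, and `gamma` would be chosen jointly by cross-validation on the available samples, which is what treating the spatial resolution as a hyperparameter amounts to.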
Data Availability Statement
Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments
This study was supported by the National Key Research and Development Program of China (Nos. 2016YFC0402709 and 2021YFC3000102), the Basic Research Funds for Central Public Research Institutes (No. Y521007), and the Key Projects of Natural Science Research in Universities of Anhui Province (No. KJ2020A0745).
Copyright
© 2023 American Society of Civil Engineers.
History
Received: Jul 2, 2022
Accepted: Dec 14, 2022
Published online: Feb 9, 2023
Published in print: Apr 1, 2023
Discussion open until: Jul 9, 2023