Technical Papers
Feb 9, 2023

Two Strategies for Avoiding Overfitting Long-Term Forecasting Models: Downsampling Predictor Fields and Shrinking Coefficients

Publication: Journal of Hydrologic Engineering
Volume 28, Issue 4

Abstract

Long-term hydrological forecasting based on sea surface temperature (SST) fields faces the large-p, small-n problem: there are too many potential predictors and only a limited number of samples. Because selecting predictors also adds to model complexity and can lead to overfitting, this study uses two strategies for building long-term streamflow forecasting models. The first strategy is to downsample the SST field and optimize its spatial resolution; the second is to shrink model coefficients using L1 regularization. We build models on downsampled SST fields with different spatial resolutions and find that the model based on a proper spatial resolution always performs better than the model based on the raw SST field. This result suggests treating the spatial resolution of the predictor field as a hyperparameter, similar to the hyperparameters that control the complexity of many machine learning models. For the second strategy, we explore three L1-norm regularization models: (1) the least absolute shrinkage and selection operator (LASSO), (2) relaxed LASSO, and (3) a two-step approach combining LASSO and ordinary least squares regression (LASSO+OLS). The relaxed LASSO model always performs better than the ordinary LASSO, indicating that relaxed LASSO is a better shrinkage approach. Furthermore, the skill of the presented models is compared with that of stepwise regression. The lower skill of stepwise regression based on SST fields with a high spatial resolution suggests that, given the limited number of samples, one should not select predictors from high-resolution fields.
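
The two strategies in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names (downsample, lasso_cd, relaxed_lasso) are hypothetical, block averaging is assumed as the downsampling method, and relaxed LASSO is assumed to take the simple blend form in which a mixing parameter gamma interpolates between the LASSO fit (gamma = 1) and an OLS refit on the LASSO-selected predictors (gamma = 0, which corresponds to LASSO+OLS).

```python
import numpy as np


def downsample(field, factor):
    """Block-average a (time, lat, lon) field by `factor` along lat and lon.

    Assumption: coarsening is done by simple block means; the abstract does
    not say which downsampling method the study used.
    """
    t, ny, nx = field.shape
    ny, nx = ny - ny % factor, nx - nx % factor  # drop ragged edge cells
    blocks = field[:, :ny, :nx].reshape(t, ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(2, 4))


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the L1 penalty."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes the columns of X and the vector y are centered (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def relaxed_lasso(X, y, lam, gamma):
    """Blend of the LASSO fit and an OLS refit on the LASSO active set.

    gamma = 1 returns plain LASSO; gamma = 0 returns LASSO+OLS.
    Assumption: this blend form stands in for the paper's relaxed LASSO.
    """
    beta_lasso = lasso_cd(X, y, lam)
    active = np.flatnonzero(beta_lasso)
    beta_ols = np.zeros_like(beta_lasso)
    if active.size:
        beta_ols[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return gamma * beta_lasso + (1.0 - gamma) * beta_ols


# Synthetic stand-in for an SST field: 40 years on an 8 x 12 grid.
rng = np.random.default_rng(0)
sst = rng.standard_normal((40, 8, 12))

# Strategy 1: coarsen the field; `factor` plays the role of a hyperparameter.
coarse = downsample(sst, factor=4)           # 96 grid cells become 6
X = coarse.reshape(coarse.shape[0], -1)
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize predictors

# Synthetic target driven by the first coarse cell plus noise.
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(X.shape[0])
y = y - y.mean()

# Strategy 2: shrink coefficients, then relax the shrinkage.
b_lasso = lasso_cd(X, y, lam=0.1)
b_half = relaxed_lasso(X, y, lam=0.1, gamma=0.5)
b_ols = relaxed_lasso(X, y, lam=0.1, gamma=0.0)
```

In practice the downsampling factor, lam, and gamma would all be chosen by cross-validation rather than fixed as above; the values here are placeholders for illustration only.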

Data Availability Statement

Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This study was supported by the National Key Research and Development Program of China (Nos. 2016YFC0402709 and 2021YFC3000102), the Basic Research Funds for Central Public Research Institutes (No. Y521007), and the Key Projects of Natural Science Research in Universities of Anhui Province (No. KJ2020A0745).

Information & Authors

Information

Published In

Journal of Hydrologic Engineering
Volume 28, Issue 4, April 2023

History

Received: Jul 2, 2022
Accepted: Dec 14, 2022
Published online: Feb 9, 2023
Published in print: Apr 1, 2023
Discussion open until: Jul 9, 2023

Authors

Affiliations

Ranran He, Ph.D.
School of Management Science and Engineering, Anhui Univ. of Finance & Economics, Bengbu 233030, Anhui, China.
Yuanfang Chen
Professor, College of Hydrology and Water Resources, Hohai Univ., Nanjing 210098, China (corresponding author).
Professor, State Key Lab of Hydrology-Water Resources and Hydraulic Engineering, Nanjing Hydraulic Research Institute, Nanjing 210029, China.
Zhengwei Pan
Associate Professor, College of Civil Engineering, Bengbu Univ., Bengbu 233030, Anhui, China.
Qin Huang, Ph.D.
College of Hydrology and Water Resources, Hohai Univ., Nanjing 210098, China.
