Technical Papers
May 13, 2024

Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control

Publication: Journal of Water Resources Planning and Management
Volume 150, Issue 7

Abstract

Changes in demand, variable hydrological inputs, and environmental stressors are among the issues that reservoir managers and policymakers face on a regular basis. These concerns have sparked interest in applying different techniques to determine reservoir operation policy decisions. As the resolution of the analysis increases, it becomes more difficult to represent a real-world system effectively using traditional methods, such as dynamic programming and stochastic dynamic programming, for determining the best reservoir operation policy. One of the challenges is the “curse of dimensionality”: the number of samples needed to estimate an arbitrary function with a given level of accuracy grows exponentially with the number of input variables (i.e., the dimensionality) of the function. Deep reinforcement learning (DRL) is an intelligent approach for overcoming this curse in stochastic optimization problems for reservoir operation policy decisions. To our knowledge, this study is the first attempt to examine several novel DRL continuous-action policy gradient methods, including deep deterministic policy gradient (DDPG), twin delayed DDPG (TD3), and two versions of Soft Actor-Critic (SAC18 and SAC19), for optimizing reservoir operation policy. In this study, multiple DRL techniques were implemented to find an optimal operation policy for Folsom Reservoir in California. The reservoir supplies agricultural, municipal, hydropower, and environmental flow demands and provides flood control for the City of Sacramento. Analysis suggests that TD3 and SAC are robust in meeting Folsom Reservoir’s demands and optimizing reservoir operation policies. Experiments on continuous-action spaces of reservoir policy decisions demonstrated that DRL techniques can efficiently learn strategic policies in these spaces and can overcome the curses of dimensionality and modeling.
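
As a rough illustration of the kind of setup described above, the sketch below frames a single reservoir as a Gym-style environment with a continuous release action and trains an off-the-shelf TD3 agent on it. This is an illustrative sketch only, not the authors' implementation: the storage capacity, synthetic inflow statistics, constant demand, and penalty weights are placeholder assumptions, and Gymnasium plus Stable-Baselines3 are simply one convenient toolchain for the continuous-action policy gradient methods the study evaluates.

import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ReservoirEnv(gym.Env):
    """Toy daily reservoir-operation environment with a continuous release action."""

    def __init__(self, capacity=975.0, demand=8.0, horizon=365):
        super().__init__()
        self.capacity = capacity   # storage capacity (illustrative units)
        self.demand = demand       # constant downstream demand per step (placeholder)
        self.horizon = horizon
        # State: current storage and time step; action: release volume for the step.
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0], dtype=np.float32),
            high=np.array([capacity, float(horizon)], dtype=np.float32),
            dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=30.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.storage = 0.5 * self.capacity
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        release = float(np.clip(action[0], 0.0, 30.0))
        inflow = max(self.np_random.normal(10.0, 4.0), 0.0)  # synthetic inflow (placeholder)
        self.storage += inflow - release
        spill = max(self.storage - self.capacity, 0.0)        # overflow once the reservoir fills
        self.storage = float(np.clip(self.storage, 0.0, self.capacity))
        # Penalize squared water-supply deficit and spilled volume; weights are arbitrary.
        reward = -(max(self.demand - release, 0.0) ** 2) - 0.1 * spill
        self.t += 1
        return self._obs(), reward, False, self.t >= self.horizon, {}

    def _obs(self):
        return np.array([self.storage, self.t], dtype=np.float32)


if __name__ == "__main__":
    # Train a continuous-action policy gradient agent (TD3 here) on the toy environment.
    from stable_baselines3 import TD3
    model = TD3("MlpPolicy", ReservoirEnv(), verbose=0)
    model.learn(total_timesteps=20_000)

In the full study, the environment would instead encode Folsom Reservoir's storage dynamics, multiple demands, and flood-control constraints, and the SAC variants would be trained in the same way.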

Data Availability Statement

All data, models, or codes that support the findings of this study are available from the corresponding author upon request.

Acknowledgments

This research is supported by the US Geological Survey (Grant No. 5001-20-207-0312-216-2024917). Clemson University is acknowledged for its generous allotment of computing time on the Palmetto cluster. The authors would like to thank M. Giuliani and A. Castelletti of Politecnico di Milano, Milan, Italy, for their constructive comments on the methodology and the synthetic streamflow generator approach.


Information & Authors

Information

Published In

Go to Journal of Water Resources Planning and Management
Journal of Water Resources Planning and Management
Volume 150Issue 7July 2024

History

Received: Dec 17, 2022
Accepted: Feb 12, 2024
Published online: May 13, 2024
Published in print: Jul 1, 2024
Discussion open until: Oct 13, 2024

Authors

Affiliations

Sadegh Sadeghi Tabas, Ph.D., S.M.ASCE
Ph.D. Student, School of Computing, Clemson Univ., Clemson, SC 29634; Glenn Dept. of Civil Engineering, Clemson Univ., Clemson, SC 29634.
Assistant Professor, Dept. of Agricultural Sciences, Clemson Univ., Clemson, SC 29634 (corresponding author). ORCID: https://orcid.org/0000-0003-1494-6481. Email: [email protected]

Metrics & Citations

Metrics

Citations

Download citation

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

View Options

Get Access

Access content

Please select your options to get access

Log in/Register Log in via your institution (Shibboleth)
ASCE Members: Please log in to see member pricing

Purchase

Save for later Information on ASCE Library Cards
ASCE Library Cards let you download journal articles, proceedings papers, and available book chapters across the entire ASCE Library platform. ASCE Library Cards remain active for 24 months or until all downloads are used. Note: This content will be debited as one download at time of checkout.

Terms of Use: ASCE Library Cards are for individual, personal use only. Reselling, republishing, or forwarding the materials to libraries or reading rooms is prohibited.
ASCE Library Card (5 downloads)
$105.00
Add to cart
ASCE Library Card (20 downloads)
$280.00
Add to cart
Buy Single Article
$35.00
Add to cart

Get Access

Access content

Please select your options to get access

Log in/Register Log in via your institution (Shibboleth)
ASCE Members: Please log in to see member pricing

Purchase

Save for later Information on ASCE Library Cards
ASCE Library Cards let you download journal articles, proceedings papers, and available book chapters across the entire ASCE Library platform. ASCE Library Cards remain active for 24 months or until all downloads are used. Note: This content will be debited as one download at time of checkout.

Terms of Use: ASCE Library Cards are for individual, personal use only. Reselling, republishing, or forwarding the materials to libraries or reading rooms is prohibited.
ASCE Library Card (5 downloads)
$105.00
Add to cart
ASCE Library Card (20 downloads)
$280.00
Add to cart
Buy Single Article
$35.00
Add to cart

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share with email

Email a colleague

Share