Introduction
Validation of urban flood prediction models requires accurate observations of flood extents and depths. Different methods are used to validate model predictions depending on the type of observed flood data. Satellite imagery and aerial photos can be used to determine flood extent at certain times during a flood event. Some satellite imagery exists for urban areas, but infrequent revisit times and locations limit its utility (
Neal et al. 2009;
Werner et al. 2005). Social-media data and news reports/photos are growing sources of data for validation and have the potential to provide vast volumes of flood-related information. However, the information in the photos and reports must be converted into usable forms, for example by relating the pictured flood level to a local depth at a specific location and time of occurrence. This often requires visits to the pictured site and painstaking photo interpretation and data entry procedures (e.g.,
Macchione et al. 2019). Nonetheless, photographic high-water mark (HWM) data provide value as shown by Noh et al. (
2019), Yu et al. (
2016), Xing et al. (
2019), Blumberg et al. (
2015), Fohringer et al. (
2015), Kutija et al. (
2014), and McDougall and Temple-Watts (
2012).
Debris lines left on the ground at the flooded edge [i.e., wrack lines; Neal et al. (
2009)] provide an estimate of maximum flood extent and also give an indication of maximum surface water elevation. High-water marks (HWMs) such as mud lines on trees or buildings can be surveyed to provide point estimates of maximum surface water elevation. For evaluating predicted flood extents, binary pixel-wise metrics, such as the critical success index derived from contingency tables, have been used to quantify the error between predicted and observed wet/dry computational cells (e.g.,
Wing et al. 2019;
Yu and Lane 2006). However, Stephens et al. (
2014) noted biases in these metrics and recommended further exploration. Moreover, no consensus exists in the literature on the best approach for comparing simulated and observed HWMs. When both the simulated and observed high-water levels indicate an above-ground depth at a specific location, researchers apply traditional measures, such as mean error, root mean square error (RMSE), correlation, and bias, to quantify the vertical error (e.g.,
Wing et al. 2019;
Xing et al. 2019;
Yu et al. 2016;
Hartnet and Nash 2017;
Nguyen et al. 2016;
Blumberg et al. 2015;
Horritt et al. 2010;
Neal et al. 2009;
Mignot et al. 2006). However, in modeling studies, it is possible that the predicted flood extent does not reach the location of an observed high-water mark. In such instances, it is not clear how to compute a goodness-of-fit metric. To the best of our knowledge, only a few studies addressed this issue. Savage et al. (
2016), Smith et al. (
2015), and Neal et al. (
2009) computed the vertical difference between the high-water mark elevation and the water surface elevation in the nearest wet cell. In another approach, Hunter et al. (
2005) computed the vertical difference between the high-water mark elevation and the digital elevation model (DEM) elevation.
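The metrics named in this paragraph can be sketched in a few lines. The following is a minimal illustration, not the implementation used in any of the cited studies: the function names, the array layout, and the sign convention (HWM elevation minus simulated water surface elevation) are assumptions made here for clarity.

```python
import numpy as np

def critical_success_index(pred_wet, obs_wet):
    """CSI = hits / (hits + misses + false alarms) for boolean wet/dry cells."""
    pred_wet = np.asarray(pred_wet, dtype=bool)
    obs_wet = np.asarray(obs_wet, dtype=bool)
    hits = np.sum(pred_wet & obs_wet)
    misses = np.sum(~pred_wet & obs_wet)
    false_alarms = np.sum(pred_wet & ~obs_wet)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else float("nan")

def vertical_error_metrics(sim, obs):
    """Mean error, MAE, and RMSE of simulated minus observed values."""
    err = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return {
        "mean_error": float(err.mean()),
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }

def nearest_wet_cell_difference(hwm_xy, hwm_elev, cell_xy, cell_wse, cell_depth):
    """HWM elevation minus the water surface elevation of the nearest wet cell.

    A cell is treated as wet when its simulated depth is above zero
    (an assumption; studies may use a small positive tolerance instead).
    """
    cell_xy = np.asarray(cell_xy, dtype=float)
    cell_wse = np.asarray(cell_wse, dtype=float)
    wet = np.asarray(cell_depth, dtype=float) > 0.0
    if not wet.any():
        return float("nan")  # no wet cell anywhere: metric undefined
    dist = np.hypot(cell_xy[wet, 0] - hwm_xy[0], cell_xy[wet, 1] - hwm_xy[1])
    return float(hwm_elev - cell_wse[wet][np.argmin(dist)])
```

The nearest-wet-cell function corresponds to the fallback described for cases in which the predicted extent does not reach the HWM; the DEM-based alternative would instead subtract the ground elevation at the HWM location.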
Additional complexities emerge when using point-surveyed HWMs to evaluate models having different discretizations and computational element sizes. Fig.
1 illustrates these issues using the two models in the present study: the Weather Research and Forecasting Model Hydrologic Extension (WRF-Hydro) (
Gochis et al. 2020) and ADHydro (
Ogden et al. 2015). ADHydro is an unstructured mesh model that uses smaller elements where more topographic detail is needed, such as channel-overbank boundaries, while larger elements are used to represent areas requiring less detail. In contrast, WRF-Hydro was applied at a 10-m grid resolution throughout the study domain. In Fig.
1, the underlying DEM is the 10-m USGS National Elevation Dataset (NED). The notations SUG_16, 17, and 18 indicate the locations of three surveyed HWMs along Sugar Creek. The thick blue line represents the stream channel vector. The red triangles are the ADHydro mesh elements encompassing the HWMs. Fig.
1 clearly shows the size variation among ADHydro mesh elements and the typical size difference between the ADHydro mesh elements and the WRF-Hydro 10-m grid cells. Because both models utilize the same underlying DEM, differences in elevations between the two models should be small and a function of the underlying grid structure and element size differences. That being said, elevation differences between the models are a source of uncertainty and could introduce error, especially when larger ADHydro mesh elements span areas of high elevation variability in the DEM.
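One way to quantify how much DEM variability a single mesh element spans is to collect the DEM cells whose centers fall inside the element and summarize their elevations. The sketch below assumes a triangular element given by three vertex coordinates and a DEM array whose row index increases with y; both the function names and this raster orientation are illustrative assumptions, not part of either model.

```python
import numpy as np

def points_in_triangle(px, py, tri):
    """Boolean mask of points inside (or on the edge of) a triangle,
    using barycentric coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    b = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    c = 1.0 - a - b
    return (a >= 0) & (b >= 0) & (c >= 0)

def element_elevation_stats(dem, cell_size, tri, x0=0.0, y0=0.0):
    """Summarize DEM elevations at cell centers lying inside one mesh element.

    Assumes dem[i, j] is centered at
    (x0 + (j + 0.5) * cell_size, y0 + (i + 0.5) * cell_size).
    """
    dem = np.asarray(dem, dtype=float)
    rows, cols = np.indices(dem.shape)
    px = x0 + (cols + 0.5) * cell_size
    py = y0 + (rows + 0.5) * cell_size
    z = dem[points_in_triangle(px, py, tri)]
    if z.size == 0:
        return None  # element smaller than one DEM cell
    return {
        "n_cells": int(z.size),
        "mean": float(z.mean()),
        "range": float(z.max() - z.min()),
    }
```

A large elevation range inside one element would flag locations where a single element elevation is a poor stand-in for the ground elevation at a point-surveyed HWM.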
These uncertainties give rise to such questions as the following:
• How does one evaluate predicted water depths originating from models having different underlying grid mesh structures and element sizes?
• How does one assess model performance when a predicted neighborhood water depth magnitude is approximately equal to the surveyed HWM depth but is spatially shifted?
• Likewise, how does one evaluate a modeled water depth that matches the extent of the HWM but not the magnitude?
To address these questions, this paper presents novel methods for evaluating model predictions of flood depths at surveyed high-water marks. These techniques account for differences in model element discretization and size when comparing simulated flood depths to surveyed HWMs. We also developed a novel approach to qualitatively analyze inundation predictions at the locations of flood-damaged structures and crowd-sourced observations of flooded locations. The work in this study is part of a more complete evaluation of two hyper-resolution models (HRMs) for predicting urban flooding (
Smith et al. 2020). To the best of our knowledge, our evaluation is among the most comprehensive whole-city studies to date, considering the number of storm events and corresponding observations of surveyed HWMs, flood damage locations, and crowd-sourced locations of flooding.
Models
We used two models in our study. ADHydro (
Ogden et al. 2015) was developed at the University of Wyoming to simulate large watershed response to climate change. ADHydro has been parallelized to run in a high-performance computing (HPC) environment and uses an unstructured mesh discretization to describe land surface and subsurface characteristics. The model partitions precipitation into runoff using the Green & Ampt redistribution method coupled to shallow groundwater using a one-dimensional (1D) finite-moisture content discretization of the advection-like term of the soil moisture velocity equation (
Lai et al. 2015;
Ogden et al. 2017). Two-dimensional overland flow is calculated using either the full dynamic wave or diffusive wave (zero-inertia) approximation of the de Saint-Venant equations. The full dynamic or diffusion wave approximations are also used to solve the one-dimensional de Saint-Venant equations for channel flow. Two-way coupling of the overland and channel flow is based on a source-term lateral flow connection using a broad-crested weir equation.
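A source-term exchange based on a broad-crested weir can be sketched as follows. This uses the textbook free-flow form, Q = Cd (2/3)^(3/2) sqrt(g) L h^(3/2), with h the head above the bank crest. The coefficient value, the neglect of submergence, and the function and argument names are assumptions for illustration; the text does not give ADHydro's exact formulation.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def weir_exchange(channel_wse, overland_wse, bank_elev, length, cd=1.0):
    """Lateral discharge (m^3/s) between a channel and an adjacent overland element.

    Positive values mean flow from channel to overland, negative the reverse.
    Free-flow broad-crested weir only; submerged-weir corrections are omitted.
    """
    if channel_wse == overland_wse:
        return 0.0
    # Head is measured from the higher of the two water surfaces to the bank crest.
    head = max(channel_wse, overland_wse) - bank_elev
    if head <= 0.0:
        return 0.0  # neither water surface overtops the bank
    q = cd * (2.0 / 3.0) ** 1.5 * math.sqrt(G) * length * head ** 1.5
    return q if channel_wse > overland_wse else -q
```

Because the sign follows the water surface difference, the same function moves water out of the channel during overbank flow and back into it as the flood recedes, which is what distinguishes two-way from one-way coupling.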
The second model was WRF-Hydro (
Gochis et al. 2020), developed at the National Center for Atmospheric Research (NCAR). A version of WRF-Hydro forms the core of the National Weather Service (NWS) National Water Model (NWM). WRF-Hydro has also been parallelized to run in an HPC environment. The Noah Multi-Parameterization (Noah-MP) model (
Niu et al. 2011;
Yang et al. 2011) is used to compute water balance and runoff generation. Two combinations of overland and channel routing were available. One version has diffusive wave overland flow with two-way coupling to an approximation of diffusive wave channel routing. In the second and selected version, diffusive wave routing is used for both overland and channel flow. However, this version is limited to a one-way coupling between overland and channel flow. All channel flow is retained within trapezoidal elements and cannot overflow onto the floodplain. In this project, we set up WRF-Hydro to run on a 10-m structured grid. Kim et al. (
2021) applied a similar version of WRF-Hydro to study the impacts of spatial and temporal resolution, calibration, initial conditions, and streamflow data assimilation on outlet hydrographs for three small basins.
Study Basin
The study area was the
Sugar Creek watershed above the USGS Gage 02146800. This basin completely encompasses the city of Charlotte, NC (Fig.
2). The Sugar Creek basin lies almost entirely within Mecklenburg County, NC, with a small portion containing the outlet gage in Fort Mill, South Carolina. The Charlotte metropolitan area has undergone rapid urban and suburban growth since the 1960s, with urban area increasing from 31.5% in 1992 to 68.3% in 2011 (
Zhou et al. 2017). During the same period, forested area decreased from 55% in 1992 to 27.7% in 2011 [see the study by Zhou et al. (
2017) and references therein]. The basin is highly flood-prone, with warm-season thunderstorm systems and tropical cyclones causing the main flood-producing events. This region is served by the Flood Information and Notification System (
FINS 2017), a collaborative effort between the USGS and Charlotte-Mecklenburg Storm Water Services (CMSWS) to provide data collection, monitoring, and alert services to the Charlotte metropolitan area.
Model Application
We set up ADHydro and WRF-Hydro on the Sugar Creek basin to generate predictions of maximum flow depths in each computational element to compare to the observed high water information for each storm. Noting the major role that streets play in routing urban floods (e.g.,
Schubert and Sanders 2012), we created the mesh for ADHydro so that major streets were defined as impervious flow paths (e.g.,
Gallegos et al. 2009). For WRF-Hydro, the DEM corresponding to major streets was artificially lowered to ensure that flow followed street directions. After these steps, the median area of the ADHydro irregular mesh elements for the entire Charlotte basin was
or
on a regular grid side. The basin-wide ratio of ADHydro median element sizes to WRF-Hydro grid cells was
. Along the channel segments, the median area of the ADHydro mesh elements was
or approximately 40 m on a regular grid side. Thus, near the channels, the ratio of ADHydro median mesh element size to WRF-Hydro grid cell size was approximately
. Trapezoidal channel dimensions for both models were defined using empirical stream order relationships that could be applied nationally.
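Two of the steps in this paragraph reduce to simple array arithmetic: converting an irregular element area to an equivalent regular-grid side length, and lowering DEM cells that coincide with major streets. The sketch below is illustrative only. The 1,600 m^2 area in the usage note is simply the area implied by the stated 40-m side, and the street drop depth is a hypothetical parameter that the text does not specify.

```python
import numpy as np

def equivalent_grid_side(area_m2):
    """Side length (m) of a square with the same area as a mesh element."""
    return float(np.sqrt(area_m2))

def size_ratios(element_area_m2, cell_size_m):
    """Element-to-grid-cell size ratio expressed by side length and by area."""
    side = equivalent_grid_side(element_area_m2)
    return {
        "side_ratio": side / cell_size_m,
        "area_ratio": element_area_m2 / cell_size_m ** 2,
    }

def lower_streets(dem, street_mask, drop_m):
    """Return a copy of the DEM with street cells lowered by drop_m meters.

    drop_m is a hypothetical parameter; the actual amount is not given here.
    """
    out = np.array(dem, dtype=float)  # copy so the input DEM is untouched
    out[np.asarray(street_mask, dtype=bool)] -= drop_m
    return out
```

For example, `size_ratios(1600.0, 10.0)` gives a side ratio of 4 and an area ratio of 16 for a 40-m equivalent element against a 10-m grid cell, which shows why it matters whether a reported ratio refers to length or area.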
Surveyed HWM data are not available everywhere in the US; thus, we did not use these measurements for model calibration. Our goal was to calibrate model parameters using only nationally-available USGS observed hydrographs to get the hydrograph volume correct and, subsequently, to determine how well the models performed for simulating observed HWMs. Interested readers are referred to the study by Smith et al. (
2020) for details regarding model calibration, simulation run periods, initial conditions, and analysis of simulated hydrographs. In this study, we focus on the analysis of inundation results from versions of ADHydro and WRF-Hydro that were calibrated to fit observed hydrographs.
The constraint to use only nationally-available data sets in our underlying feasibility study (
Smith et al. 2020) precluded the explicit modeling of buildings, microtopography, storm sewer networks, and cross sections, which likely impacted the simulation accuracy. Nonetheless, the choice of which urban features to model and how to model them must be considered in light of trade-offs in computational time, expected accuracy, and model complexity. Moreover, we still do not know how much physical complexity a flood inundation model needs to address a given problem (
Neal et al. 2012). Modelers are cautioned regarding the expectation that increased modeling resolution and complexity will necessarily result in greater accuracy (
Dottori et al. 2013). Modeling choices must also consider project goals, end-user requirements, data availability, preprocessing demands, and implementation effort (
Schubert and Sanders 2012). These considerations are important to the NWS for the operational implementation of models at a national scale. For example, end users of NWS flood forecasts, such as emergency managers, often want actionable depth information presented in general ranges as they consider what level of response is necessary, such as signage, road closures, and rescue operations.
We present several examples of the trade-off between model complexity (e.g., buildings, storm sewers, and cross sections) and project goals. Horritt et al. (
2010) and Gallegos et al. (
2009) determined that excessive computational demands with two-dimensional (2D) hydraulic models precluded the use of mesh sizes needed to resolve buildings. Yu et al. (
2016) neglected buildings given the project scope and goals. Wing et al. (
2019) did not model buildings, streets, or storm sewers in their city-scale evaluation of a 2D hydraulic model and a simple GIS-based approach for Hurricane Harvey in Houston. Even when buildings are modeled, simulation results can be contradictory and confounding. For example, Neal et al. (
2009) found that RMSE errors in HWM simulations were slightly worse when buildings were modeled compared to the no-building scenario. Similarly, Grimley et al. (
2017) found that representing buildings in the terrain model resulted in slightly worse results in basin outlet hydrograph simulation compared to the no-building case. On the other hand, Schubert and Sanders (
2012) found that the inclusion of buildings is important for modeling local scale velocities and depths but less important for the simulation of hydrographs and flood extents.
Regarding the importance of defining urban microtopography, Fewtrell et al. (
2011) conducted a benchmarking study using two variants of a hydraulic model. Spatial resolutions of 25 cm, 50 cm, 1 m, 2 m, and 5 m were used to define the microtopography (e.g., curbs, road camber, etc.) on a very small
basin. Such modeling resolutions required the use of vehicle-mounted light detection and ranging (LiDAR) units as airborne LiDAR has been incapable of providing the resolution needed to define urban microtopography (
Ozdemir et al. 2013). Furthermore, proprietary software was needed to process the LiDAR data. Clearly, such efforts are nearly impossible at present and in the near future for city-scale operationally-viable forecasting in urban areas across the US.
Studies have shown (e.g.,
Rafieeinasab et al. 2015;
Schumann et al. 2011;
Ogden et al. 2011) that in severe rainfall events, such as the two used in our study, the capacity of the subsurface drainage network pales in comparison to the flow conveyed by surface features. Moreover, it is nearly impossible to model all storm sewers in a city-wide domain in the time appropriate for operational forecasting. As a result, decisions must be made as to what level of simplification of the storm sewer network needs to be made to meet project goals (e.g.,
Habibi and Seo 2018;
Leitao et al. 2010). Indeed, the immense complexity of the storm sewer network argues for simplicity as a first modeling step, as in our case (
Gallegos et al. 2009).
It is well known that cross-section shape and spacing can have large influences on the extent and depth of flood inundation. Among others, Ali et al. (
2015), Cook and Merwade (
2009), and Fewtrell et al. (
2011) noted differences in flood inundation extents and depths when using cross sections derived from topographic data of various resolutions.
Discussion
Analysis of predicted maximum depths at surveyed HWMs showed that WRF-Hydro achieved smaller RMSE and MAE values than ADHydro. Analysis of areal sector depths supported this result. As stated previously, this result likely arises because WRF-Hydro uses one-way coupling between overland and channel flow, in contrast to the two-way coupling used in ADHydro. Our results suggest that WRF-Hydro predicts shallower depths than ADHydro at or in the vicinity of the surveyed HWMs, which may lead to smaller RMSE and MAE values when observed flood depths are shallow. We investigated whether differences in the centroid distance of the element to the stream channel, differences in model element area, or other topology-related characteristics could help explain the RMSE and MAE results for surveyed HWMs (not shown). We were unable to identify a clear signal that would highlight one model’s topology-related advantage over the other, and it is important to note that both models derived their topographic representations from the same 10-m NED DEM. As stated previously, analysis of the NED grid elevations for the areal sectors revealed a uniform elevation profile within most areal sectors (
Patrick et al. 2018; Appendix
II). This does not imply that the 10-m NED DEM shared the same elevation as the ground elevation at the observed surveyed HWM but that both models used the same NED elevation data, and therefore, the relative predicted water depths of the two models should be comparable.
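The argument that a shallow-biased model can score well when observed depths are themselves shallow can be illustrated with a toy example. All depths below are hypothetical values chosen for illustration, not study data, and "on-ground" HWMs are assumed here to be those with an observed depth of zero.

```python
import numpy as np

def rmse(sim, obs):
    err = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(sim, obs):
    err = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.mean(np.abs(err)))

def score(sim, obs, exclude_on_ground=False):
    """RMSE and MAE, optionally dropping HWMs with zero observed depth."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if exclude_on_ground:
        keep = obs > 0.0  # assumption: on-ground marks have zero depth
        sim, obs = sim[keep], obs[keep]
    return {"rmse": rmse(sim, obs), "mae": mae(sim, obs), "n": int(obs.size)}

# Hypothetical observed depths (m), dominated by shallow and on-ground marks.
OBS = np.array([0.0, 0.0, 0.1, 0.2, 1.5])
# Hypothetical model that stays dry or shallow everywhere.
SHALLOW = np.array([0.0, 0.0, 0.0, 0.0, 0.3])
# Hypothetical model that floods readily and captures the deep mark.
DEEP = np.array([0.8, 0.7, 0.9, 1.0, 1.4])

all_marks = {"shallow": score(SHALLOW, OBS), "deep": score(DEEP, OBS)}
above_ground = {
    "shallow": score(SHALLOW, OBS, exclude_on_ground=True),
    "deep": score(DEEP, OBS, exclude_on_ground=True),
}
```

In this toy case the shallow model has the lower RMSE over all marks, while the ranking reverses once the zero-depth marks are excluded. This is why the composition of the HWM sample matters when interpreting aggregate error metrics.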
We place our results in light of other whole-city investigations that validated models against numerous surveyed HWMs, recognizing that differences in study contexts preclude a strict comparison of results. Wing et al. (
2019) reported RMSE and MAE values of 1.71 m and 1.03 m, respectively, using 1,123 surveyed HWMs for flooding in Houston caused by Hurricane Harvey. Xing et al. (
2019) simulated inundation depths at 368 surveyed HWMs and achieved an RMSE of 0.36 m. Neal et al. (
2009) reported RMSE values of 0.32 and 0.28 m for 263 HWM simulations with and without building representations, respectively. Our reference HWM RMSE values fall within, but near the high end of, the range of these reported errors. It is highly likely that our results were affected by the use of empirically-derived channel properties rather than surveyed cross-section data. Neal et al. (
2009) used numerous channel cross sections, which may have contributed to their low RMSE values. Neither Xing et al. (
2019) nor Wing et al. (
2019) mention the use of channel cross-section information.
Conclusions and Recommendations
This paper applies novel techniques for evaluating model simulations against high-water observations. We compared predicted maximum depths at 172 surveyed high-water marks, 373 locations of flooded structures, and nearly 2,000 observed flooded locations to evaluate the models’ ability to simulate inundation. In terms of data abundance, our study is among the most comprehensive reported in the literature to date.
Simulation results were somewhat mixed between models, highlighting the need to examine multiple metrics when evaluating models. WRF-Hydro achieved lower values of RMSE and MAE when comparing simulated and surveyed HWM depths, but we surmise that this is attributable to shallower computed water depths when observed depths are also shallow. The marked improvement for ADHydro values of RMSE and MAE when removing on-ground HWMs suggests that WRF-Hydro skews this result in cases of shallow water depths. On the other hand, ADHydro more frequently generated significant inundation when compared to WRF-Hydro for all depth thresholds at surveyed high-water marks. In addition, ADHydro more often predicted flood inundation at locations with observed flood damage and/or street inundation. Thus, we conclude that ADHydro correctly predicted inundation more often than WRF-Hydro.
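The threshold-based comparison mentioned above can be expressed as the fraction of observation locations at which a model's simulated maximum depth exceeds each threshold. The following is a minimal sketch; the threshold values and the function name are hypothetical, since the specific thresholds are not listed in this section.

```python
import numpy as np

def fraction_exceeding(sim_depths, thresholds):
    """Fraction of locations whose simulated depth exceeds each threshold (m)."""
    sim = np.asarray(sim_depths, dtype=float)
    return {float(t): float(np.mean(sim > t)) for t in thresholds}

# Hypothetical simulated maximum depths (m) at four observation locations.
example = fraction_exceeding([0.0, 0.05, 0.2, 0.6], [0.0, 0.1, 0.5])
```

Computed for two models at the same locations, these fractions complement RMSE and MAE: they reward a model for predicting inundation where flooding was observed, regardless of how closely the depth matches.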
Evaluation of simulated inundation depths and extents is complex. Our spatial analyses attempted to account for differences in model discretizations and computational element sizes and to distinguish model performance, assuming that the model predicted flooding in the vicinity of the HWMs. The techniques were predicated on an analysis of NED 10-m grid elevations, which showed minimal topographic variation in most of the areal sectors. Given the data constraints, modeling assumptions, and purpose of the study, we believe the analysis techniques helped distinguish model performance differences and identify model deficiencies. The analysis methods in our study are broadly applicable for validating and intercomparing urban flood inundation models.
Further work is recommended to diagnose the surveyed HWM results. We used highly accurate surveyed HWMs in conjunction with the 10-m NED DEM. Future work could use the surveyed HWM data in conjunction with the 1-m LiDAR DEM available for Charlotte, NC, in the hope of achieving more accurate results (e.g.,
Neal et al. 2009). Two LiDAR DEM versions are available:
and
(Josh McSwain, CMSWS, personal communication, August 27, 2020).
Both models defined channel geometry using stream-order scaling relationships. Using available surveyed cross-section information would likely have benefited both models. Surveyed cross sections were available for the Sugar Creek basin but were not used because we wanted to explore model performance using only data sets having national coverage.
Future related studies should be limited to those models having a two-way coupling of overbank and channel flow. ADHydro contained a two-way coupling between the overland and channel routing components. Continued development of WRF-Hydro should include a similar two-way linkage between overland routing and explicit channel routing to allow excess channel flow to move onto overbank areas.