EDITORIAL
Mar 1, 2008

Bringing Water Data Together

Publication: Journal of Water Resources Planning and Management
Volume 134, Issue 2
We are living through an information revolution. We are instantly connected to information and services through the internet, and communicate through e-mail with colleagues spread all over the globe, without regard to distance or their computer system. How can the power of this information revolution be brought to bear on water resources? This editorial presents some perspectives I have derived from my service as leader of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information System (HIS) project, a National Science Foundation (NSF)-supported effort to improve hydrologic science through better data access and organization at the nation’s universities.
The circulation of the waters of the earth through the hydrologic cycle is a very complex phenomenon. Many agencies and individuals collect water observations data on streamflow, water quality, groundwater levels, precipitation, snow conditions, and climate. The largest repository of such data in the United States is the United States Geological Survey (USGS) National Water Information System (NWIS), which contains information from about 1.6 million sites for streamflow, water quality, and groundwater. Other large federal repositories of water observations data are the Environmental Protection Agency (EPA) Storet (Storage and Retrieval) for water quality, the National Climatic Data Center’s Climate Data Online, and the United States Department of Agriculture (USDA)-Natural Resources Conservation Service (NRCS) Snotel and Soil Climate Analysis Network (SCAN) databases. Agencies such as the Bureau of Reclamation and the Corps of Engineers store water operations data at reservoirs and other facilities. This panoply of federal data is augmented by similar data collected by state and local water agencies. Academic and research investigators collect data for individual research projects, and contribute significantly to subjects, such as aquatic biology and continuous monitoring of water quality, that are not the traditional focus of the large water agencies.
The aggregation of all this water observations information defines the historical record of the water conditions of the nation. The task of assembling all the relevant information for a particular purpose in a chosen geographic region is greatly facilitated by the publication of much of the data on Web sites. However, every organization follows its own method in constructing its Web site and formatting the data presented there, so it still requires a great deal of effort to discover, learn, and operate all these various Web data sources. Moreover, much of the data collected by academic and research investigators is no longer accessible once research studies are completed. Water conditions have been measured in the nation at several million locations, but no comprehensive window on water data exists.
The growth of information sharing across the internet has led to the definition of standards for communication among computers. One of these standards is the HyperText Markup Language (HTML) used for communication of information visible in Web browsers, the standard form of which was defined by the World Wide Web Consortium (W3C). This consortium has also defined standards for “Web services” for more specialized machine-to-machine communications through the internet, using the Simple Object Access Protocol (SOAP). A service-oriented architecture is a design pattern based on loosely coupled, self-contained services that relate producers and consumers of information, in which an information source responds to a standardized request by producing an output in eXtensible Markup Language (XML). Inputs and outputs of the SOAP service are described in Web Service Description Language (WSDL) documents. The WSDL descriptions are published along with the service and contain enough information for client software to call the service in a syntactically correct way, and then parse the service output.
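The request and response pattern described above can be illustrated with a minimal sketch in Python. It assumes a SOAP 1.1 envelope; the service namespace, the parameter names, and the sample response are hypothetical and are not taken from any published WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for illustration.
SVC_NS = "http://example.org/water-service"


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text, result_tag):
    """Return the text of every result element found in the SOAP body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    return [el.text for el in body.iter(f"{{{SVC_NS}}}{result_tag}")]


# Hypothetical parameter names and site code.
request_xml = build_request(
    "GetValues", {"site": "EXAMPLE:001", "variable": "streamflow"}
)

# Hand-written sample of what a service might return.
sample_response = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:w="{SVC_NS}">'
    "<soap:Body><w:GetValuesResponse>"
    "<w:value>12.5</w:value><w:value>13.1</w:value>"
    "</w:GetValuesResponse></soap:Body></soap:Envelope>"
)
values = parse_response(sample_response, "value")
```

In practice, client software would read the operation names and message structure from the WSDL document rather than hard-coding them as this sketch does.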
The CUAHSI HIS is a geographically distributed network of data sources and functions that are integrated using service-oriented architecture, so that they operate as a connected whole. As part of CUAHSI HIS, Ilya Zaslavsky and David Valentine at the San Diego Supercomputer Center have defined an XML language to communicate water observations data called WaterML, and a set of Web service methods called WaterOneFlow. The services answer requests for information about water observations data such as precipitation, streamflow, water quality, and groundwater levels recorded at point locations. The Web service methods include GetSites to find the sites where data has been measured, GetVariables to identify the kinds of data that have been measured there, and GetValues to obtain the measured data. By using a combination of these services, a Site Catalog can be compiled for an observation network that documents for all sites the variables measured there, and the number of values available over a given interval of time. Either using these Web services, or by getting a direct dump of the equivalent metadata from a water data agency, CUAHSI HIS is able to compile a Site Catalog in a standardized form for each observation network, and thereby accumulate these into a master catalog for a region or for the nation as a whole, which inventories the information in all the indexed observation networks.
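How the three methods combine into a Site Catalog can be sketched as follows. This is an illustration only: the functions below are local stand-ins that borrow the method names, their signatures are assumptions, and the observations are invented sample data rather than output from a real WaterOneFlow service.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for one observation network.
# Keys are (site, variable); values are (ISO date, measurement) pairs.
OBSERVATIONS = {
    ("SITE_A", "streamflow"): [("2008-01-01", 10.2), ("2008-01-02", 11.0)],
    ("SITE_A", "precipitation"): [("2008-01-01", 0.0)],
    ("SITE_B", "streamflow"): [
        ("2008-01-01", 4.7),
        ("2008-01-02", 4.9),
        ("2008-01-03", 5.3),
    ],
}


def get_sites():
    """Stand-in for GetSites: list the sites where data has been measured."""
    return sorted({site for site, _ in OBSERVATIONS})


def get_variables(site):
    """Stand-in for GetVariables: list the variables measured at a site."""
    return sorted(var for s, var in OBSERVATIONS if s == site)


def get_values(site, variable, start, end):
    """Stand-in for GetValues: return measurements within [start, end]."""
    # ISO 8601 date strings sort chronologically, so string comparison works.
    series = OBSERVATIONS.get((site, variable), [])
    return [(d, v) for d, v in series if start <= d <= end]


@dataclass
class CatalogEntry:
    site: str
    variable: str
    value_count: int


def compile_site_catalog(start, end):
    """Record, for every site and variable, how many values are available."""
    catalog = []
    for site in get_sites():
        for variable in get_variables(site):
            count = len(get_values(site, variable, start, end))
            catalog.append(CatalogEntry(site, variable, count))
    return catalog


catalog = compile_site_catalog("2008-01-01", "2008-01-02")
for entry in catalog:
    print(entry.site, entry.variable, entry.value_count)
```

Catalogs compiled this way for several networks could then be concatenated into a master catalog, since every entry has the same shape regardless of its source.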
Initially, CUAHSI HIS built these services as “Web scrapers” that operate over agency Web pages to mimic the action of a human user going to a Web site and manually extracting the data. The advantage is that the service works immediately and does not disturb or change the way the agency is accustomed to presenting its data. But on the other hand, the service can be slow, and if the agency changes its Web page format, the service breaks. Recently, CUAHSI concluded a memorandum of understanding with the USGS, under which the USGS has made available a GetValues Web service method that operates directly from its NWIS Daily Values database to produce daily streamflow data in WaterML. CUAHSI is also negotiating with other federal water data providers to arrive at similar agreements for Web services access to their data archives.
WaterML translates the various formats in which water observations data is presented on the internet into a single format, irrespective of the data source. It thus accomplishes “syntactic mediation,” or homogenization of the various data formats into a common format. Providing a common window on water data requires, however, a further step of “semantic mediation,” or homogenization of the variables describing the data. For example, an academic investigator may call dissolved oxygen DOCon, while the USGS describes dissolved oxygen with a set of parameter codes such as 00300, which is “Dissolved oxygen, water, unfiltered, milligrams per liter,” and the EPA Storet system uses code 89857 as “Dissolved oxygen, 24-h average (mg/L).” If a water resources specialist wanted to get all the dissolved oxygen data for a particular stream from academic and agency sources, he or she would need to know all these codes and their various nuances, which is clearly a complex task suited for experts in data interpretation.
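Syntactic mediation can be illustrated with a small sketch that reads two differently formatted sources into one common record type. Both input formats, the source labels, and the sample readings are hypothetical; the common record here is a plain Python dataclass and is not WaterML itself.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Common record shared by all sources after mediation."""
    source: str
    site: str
    date: str  # ISO 8601, YYYY-MM-DD
    value: float


# Hypothetical format 1: comma-separated with a header row and ISO dates.
AGENCY_CSV = "site,date,value\nS1,2008-01-01,8.4\nS1,2008-01-02,8.1\n"
# Hypothetical format 2: pipe-delimited, no header, MM/DD/YYYY dates.
LAB_PIPE = "01/03/2008|S1|7.9\n"


def from_agency_csv(text):
    """Convert the comma-separated format into common records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Observation("agency", row["site"], row["date"], float(row["value"]))
        for row in reader
    ]


def from_lab_pipe(text):
    """Convert the pipe-delimited format, normalizing the date layout."""
    records = []
    for line in text.splitlines():
        date_us, site, reading = line.split("|")
        month, day, year = date_us.split("/")
        records.append(
            Observation("lab", site, f"{year}-{month}-{day}", float(reading))
        )
    return records


observations = from_agency_csv(AGENCY_CSV) + from_lab_pipe(LAB_PIPE)
```

After this step the records share one layout, but nothing yet says that variables named differently by each source refer to the same quantity; that is the job of semantic mediation.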
Semantic mediation is accomplished in CUAHSI HIS by taking each observed variable in the master site catalog for the nation and linking it to a concept in a hierarchical tree of hydrologic concepts, or hydrologic ontology. Thus, scientific queries for data can be made, such as “find all the nutrient data in the Guadalupe Basin.” Such a query is broken down into component queries (for nitrogen and phosphorus) and their various subcomponents (nitrite, nitrate, organic nitrogen, . . . , orthophosphate), and then translated into corresponding queries in WaterML to automatically acquire data from the various data sources in that basin. This water data discovery and acquisition system, called HydroSeek, is available at http://www.hydroseek.org, and is primarily the work of Michael Piasecki, Bora Beran, and colleagues at Drexel University.
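The idea of expanding a concept through the ontology and then looking up the source-specific variable codes can be sketched as below. This is not the HydroSeek implementation. The concept tree contains only the nutrient terms named above and is incomplete, the dissolved oxygen codes are those quoted earlier in this editorial, and the nutrient codes and network names are hypothetical placeholders.

```python
# Partial concept tree: parent concept -> child concepts.
ONTOLOGY = {
    "nutrients": ["nitrogen", "phosphorus"],
    "nitrogen": ["nitrite", "nitrate", "organic nitrogen"],
    "phosphorus": ["orthophosphate"],
}

# Links from (data source, source-specific variable code) to a concept.
VARIABLE_LINKS = [
    ("academic", "DOCon", "dissolved oxygen"),
    ("USGS NWIS", "00300", "dissolved oxygen"),
    ("EPA Storet", "89857", "dissolved oxygen"),
    # Hypothetical placeholder sources and codes, for illustration only.
    ("network_x", "X-NO3", "nitrate"),
    ("network_y", "Y-PO4", "orthophosphate"),
]


def leaf_concepts(concept):
    """Expand a concept into the leaf concepts beneath it in the tree."""
    children = ONTOLOGY.get(concept)
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(leaf_concepts(child))
    return leaves


def resolve(concept):
    """Return every (source, code) pair linked to the concept or its leaves."""
    wanted = set(leaf_concepts(concept))
    return [
        (source, code)
        for source, code, linked in VARIABLE_LINKS
        if linked in wanted
    ]


nutrient_codes = resolve("nutrients")
oxygen_codes = resolve("dissolved oxygen")
```

Each resolved pair would then drive one data request to the corresponding source, so the person posing the query never needs to know the individual codes.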
All these capabilities require a formally defined database with water observations data that water data agencies maintain as part of their operation. But academic and research investigators do not normally have such databases. More typically, academic and research investigators have an accumulation of American Standard Code for Information Interchange (ASCII), binary, Excel, or other files, sometimes indexed using a file naming convention that defines the time and place of measurement. David Tarboton and Jeff Horsburgh at Utah State University have defined the CUAHSI Observations Data Model (ODM), which is a relational database structure comprising a set of standard data tables linked by associations or relationships between key fields in pairs of tables, that are collectively sufficient to store small or large sets of observed data. The ODM provides a systematic way to organize and manage heterogeneous data from multiple sensors and sources. They have also built a set of ODM Tools to view and edit the observational data, a data loader to ingest information into the ODM from water sensors, and an editing capability for the controlled vocabularies of key terms used in the ODM, so that ODM users can submit any new terms they find necessary in their studies, for consideration as part of CUAHSI’s standard vocabulary. The CUAHSI ODM has been used to store many sets of observational water data including physical, chemical, and biological variables.
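The general shape of such a relational structure, with tables linked through key fields, can be sketched with Python's built-in sqlite3 module. The three tables and their columns are a simplified illustration chosen for this example and are not the actual ODM schema; the site, variable, and readings are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical tables: each data value points to one site and
# one variable through key fields.
conn.executescript(
    """
    CREATE TABLE sites (
        site_id INTEGER PRIMARY KEY,
        site_code TEXT UNIQUE NOT NULL,
        site_name TEXT
    );
    CREATE TABLE variables (
        variable_id INTEGER PRIMARY KEY,
        variable_code TEXT UNIQUE NOT NULL,
        units TEXT
    );
    CREATE TABLE data_values (
        value_id INTEGER PRIMARY KEY,
        site_id INTEGER NOT NULL REFERENCES sites(site_id),
        variable_id INTEGER NOT NULL REFERENCES variables(variable_id),
        observed_at TEXT NOT NULL,
        value REAL NOT NULL
    );
    """
)

conn.execute("INSERT INTO sites VALUES (1, 'SITE_A', 'Example creek gauge')")
conn.execute("INSERT INTO variables VALUES (1, 'DO', 'mg/L')")
conn.executemany(
    "INSERT INTO data_values (site_id, variable_id, observed_at, value) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, "2008-01-01T00:00", 8.4), (1, 1, "2008-01-01T01:00", 8.1)],
)

# Join through the key fields to recover self-describing observations.
rows = conn.execute(
    """
    SELECT s.site_code, v.variable_code, d.observed_at, d.value
    FROM data_values d
    JOIN sites s ON s.site_id = d.site_id
    JOIN variables v ON v.variable_id = d.variable_id
    ORDER BY d.observed_at
    """
).fetchall()
```

Storing the site and variable descriptions once and referencing them by key is what lets a single values table hold data from many sensors without repeating or losing their metadata.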
The NSF’s Geoscience and Engineering Directorates are attempting to develop a WATERS Network observatory, which, if built, will be a network of water observation systems at various locations across the nation. Eleven WATERS Network testbed site projects have been initiated at various universities around the nation, and the CUAHSI HIS, containing both the observations data model and the Web services architecture for publishing data, is operating at all the testbed sites. Observations data need to be interpreted in a geospatial context, so the CUAHSI HIS includes a Geographic Information System (GIS) component that stores the national stream and catchment dataset produced by EPA and USGS called NHDPlus. Other partners have created new applications of CUAHSI Web services, such as the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia, which has produced a Web application that takes the WSDL address of a data source, automatically starts up Google Earth, and zooms right into the measurement sites, so that their surroundings can be visualized.
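One way measurement sites could be handed to Google Earth is as a KML document of placemarks. The sketch below is an assumption about how such a step might look, not a description of the CSIRO application; the site names and coordinates are invented.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"


def sites_to_kml(sites):
    """Build a KML document with one placemark per (name, lat, lon) site."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in sites:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sites, for illustration only.
example_sites = [("SITE_A", 30.25, -97.5), ("SITE_B", 29.75, -98.0)]
kml_text = sites_to_kml(example_sites)
```

The site list would come from a call such as GetSites on the data source, so any network publishing the Web services could be mapped without custom code.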
In summary, the CUAHSI HIS has succeeded in creating a common window on water information for the United States. Many challenges remain before the vision of a common window on water information for the nation is fully realized. Many more datasets remain to be indexed in CUAHSI HIS. More academic and research datasets need to be stored in the observations data model. Capability for working with additional data types needs to be developed. Hydrologic Information Servers need to be deployed at a larger network of institutions. Models and modeling systems need to be developed and adapted to take advantage of this data to provide better real-time initialization, forecasting capability, and integrated information on the state of the water system at any point in time and space. The CUAHSI HIS team is continuing to work on these challenges. However, the testing done using the WATERS Network testbed sites and the existing integration with some agency datasets has shown that the methodology defined for the CUAHSI HIS is sound and can serve as a foundation for building a more extensive water information system for the future.

Acknowledgments

The writer wishes to acknowledge the insights of his CUAHSI HIS colleagues Ilya Zaslavsky, David Tarboton, Michael Piasecki, Jon Goodall, Tim Whiteaker, and of the many other contributors to the CUAHSI HIS project.

Published In

Journal of Water Resources Planning and Management
Volume 134, Issue 2, March 2008
Pages: 95 - 96

History

Published online: Mar 1, 2008
Published in print: Mar 2008

Authors

David R. Maidment
Center for Research in Water Resources, Univ. of Texas, Austin, TX 78712.