Open access
Technical Papers
Nov 21, 2018

Hybrid Wavelet Neural Network Approach for Daily Inflow Forecasting Using Tropical Rainfall Measuring Mission Data

Publication: Journal of Hydrologic Engineering
Volume 24, Issue 2

Abstract

A novel wavelet-artificial neural network hybrid model (WA-ANN) for short-term daily inflow forecasting is proposed, using for the first time Tropical Rainfall Measuring Mission (TRMM) data together with inflow data, both of which were transformed using mother-wavelets to improve the model performance. The models were assessed using the inflow records of a Brazilian reservoir named Três Marias, located in the São Francisco River basin, and daily rainfall estimates from the TRMM, both for the period 1998–2012. Several combinations of inputs for both regular and hybrid artificial neural networks (ANN) were assessed to forecast inflows seven days ahead, and the WA-ANN showed superior performance. Even the WA-ANN model that uses only the approximation at level three of rainfall data performed better than the regular ANN that uses the raw inflow data [r increase 16%, Nash–Sutcliffe model efficiency coefficient (NASH) increase 35%, and root-mean-square deviation (RMSD) decrease 47%]. The best model was the WA-ANN with transformed rainfall and inflow data as input (r increase 20%, NASH increase 44%, and RMSD decrease 69%).

Introduction

The economic development of any region is directly linked to the quantity and quality of its water resources. Proper management of these resources is able to minimize the effects of various natural phenomena, such as droughts that directly affect the energy sector, for example. Therefore, inflow forecasting is an important issue for operation of flood and mitigation systems, operation and planning of reservoirs, hydropower generation, and many other applications, and has been discussed and published in several studies (e.g., Bertone et al. 2015; Bennett et al. 2016). In Brazil, the energy system is strongly based on hydroelectric energy, highly dependent on the water availability in the watersheds, and interconnected to minimize the failure risks. Inflow forecasting is an important tool for the reservoir operation within such system. Currently, the Brazilian System Operator (known as ONS) has been using stochastic models to subsidize their work, but such models have limited precision. Therefore, more efficient and robust ways to plan and operate the system are required (Hidalgo et al. 2012).
On the other hand, artificial neural networks (ANN) have been shown to be useful for forecasting inflow time series, as in Kisi (2007). An example of ANN application for reservoir operation on a daily basis can be found in Farias et al. (2011). Karunanithi et al. (1994) proposed an ANN-based daily inflow forecast model using a multilayer perceptron (MLP) feedforward network with sigmoid and linear activation functions to complete an inflow time series of a station on the Huron River, Michigan. They had inflow records from two other sections of the same river and another on the main tributary. Inflows of current and past days were used as input to the model for forecasting the inflow for the next day. Cheng et al. (2005) compared the performance of a three-layer feedforward ANN with a conventional regression model and found that the ANN performed better. They also used past inflows as input, normalized to the interval [−1, 1], in a network with four neurons in the hidden layer, again to predict the next-day inflow, in this case into the Manwan hydroelectric reservoir in China. The ONS could use ANN for daily inflow forecasting, but the chosen time series could contain noise that alters the final results; removing such noise with a specific procedure could therefore improve the forecasts. The discrete wavelet transform (DWT) is a technique that could be used for this signal filtering task, and various attempts to couple ANN with DWT have been described, e.g., Adamowski and Sun (2010), Krishna et al. (2012), and Nourani et al. (2014b). Some authors also reported the presence of a lag between the observed and forecasted time series, especially in daily inflow forecasting (Dawson and Wilby 2001; Rajurkar et al. 2002; Jain and Srinivasulu 2004), and Santos and Silva (2014) found that the use of DWT combined with ANN could eliminate such a lag as well.
Since its theoretical development by Grossmann and Morlet (1984), wavelet analysis has become a leading technique in signal processing. The main intent of this study is to use the DWT technique to filter both daily inflow time series and rainfall time series to be used as ANN inputs. The DWT is used in this work to eliminate the high-frequency components (or details) of the raw input signals, because the noise usually present in time series (e.g., inflow and rainfall data) might influence the forecasting quality, as stated earlier. Wavelet analysis provides a useful means to decompose raw time series into low-frequency and high-frequency components. It is worthwhile to mention that the continuous wavelet transform has been amply described in several studies, where it is used to address signal nonstationarity, periodicity, and so on (e.g., Santos et al. 2003, 2009, 2018) or even to determine homogeneous precipitation areas, as reported by Santos and Morais (2013); however, the discrete wavelet transform is very suitable for filtering signal data (Lian et al. 2011; Nourani et al. 2014b), and it shows advantages over the Fourier transform, as highlighted by Santos et al. (2013).
A comparison with the moving average technique can be found in Akrami et al. (2014) and Budu (2014), and an application of DWT with fuzzy logic can be found in Kim et al. (2014). Most of these works use the Daubechies mother-wavelet to decompose the time series and, presumably, the selected mother-wavelet might alter the model outcomes. Santos and Silva (2014), for example, developed a wavelet–ANN coupled model for daily inflow forecasting into a different Brazilian reservoir; instead of using DWT approximations A (or even details D) of inflows as input, their model uses combinations of DWT approximations A at several levels (e.g., A1+A2 or A2+A3, etc.), but they also used the Daubechies mother-wavelet. In the present study, however, the discrete Coiflet mother-wavelets gave the best forecasting performance, probably because (1) they were developed so that both the wavelet and the scaling functions have vanishing moments, (2) they have a near-symmetric shape (Santos et al. 2014; Seo et al. 2015), and (3) their narrow shape is appropriate for representing the hydrograph peaks, and for a much narrower mother-wavelet, such as the Coiflet, the influence of zero padding and edge effects is smaller. In addition, (4) there are few reports of their application in wavelet–ANN hybrid models (Nourani et al. 2014a), although they provide a high performance in such models, as will be shown here.
Different from previous studies, the novel hybrid model proposed here is based on a new workflow that starts from downloading 169 (13 × 13) 15-year-long daily TRMM rainfall time series, performing a cluster analysis, computing the respective Thiessen-weighted averages, decomposing the Thiessen-weighted averages and the inflow time series using the Coiflet DWT, and building the ANN inputs with the approximation at level three (A3) instead of using the details (D1, D2, and D3), which might save computation time and improve the results. With the decomposed signal, an input data set using five days of information ($t, t-1, \ldots, t-4$) is built to forecast the inflows to Três Marias reservoir 7 days ahead. Três Marias reservoir is one of the most important reservoirs of the Brazilian Interconnected Power System (396 MW), and its basin is as large as Denmark, for example. Finally, this paper presents the study area, the data, and a brief description of ANN and DWT. It details the novel hybrid model proposed and the methodology for the model evaluation, and then presents the results and conclusions.

Study Area and Data

The Alto São Francisco sub-basin is a strategic area for water resources management in Brazil. It is located in the upper part of the São Francisco River basin, has an area of 49,574 km², which is larger than several countries (e.g., Denmark, Netherlands, Slovakia, and Switzerland), and has two distinct seasons: a rainy season (October–March), when more than 85% of the annual rainfall falls, and a dry season (April–September). In the central western part, the annual rainfall can be around 2,000 mm, while in the north-eastern part the annual rainfall average is between 800 and 1,000 mm. The streamflow records used in this paper are the renaturalized daily inflows of Três Marias reservoir, located in an upper part of the São Francisco River basin (Fig. 1). The inflow records were obtained from the National System Operator, Brazil (ONS), which is the agency in charge of the operation of the Brazilian interlinked electric system, and has been working with inflow forecasting at a lead time of 7 days.
Fig. 1. Location of the São Francisco River basin and its sub-basin Alto do São Francisco in Brazil with hydrography and reservoirs.
To deal with the lack of continuous long-term daily rainfall time series in developing countries and over extensive and even otherwise inaccessible areas, the rainfall data were obtained from the Tropical Rainfall Measuring Mission (TRMM), which is a joint mission between two important space agencies, i.e., the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA). This mission was designed to measure rainfall for climate and weather research. By covering the tropical and subtropical regions of the Earth, TRMM provides much-needed information on rainfall, and it has been useful to several studies (e.g. Plouffe et al. 2015; Teng et al. 2016). The daily rainfall records were obtained at each 0.25° from 46.75°W to 43.75°W, and from 21.00°S to 18.00°S (169 time series, i.e., 13  rows×13  columns). Each time series had 5,479 daily data records, referring to the period of January 1, 1998 to December 31, 2012. Table 1 shows the descriptive statistics for the daily TRMM rainfall depths over the studied basin as well as for the daily inflow records to Três Marias reservoir.
Table 1. Descriptive statistics for the daily TRMM rainfall (Thiessen-weighted averages) and daily inflows into Três Marias reservoir (1998–2012)

| Statistic | TRMM rainfall | Inflow |
|---|---|---|
| Arithmetic mean | 3.99 mm | 653.70 m³ s⁻¹ |
| Mode | 0.00 mm | 230.00 m³ s⁻¹ |
| Median | 0.35 mm | 388.00 m³ s⁻¹ |
| Harmonic mean | 0.00 mm | 312.65 m³ s⁻¹ |
| Geometric mean | 54.17 mm | 438.17 m³ s⁻¹ |
| Range | 60.37 mm | 4,659.00 m³ s⁻¹ |
| First quartile | 0.04 mm | 231.00 m³ s⁻¹ |
| Third quartile | 4.57 mm | 795.00 m³ s⁻¹ |
| Interquartile range | 4.53 mm | 564.00 m³ s⁻¹ |
| Median absolute deviation | 5.01 mm | 478.08 m³ s⁻¹ |
| Variance | 54.28 mm² | 448,609.60 (m³ s⁻¹)² |
| Standard deviation | 7.37 mm | 669.78 m³ s⁻¹ |
| Coefficient of variation | 1.85 | 1.02 |
| Skewness | 2.76 | 2.14 |
| Kurtosis | 9.22 | 4.99 |
| Excess kurtosis | 6.22 | 1.99 |

Methods

Rainfall Cluster Analysis

To get a representative precipitation data set, the 169 TRMM rainfall time series were analyzed according to a hierarchical cluster analysis (Cattell 1943). Hierarchical cluster analysis groups data over several scales by creating a cluster tree, which is known as a dendrogram (Fig. 2). Here the agglomerative strategy for hierarchical clustering was used, which is a bottom-up approach, i.e., each rainfall time series starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. In Fig. 2, the numbers along the horizontal axis correspond to the indices of the clusters in the original data set. The upside-down U-shaped lines represent the links between objects, and the height of such a U-shaped line indicates the distance between the objects. Thus, the link representing, for example, the cluster containing Clusters 6 and 10 has a height of 0.403, whereas the link representing the cluster that groups Object 1 together with Objects 2, 3, 4, and 9 has a height of 0.508. The height represents the distance linkage computed between clusters, which is calculated here as a correlation distance $d_{st}$ (Székely and Rizzo 2014) between individuals $s$ and $t$, constructed from values of $n$ variables, represented by the vectors $x_s$ and $x_t$, i.e., one minus the sample correlation between points (treated as sequences of values)

$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}} \tag{1}$$

where $\bar{x}_s = \frac{1}{n}\sum_j x_{sj}$ and $\bar{x}_t = \frac{1}{n}\sum_j x_{tj}$, i.e., $\bar{x}_s$ = average of the elements $x_{sj}$ of vector $x_s$; and $\bar{x}_t$ = average of the elements $x_{tj}$ of vector $x_t$.
Fig. 2. Dendrogram for 10 clusters based on 169 TRMM daily rainfall time series, considering the correlation distance.
This is the task of grouping the time series in such a manner that ones in the same cluster (group) are more similar (in the sense of correlation distance) to each other than to those in other clusters (groups). It is a common technique for statistical data analysis, which is used in many fields, and can be considered as a main task in exploratory data mining. The tree is not a single set of groups but rather a multilevel hierarchy whereby groups at one level are joined as groups at the next level.
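As an illustration of the correlation distance in Eq. (1), the following minimal Python sketch computes it for a pair of series and for a whole set of series. The function names `correlation_distance` and `distance_matrix` are hypothetical and are not taken from the paper, and the sketch assumes NumPy is available.

```python
import numpy as np


def correlation_distance(x_s, x_t):
    """One minus the sample correlation between two equally long series (Eq. 1)."""
    x_s = np.asarray(x_s, dtype=float)
    x_t = np.asarray(x_t, dtype=float)
    ds = x_s - x_s.mean()  # deviations from the mean of x_s
    dt = x_t - x_t.mean()  # deviations from the mean of x_t
    return 1.0 - (ds @ dt) / np.sqrt((ds @ ds) * (dt @ dt))


def distance_matrix(series):
    """Pairwise correlation distances for an array of shape (n_series, n_days)."""
    series = np.asarray(series, dtype=float)
    # np.corrcoef treats each row as one variable (here, one time series)
    return 1.0 - np.corrcoef(series)
```

A matrix of this kind could then be passed to an agglomerative clustering routine. The linkage criterion used to merge clusters is not stated in the text, so it would have to be chosen as an additional assumption.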
Such a procedure allows one to decide the level or scale of grouping that is most suitable for the application. Based on the size of the basin and on the mean annual rainfall distribution, the 169 rainfall time series were grouped into 10 clusters [Fig. 3(a)]. Finally, an average daily rainfall time series over each cluster was calculated based on the Thiessen method (Boots 1999), i.e., the TRMM grid point weights based on the relative areas of each measurement grid in the Thiessen polygon network were calculated [Fig. 3(b)]. The individual weights were multiplied by the grid TRMM rainfall estimates, and the values were summed to obtain the areal average precipitation time series for each cluster, which will be referred to as raw rainfall data.
Fig. 3. (a) Division of the basin based on cluster analysis (10 clusters); and (b) grid used to download the TRMM data and the relative areas of each measurement point in the Thiessen polygon network within the Alto do São Francisco subbasin.
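The Thiessen-weighted average described above amounts to a weighted sum of the grid-point rainfall, with weights given by relative polygon areas. The sketch below shows that arithmetic under stated assumptions: the function name `thiessen_average` and the array layout are hypothetical, and the polygon areas inside each cluster are assumed to have been computed beforehand.

```python
import numpy as np


def thiessen_average(rainfall, areas):
    """Areal average rainfall series for one cluster.

    rainfall: array of shape (n_days, n_points), daily rainfall at each grid point
    areas:    array of shape (n_points,), Thiessen polygon area of each point
              inside the cluster (any consistent unit)
    """
    rainfall = np.asarray(rainfall, dtype=float)
    areas = np.asarray(areas, dtype=float)
    weights = areas / areas.sum()  # relative areas, summing to one
    return rainfall @ weights      # weighted sum over grid points, per day
```

For example, two grid points with areas 3 and 1 receive weights 0.75 and 0.25, so a day with 10 mm at the first point and 0 mm at the second yields an areal average of 7.5 mm.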

Artificial Neural Networks

An artificial neural network (ANN) can be defined as a set of simple processing units (neurons) that work as a parallel distributed processor. Such neurons are responsible for storing experimental knowledge for later use. ANNs were inspired by the biological nervous system: they learn through examples, as the brain does, and store the acquired knowledge in the connection weights among the neurons.
In the present study, a feed-forward network with one hidden layer was selected, in which the input data ($x_1, x_2, \ldots, x_n$) are included in the first layer, and the network progressively processes those data through subsequent layers to produce the results ($y_1, y_2, \ldots, y_k$) in the output layer. The input neurons are linked to those in the intermediate layer by $w_{ji}$ weights (weight connecting the $i$th neuron in the input layer and the $j$th neuron in the hidden layer), and the neurons in the intermediate layer are linked to those in the output layer by $w_{kj}$ weights (weight connecting the $k$th neuron in the output layer and the $j$th neuron in the hidden layer). The ANNs, based on the nonlinear activation functions, map the relationship between the inputs and the output. Thus, the explicit expression for the output values is

$$y_k = f_o\left(\sum_{j=1}^{s} w_{kj} \cdot f_h\left(\sum_{i=1}^{n} w_{ji} x_i + b_j\right) + b_k\right) \tag{2}$$

where $f_h$ = activation function of the nodes in the hidden layer; $f_o$ = activation function of the nodes in the output layer; $n$ and $s$ = number of nodes in the input and hidden layers, respectively; $b_j$ = bias for the $j$th hidden neuron; and $b_k$ = bias for the $k$th output neuron.
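To make Eq. (2) concrete, here is a minimal sketch of the forward pass of a one-hidden-layer network. It assumes the tan-sigmoid (hyperbolic tangent) hidden activation and the linear output activation that the paper adopts later. The function name `forward` and the argument names are hypothetical, and the weights are plain arrays rather than the calibrated values of the study.

```python
import numpy as np


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer feed-forward network (Eq. 2).

    x:        inputs, shape (n_inputs,)
    w_hidden: weights w_ji, shape (n_hidden, n_inputs)
    b_hidden: biases b_j, shape (n_hidden,)
    w_out:    weights w_kj, shape (n_outputs, n_hidden)
    b_out:    biases b_k, shape (n_outputs,)
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)  # f_h: tan-sigmoid
    return w_out @ hidden + b_out              # f_o: linear (identity)
```

With 20 hidden neurons and a single output neuron, `w_hidden` would have 20 rows and `w_out` would have shape (1, 20).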
The ANN weights were obtained by calibration, when several sets of input–output are used in the ANN, and the weights are iteratively modified until the output values differ little from the target outputs. As the Levenberg–Marquardt (LM) algorithm is assumed to be one of the fastest methods for training ANNs, it was chosen here. The LM algorithm takes advantage of internal recurrence to dynamically incorporate past experience in the training process (Krishna et al. 2012).

Discrete Wavelet Transform

A discrete wavelet transform is a signal filtering technique that can also be applied to raw hydrologic data. Calculating the wavelet coefficients at every possible scale generates a large quantity of data, so it is usually more reasonable to calculate only a subset of scales and positions. It turns out that if scales and positions based on powers of two are chosen, then the analysis will be much more efficient and just as accurate. Thus, practical filters are used to quickly decompose the signal into high-frequency and low-frequency components.
The decomposition process can be performed with successive approximations, so the original time series can be broken down into many lower-resolution components. At the most basic level of the filtering process, the raw signal passes through two complementary filters (high-pass and low-pass) and is split into two signals, called details and approximations: the details (D) are the low-scale/high-frequency components, and the approximations (A) are the high-scale/low-frequency components of the signal. The raw signals (Q and P) are therefore formed by those A and D components. Fig. 4 shows exactly this aspect, up to five levels of decomposition: the raw inflow and TRMM rainfall signals (Q and P) are formed by the sum of an approximation (left side of Fig. 4) and details (right side of Fig. 4), and the details could be considered as the signal noise in some cases. At the first level, the raw signal is decomposed into A1 and D1, i.e., $Q = A_1 + D_1$. At the second level, Q is equal to the sum of A2, D2, and D1. Thus, as $A_1 = A_2 + D_2$, $A_2 = A_3 + D_3$, …, $A_{n-1} = A_n + D_n$, one may note that $Q = A_1 + D_1$, $Q = A_2 + D_2 + D_1$, $Q = A_3 + D_3 + D_2 + D_1$, …, $Q = A_n + D_n + D_{n-1} + \cdots + D_1$. This approach produces the wavelet transform of the input data at all dyadic scales. However, rather than relying on an upsampling procedure, this relies more on downsampling, which is an excellent technique for denoising. Krishna et al. (2012) stated that, in general, DWT could be used to smooth and denoise time series to improve forecasting, and Quiroz et al. (2011) also used such a technique to reconstruct daily rainfall from rain-gauge data and the normalized difference vegetation index based on the fact that both signals are proportional and periodic.
Fig. 4. Wavelet transform reconstructed approximations in blue (A) and details in red (D) for five levels of decomposition of inflow and TRMM rainfall data (Thiessen-weighted averages).
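The additive structure Q = An + Dn + … + D1 can be checked numerically. The sketch below is an illustration only: it uses the Haar wavelet (db1), for which the reconstructed approximation at level k is simply the mean over blocks of 2^k samples, rather than the coif5 wavelet selected in the paper. The function name `haar_multiresolution` is hypothetical, and the signal length is assumed to be a multiple of 2^levels.

```python
import numpy as np


def haar_multiresolution(signal, levels):
    """Reconstructed Haar approximations A_k and details D_k, k = 1..levels.

    For the Haar wavelet, A_k is the signal averaged over consecutive
    blocks of 2**k samples, and D_k = A_(k-1) - A_k with A_0 = signal.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approximations, details = [], []
    previous = x  # A_0 is the raw signal
    for k in range(1, levels + 1):
        block = 2 ** k
        # block means, repeated so that A_k has the same length as the signal
        a_k = np.repeat(x.reshape(-1, block).mean(axis=1), block)
        approximations.append(a_k)
        details.append(previous - a_k)
        previous = a_k
    return approximations, details
```

Keeping only the level-three approximation and discarding D1, D2, and D3 corresponds to the low-pass filtering used to build the WA-ANN inputs, although with coif5 the filtered series would be smoother than the piecewise-constant Haar result.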
There are several mother-wavelets that could be used in such a transformation procedure. In this work, 22 mother-wavelets were tested: (1) Daubechies (db1, db2, db3, db4, db5, db6, db7, db8, db9, db10); (2) Symlets (sym2, sym3, sym4, sym5, sym6, sym7, sym8); and (3) Coiflets (coif1, coif2, coif3, coif4, coif5). Although all tested mother-wavelets improved the performance of the ANN models, the coif5 ($N = 5$) presented the best results, presumably because the Coiflets are discrete wavelets with scaling functions and vanishing moments, they are near symmetric, and the wavelet functions have $N/3$ vanishing moments and the scaling functions have $N/3 - 1$, in which $N$ is the wavelet index, set here to 5. Thus, only the results of the hybrid models with coif5 were compared with the regular ANN.

Hybrid Wavelet–ANN Models

Two models, henceforth called regular ANN and WA-ANN, with three variants each, were built to forecast inflows seven days ahead, because ONS has been working with such a lead time. Regular ANN stands for the models that use as input the raw rainfall of each cluster and/or the raw inflow data recorded on the current day ($P_t$, $Q_t$) and on the four previous days ($P_{t-1}$, $P_{t-2}$, $P_{t-3}$, $P_{t-4}$, $Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$), while WA-ANN stands for the models that use as input the subseries of approximations (A) of those raw data obtained through a multiresolution analysis (also considered as the filtered signal). Several numbers of antecedent days were tested, but increasing them to more than four days increased the computation time without improving the results. The decomposition was performed in 10 levels ($A_1, D_1, \ldots, A_{10}, D_{10}$) for each situation, and the correlations between the raw data (inflow and precipitation) and their approximations and details were calculated (Table 2), as well as the correlation between the observed inflows and the ones calculated using the inflow approximations ($A_1, \ldots, A_{10}$) as input data for the WA-ANN model (Table 3). For example, in Table 2, the first value of the first row (0.995) is the correlation between the raw inflow data and its approximation A1 time series, the fifth value of the fourth row (0.471) is the correlation between the raw rainfall data of Cluster #2 and its approximation A5, and the last value (0.007) is the correlation between the raw rainfall data of Cluster #10 and its details D10. The approximation at level three (A3) was selected because it shows simultaneously a correlation greater than that of level four (A4) for both inflow and rainfall data (Table 2) and the best results when using the transformed inflows as inputs of the WA-ANN (Table 3). Note also that the high-frequency components (details) present low correlation with the original signal (Table 2).
Therefore, the proposed WA-ANN will use as input values the A3 of inflows and TRMM rainfall data for each cluster on the current day and on the four previous days ($A3_t$, $A3_{t-1}$, $A3_{t-2}$, $A3_{t-3}$, $A3_{t-4}$). For illustration, Fig. 4 shows the approximations and details of the inflow and TRMM data up to level five. In both models (regular ANN and WA-ANN), the output layer will have only one neuron, which is the forecasted inflow seven days ahead ($Q_{t+7}$).
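The input and target layout described above (values on days t to t−4 as inputs, inflow on day t+7 as target) can be sketched as follows. The function name `build_dataset` and the column ordering are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def build_dataset(inputs, inflow, n_lags=4, lead=7):
    """Build lagged input rows and lead-time targets.

    inputs: array of shape (n_days, n_series), e.g. raw or A3-filtered
            rainfall per cluster and/or inflow
    inflow: array of shape (n_days,), the inflow series to be forecast
    Returns X of shape (n_samples, (n_lags + 1) * n_series) and y of
    shape (n_samples,), with n_samples = n_days - lead - n_lags.
    """
    inputs = np.asarray(inputs, dtype=float)
    inflow = np.asarray(inflow, dtype=float)
    n_days = inputs.shape[0]
    rows, targets = [], []
    for t in range(n_lags, n_days - lead):
        # rows of the window are days t, t-1, ..., t-n_lags
        window = inputs[t - n_lags:t + 1][::-1]
        rows.append(window.ravel())
        targets.append(inflow[t + lead])
    return np.array(rows), np.array(targets)
```

For a series of 5,479 daily values with four antecedent days and a seven-day lead, this yields 5,479 − 7 − 4 = 5,468 input–target pairs.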
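Table 2. Correlation coefficients (r) between raw data (inflow and rainfall) and the respective 10 levels of approximation (A) and detail (D)

| Data | A/D | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 | Level 6 | Level 7 | Level 8 | Level 9 | Level 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Inflow | A | 0.995 | 0.990 | 0.970 | 0.941 | 0.899 | 0.838 | 0.743 | 0.306 | 0.198 | 0.148 |
| Inflow | D | 0.098 | 0.106 | 0.198 | 0.233 | 0.277 | 0.326 | 0.390 | 0.676 | 0.251 | 0.086 |
| Rainfall, Cluster no. 1 | A | 0.851 | 0.726 | 0.616 | 0.534 | 0.475 | 0.422 | 0.387 | 0.073 | 0.041 | 0.007 |
| Rainfall, Cluster no. 2 | A | 0.851 | 0.718 | 0.602 | 0.518 | 0.471 | 0.429 | 0.398 | 0.072 | 0.030 | 0.015 |
| Rainfall, Cluster no. 3 | A | 0.863 | 0.725 | 0.612 | 0.521 | 0.470 | 0.422 | 0.390 | 0.071 | 0.026 | 0.007 |
| Rainfall, Cluster no. 4 | A | 0.860 | 0.723 | 0.616 | 0.537 | 0.485 | 0.441 | 0.408 | 0.085 | 0.036 | 0.013 |
| Rainfall, Cluster no. 5 | A | 0.855 | 0.723 | 0.611 | 0.534 | 0.480 | 0.442 | 0.412 | 0.073 | 0.033 | 0.022 |
| Rainfall, Cluster no. 6 | A | 0.852 | 0.718 | 0.596 | 0.523 | 0.473 | 0.432 | 0.399 | 0.082 | 0.025 | 0.015 |
| Rainfall, Cluster no. 7 | A | 0.838 | 0.697 | 0.582 | 0.500 | 0.444 | 0.413 | 0.382 | 0.081 | 0.030 | 0.021 |
| Rainfall, Cluster no. 8 | A | 0.839 | 0.704 | 0.595 | 0.510 | 0.462 | 0.420 | 0.390 | 0.082 | 0.026 | 0.009 |
| Rainfall, Cluster no. 9 | A | 0.818 | 0.680 | 0.562 | 0.476 | 0.420 | 0.377 | 0.345 | 0.080 | 0.044 | 0.010 |
| Rainfall, Cluster no. 10 | A | 0.816 | 0.662 | 0.534 | 0.471 | 0.424 | 0.387 | 0.356 | 0.076 | 0.027 | 0.016 |
| Rainfall, Cluster no. 1 | D | 0.526 | 0.443 | 0.384 | 0.308 | 0.244 | 0.218 | 0.166 | 0.382 | 0.066 | 0.040 |
| Rainfall, Cluster no. 2 | D | 0.525 | 0.456 | 0.392 | 0.306 | 0.216 | 0.194 | 0.161 | 0.391 | 0.066 | 0.014 |
| Rainfall, Cluster no. 3 | D | 0.505 | 0.468 | 0.389 | 0.321 | 0.224 | 0.206 | 0.161 | 0.384 | 0.066 | 0.019 |
| Rainfall, Cluster no. 4 | D | 0.511 | 0.465 | 0.378 | 0.303 | 0.230 | 0.201 | 0.166 | 0.399 | 0.076 | 0.023 |
| Rainfall, Cluster no. 5 | D | 0.518 | 0.456 | 0.387 | 0.295 | 0.235 | 0.187 | 0.160 | 0.406 | 0.065 | 0.007 |
| Rainfall, Cluster no. 6 | D | 0.523 | 0.460 | 0.400 | 0.286 | 0.222 | 0.192 | 0.164 | 0.392 | 0.075 | 0.006 |
| Rainfall, Cluster no. 7 | D | 0.546 | 0.464 | 0.383 | 0.299 | 0.229 | 0.165 | 0.158 | 0.375 | 0.061 | 0.010 |
| Rainfall, Cluster no. 8 | D | 0.544 | 0.457 | 0.376 | 0.306 | 0.217 | 0.191 | 0.153 | 0.382 | 0.070 | 0.016 |
| Rainfall, Cluster no. 9 | D | 0.575 | 0.455 | 0.382 | 0.299 | 0.225 | 0.184 | 0.150 | 0.337 | 0.072 | 0.040 |
| Rainfall, Cluster no. 10 | D | 0.578 | 0.477 | 0.391 | 0.252 | 0.205 | 0.174 | 0.151 | 0.347 | 0.069 | 0.007 |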
Table 3. Correlation coefficients (r) between Qo and Qc during the training, validation, test processes, and considering all data for the WA-ANN model using as input the inflow approximations up to 10 levels (1998–2012)

| Dataset | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | 0.825 | 0.875 | 0.956 | 0.947 | 0.922 | 0.866 | 0.841 | 0.705 | 0.530 | 0.684 |
| Validation | 0.815 | 0.864 | 0.954 | 0.935 | 0.910 | 0.860 | 0.858 | 0.742 | 0.549 | 0.696 |
| Test | 0.838 | 0.840 | 0.938 | 0.937 | 0.924 | 0.846 | 0.863 | 0.730 | 0.474 | 0.695 |
| All | 0.826 | 0.869 | 0.953 | 0.946 | 0.921 | 0.862 | 0.847 | 0.714 | 0.524 | 0.687 |
The initial objective when designing an ANN is to find an optimal architecture that captures the relation between the input and output variables. However, there is no general rule to indicate the number of neurons for the input and hidden layers; usually, it is determined through a trial-and-error procedure. Therefore, the number of neurons in the hidden layer was increased from 2 to 30 in steps of two and, based on the root-mean-square deviation of the training set, was finally set to 20 neurons, keeping the same architecture for the six analyzed models. Adamowski and Sun (2010) used 22 neurons in their study to develop a hybrid wavelet transform and artificial neural network model for flow forecasting of nonperennial rivers in semi-arid basins.
Yonaba et al. (2010) tested the Elliott sigmoid, the bipolar sigmoid, and the tan-sigmoid transfer functions to propose multilayer perceptron artificial neural networks for multistep-ahead inflow forecasting over five different river basins and lead times, and the results showed that the tan-sigmoid was the most pertinent transfer function for inflow forecasting. The results also confirmed that, in line with the universal approximation theorem, a linear transfer function is suitable for the output layer. Thus, the tan-sigmoid function was selected as the activation function for the hidden neurons ($f_h$), and a linear activation function ($f_o$) for the output layer neuron. The model performance during the validation was evaluated by using the correlation coefficient
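$$r = \frac{\sum (Q_o - \bar{Q}_o)(Q_c - \bar{Q}_c)}{\sqrt{\sum (Q_o - \bar{Q}_o)^2 \sum (Q_c - \bar{Q}_c)^2}} \tag{3}$$

where $Q_o$ = observed inflow; $\bar{Q}_o$ = mean of the observed inflow; $Q_c$ = calculated inflow; and $\bar{Q}_c$ = mean of the calculated inflow.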
To forecast the inflows seven days ahead into the Três Marias reservoir, the renaturalized daily inflows and rainfall data of 15 years (January 1, 1998 to December 31, 2012) were used, giving a total of 5,468 records for each time series (i.e., 5,479 days minus 7 days for forecasting and minus 4 days for input), of which 70% were used for training, 15% for validation, and 15% for testing. Thus, six models were built with different input data sets, namely M1 using raw TRMM rainfall data as input, M2 using raw inflow, M3 using the approximation at level three (A3) of rainfall data, M4 using A3 of inflow, M5 using raw TRMM rainfall and raw inflow data, and M6 using A3 of TRMM rainfall and A3 of inflow data (see the first two columns in Table 4).
Table 4. Correlation coefficients (r) between Qo and Qc during the training, validation, test processes, and using all data for each model for Três Marias reservoir

| Model | Input data | r (Training) | r (Validation) | r (Testing) | r (All) |
|---|---|---|---|---|---|
| M1: ANN | Raw rainfall | 0.716 | 0.570 | 0.614 | 0.679 |
| M2: ANN | Raw inflow | 0.822 | 0.792 | 0.832 | 0.819 |
| M3: WA-ANN | A3 of rainfall | 0.966 | 0.937 | 0.923 | 0.953 |
| M4: WA-ANN | A3 of inflow | 0.956 | 0.954 | 0.938 | 0.953 |
| M5: ANN | Raw rainfall and raw inflow | 0.883 | 0.750 | 0.772 | 0.847 |
| M6: WA-ANN | A3 of rainfall and A3 of inflow | 0.989 | 0.974 | 0.968 | 0.984 |

Performance Evaluation

Several statistics that describe the degree of similarity among the data forecasted by the model and those observed can be used to assess the model efficiency (Santos and Silva 2014). In this work, besides the correlation coefficient, five indices were used, the Nash–Sutcliffe model efficiency coefficient (NASH), the percent bias (PBIAS), the standard deviation (σ), the root-mean-square deviation (RMSD), as defined and discussed by Moriasi et al. (2007) and the mean absolute percentage error (MAPE) as defined by Renno et al. (2015)
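$$\mathrm{NASH} = 1 - \left[\frac{\sum (Q_o - Q_c)^2}{\sum (Q_o - \bar{Q}_o)^2}\right] \tag{4}$$

$$\mathrm{PBIAS} = \frac{\sum (Q_o - Q_c)}{\sum (Q_o)} \cdot 100 \tag{5}$$

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum \left[(Q_o - \bar{Q}_o) - (Q_c - \bar{Q}_c)\right]^2} \tag{6}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum \left|\frac{Q_o - Q_c}{Q_o}\right| \tag{7}$$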
where $n$ = number of records in the time series; and $\bar{Q}_c$ = mean of the calculated inflow time series. The Nash–Sutcliffe efficiency is a normalized statistic that determines the relative magnitude of the residual variance compared to the observed data variance. It indicates how well the scatter plot of the observed and simulated data fits the equality line. It ranges between −∞ and 1.0, and the optimal value is NASH = 1. Values between 0.0 and 1.0 are usually acceptable levels of performance, whereas values less than 0.0 are unacceptable, because they mean that the mean observed value is a better predictor than the simulated value. Pearson's correlation coefficient (r) indicates the degree of collinearity between predicted and measured values. This coefficient ranges between −1 and 1; if r is equal to 1 or −1, a perfect positive or negative linear relationship exists, whereas if r is equal to 0, no linear relationship exists. Percent bias measures the average tendency of the simulated values to be larger or smaller than the respective observed values. The optimal value of PBIAS is 0.0%, so low-magnitude values indicate accurate forecasting; positive values indicate model underestimation bias, whereas negative values indicate model overestimation bias. The root-mean-square deviation computes the standard deviation of the model prediction error. The smaller the RMSD value, the better the model performance; however, it has limited ability to clearly indicate poor model performance. The mean absolute percentage error expresses the forecast error relative to the observed values, usually as a percentage, and the smallest values indicate the best forecasting.
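The indices in Eqs. (4)–(7), together with Pearson's r, can be written compactly as below. This is a minimal sketch assuming NumPy arrays of observed (`qo`) and calculated (`qc`) inflows of equal length. The function names are hypothetical, and MAPE is returned as a fraction, following Eq. (7) as written.

```python
import numpy as np


def nash(qo, qc):
    """Nash–Sutcliffe efficiency, Eq. (4)."""
    return 1.0 - np.sum((qo - qc) ** 2) / np.sum((qo - qo.mean()) ** 2)


def pbias(qo, qc):
    """Percent bias, Eq. (5); positive values mean underestimation."""
    return 100.0 * np.sum(qo - qc) / np.sum(qo)


def rmsd(qo, qc):
    """Centered root-mean-square deviation, Eq. (6)."""
    return np.sqrt(np.mean(((qo - qo.mean()) - (qc - qc.mean())) ** 2))


def mape(qo, qc):
    """Mean absolute percentage error as a fraction, Eq. (7)."""
    return np.mean(np.abs((qo - qc) / qo))


def pearson_r(qo, qc):
    """Pearson's correlation coefficient between observed and calculated inflows."""
    return np.corrcoef(qo, qc)[0, 1]
```

Note that a forecast with a constant offset, such as `qc = qo + 1`, has r = 1 and a centered RMSD of zero but a nonzero PBIAS and a NASH below 1, which is one reason to report several indices together.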

Results and Discussion

Fig. 5(a) shows the results for seven days ahead forecasting using the raw rainfall data (M1) during the training, validation, testing, and using the whole time series, i.e., correlation coefficients (r) equal to 0.716, 0.570, 0.614, and 0.679, respectively, and Fig. 5(b) shows the results using raw inflow data (M2) as input to ANN model (r=0.822, 0.792, 0.832, and 0.819). It is noted that the regular ANN using only inflow data showed a better performance than the one that uses only TRMM rainfall data as input.
Fig. 5. Scatter plot for the observed (Qo) and forecasted (Qc) daily inflow seven days ahead for the training, validation, and test processes, and considering all data, using as input the (a) raw rainfall time series (M1—regular ANN model); and (b) raw inflow time series (M2—regular ANN model).
The discrete wavelet analysis was performed on the time series (rainfall and inflow) using the multilevel one-dimensional discrete wavelet decomposition method already described. Thus, the time series of rainfall and inflow were transformed using the Coiflet mother-wavelet with N=5 (coif5) to decompose the original signals into approximations (A) and details (D). The approximations at level three (A3) were used as the ANN input to perform the forecasts for seven days ahead, the results of which are plotted in Figs. 6(a and b). Fig. 6(a) shows the results for seven days ahead forecasting using the filtered rainfall data (M3) during the training, validation, testing, and using the whole time series, i.e., r equal to 0.965, 0.937, 0.923, and 0.953, respectively; and Fig. 6(b) shows the results using the filtered inflow data (M4) during the training, validation, testing, and using the whole time series, i.e., r equal to 0.956, 0.954, 0.938, and 0.953, respectively. The performances of M3 and M4 are very close; however, all correlation coefficients increased when compared to M1 and M2 (regular ANNs).
Fig. 6. Scatter plot for the observed (Qo) and forecasted (Qc) daily inflow seven days ahead for the training, validation, and test processes, and considering all data, using as input the (a) approximation A3 of rainfall time series (M3—WA-ANN model); and (b) approximation A3 of inflow time series (M4—WA-ANN model).
The corresponding results are shown in Fig. 7(a) for the simulation using both raw rainfall and raw inflow data (M5), i.e., r equal to 0.883, 0.750, 0.772, and 0.847, respectively. Here, one can note that the inclusion of inflow data as input improved the model performance by around 25% relative to M1 (from r=0.679 to r=0.847). Fig. 7(b) shows the results using filtered rainfall and filtered inflow data (M6), i.e., r equal to 0.989, 0.974, 0.968, and 0.984, respectively. Here, an improvement of around 16% is also observed when comparing M5, which uses raw data, with M6, which uses filtered data (from r=0.847 to r=0.984).
Fig. 7. Scatter plot for the observed (Qo) and forecasted (Qc) daily inflow seven days ahead for the training, validation, and test processes, and considering all data, using as input the (a) raw rainfall and inflow time series (M5—regular ANN model); and (b) approximation A3 of rainfall and inflow time series (M6—WA-ANN model).
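The percentage improvements quoted for Fig. 7 are consistent with a simple relative change in r. The sketch below assumes that convention, (r_after − r_before) / r_before; the function name is hypothetical, and the inputs are the whole-series correlation coefficients stated in the text.

```python
def relative_improvement(r_before: float, r_after: float) -> float:
    """Percentage change of r_after relative to r_before."""
    if r_before == 0:
        raise ValueError("r_before must be nonzero")
    return 100.0 * (r_after - r_before) / r_before

# Whole-series correlation coefficients of M1, M5, and M6
r_m1, r_m5, r_m6 = 0.679, 0.847, 0.984

print(round(relative_improvement(r_m1, r_m5)))  # 25
print(round(relative_improvement(r_m5, r_m6)))  # 16
```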
All the correlation coefficients for the forecasts using the original and transformed data are shown in Table 4. The correlation improved when the denoised signals were used: there is an improvement of around 28% when filtered rather than raw precipitation data are used as input to the ANN, and improvements of 16% and 18%, respectively, when filtered inflow data are used and when filtered rainfall and inflow data are used simultaneously. The greatest improvement, around 45%, is obtained when comparing the regular ANN using only rainfall input data (M1), r=0.679, with the WA-ANN using filtered rainfall and inflow input data (M6), r=0.984.
Other coefficients were also computed to assess the efficiency of the models, as shown in Table 5, in which r is the correlation coefficient, NASH is the Nash–Sutcliffe model efficiency coefficient, PBIAS is the percent bias, σ is the standard deviation, RMSD is the root-mean-square deviation, and MAPE is the mean absolute percentage error. M6 showed the best overall results, whereas M1 showed the worst, which can be readily seen by means of a Taylor diagram. The Taylor diagram (Taylor 2001), given in Fig. 8, provides a concise statistical summary of how well patterns match each other in terms of their correlation, their root-mean-square difference, and the ratio of their standard deviations. It is a graphical framework that allows a suite of variables from the developed models to be compared with reference data (target). Thus, it is easy to confirm that, according to the analyzed coefficients (r, RMSD, σ, and NASH), the best-performing model is M6 (WA-ANN), which used the filtered rainfall and inflow data as input to the ANN. Its MAPE is also the smallest. The models that used filtered time series (M3, M4, and M6) occupy favorable positions in the diagram, whereas model M1, which used only raw rainfall data as input, lies far from the target. Another important observation is that model M3, which uses only the filtered rainfall data as input, performed better than model M2, which uses only the raw inflow data and is the most usual structure of a regular ANN. This improvement is on the order of 16% in r. Thus, although rainfall data are traditionally applied and usually contribute little to hydrologic variable forecasting, the removal of the high-frequency components D1, D2, and D3 improved the results, making M3 even better than M2.
Note that the target time series (inflow to the Três Marias reservoir) is plotted with a correlation coefficient of 1.0 and a standard deviation equal to 669.78 m³ s⁻¹ (Table 1).
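The geometry of the Taylor diagram rests on the relation given by Taylor (2001) between the centered root-mean-square difference E′, the standard deviations of the forecast (σ_f) and reference (σ_r) series, and their correlation coefficient r: E′² = σ_f² + σ_r² − 2 σ_f σ_r r. The sketch below evaluates this relation; the function name is hypothetical, and the M6 inputs are the rounded values reported in Table 5, so the result is only approximate.

```python
import math

def centered_rmsd(sigma_f: float, sigma_r: float, r: float) -> float:
    """Centered RMS difference from the Taylor (2001) law-of-cosines relation."""
    squared = sigma_f ** 2 + sigma_r ** 2 - 2.0 * sigma_f * sigma_r * r
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off

# M6 forecast versus the observed inflow (values in m^3/s, r dimensionless)
e_m6 = centered_rmsd(sigma_f=664.372, sigma_r=669.78, r=0.984)
print(f"centered RMS difference for M6: {e_m6:.1f} m^3/s")
```

The value obtained is roughly 119 m³ s⁻¹, slightly below the tabulated RMSD of M6, as expected, because the centered difference excludes the mean bias.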
Table 5. Correlation coefficient (r), Nash–Sutcliffe model efficiency coefficient (NASH), the percent bias (PBIAS), the standard deviation (σ), the root-mean-square deviation (RMSD) and the mean absolute percentage error (MAPE) between Qo and Qc considering all data for Três Marias reservoir (1998–2012)
| Model | Input data | r | NASH | PBIAS (%) | σ (m³ s⁻¹) | RMSD (m³ s⁻¹) | MAPE (%) |
|---|---|---|---|---|---|---|---|
| M1: ANN | Raw rainfall | 0.679 | 0.458 | 5.731 | 447.047 | 491.723 | 67.788 |
| M2: ANN | Raw inflow | 0.819 | 0.671 | 0.501 | 552.047 | 384.068 | 31.479 |
| M3: WA-ANN | A3 of rainfall | 0.953 | 0.909 | 0.244 | 648.326 | 202.637 | 37.469 |
| M4: WA-ANN | A3 of inflow | 0.953 | 0.907 | 0.351 | 634.126 | 203.926 | 14.256 |
| M5: ANN | Raw rainfall and raw inflow | 0.847 | 0.715 | 4.882 | 559.340 | 356.476 | 40.567 |
| M6: WA-ANN | A3 of rainfall and A3 of inflow | 0.984 | 0.968 | 0.189 | 664.372 | 120.440 | 12.538 |
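The indices in Table 5 can be computed from paired observed (Qo) and forecasted (Qc) series. The sketch below uses common textbook definitions, which are assumptions here because the exact formulas are not restated in this section: PBIAS follows the sign convention of Moriasi et al. (2007), in which positive values indicate underestimation; σ is the sample standard deviation of the forecasts; and MAPE requires nonzero observations. The function name and the sample values are hypothetical.

```python
import math

def evaluate(obs, sim):
    """Return r, NASH, PBIAS (%), sigma, RMSD, and MAPE (%) for paired series."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    ss_o = sum((o - mean_o) ** 2 for o in obs)       # spread of observations
    ss_s = sum((s - mean_s) ** 2 for s in sim)       # spread of forecasts
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return {
        "r": cov / math.sqrt(ss_o * ss_s),
        "NASH": 1.0 - sq_err / ss_o,
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
        "sigma": math.sqrt(ss_s / (n - 1)),
        "RMSD": math.sqrt(sq_err / n),
        "MAPE": 100.0 / n * sum(abs((o - s) / o) for o, s in zip(obs, sim)),
    }

# Made-up inflows (m^3/s), for illustration only
q_obs = [100.0, 200.0, 300.0, 400.0]
q_sim = [110.0, 190.0, 310.0, 390.0]
for name, value in evaluate(q_obs, q_sim).items():
    print(f"{name}: {value:.3f}")
```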
Fig. 8. Taylor diagram for the simulation results (standard deviation, RMSD, correlation coefficient, and NASH) of the six models: M1 based on raw rainfall data, M2 based on raw inflow data, M3 based on A3 of rainfall data, M4 based on A3 of inflow data, M5 based on raw rainfall and inflow data, and M6 based on A3 of rainfall and inflow data, when compared to the observed inflow time series of Três Marias reservoir.
To further evaluate model performance, forecasts from 1 to 7 days ahead were also performed with both the regular ANN using the raw inflow time series as input (similar to model M2), which is the most usual structure of a regular ANN, and the WA-ANN using A3 of the rainfall and inflow time series as input (similar to model M6), which is the best model structure among the six analyzed. Table 6 shows the correlation coefficients (r) during the training, validation, and test processes, and using all data, for each forecast. The WA-ANN models performed better for lead times of 2 to 7 days, whereas for the 1-day lead the two models performed almost equally (r=0.977 and r=0.976 using all data). As the number of lead days increases, the correlation coefficient decreases for the regular ANN; for the WA-ANN, however, it remains high and stable. Finally, Fig. 9 presents the hydrograph of the observed and forecasted inflows using model M6 for the entire period from 1998 to 2012, which shows that the model fits the observed data well, including the wet seasons (high peaks in the hydrograph). For example, the figure inset shows in detail the forecast for the wet season from November 2011 to February 2012, in which the good fit can be confirmed.
Table 6. Correlation coefficients (r) between Qo and Qc during the training, validation, test processes, and using all data for Três Marias reservoir for 1-day up to 7-day ahead forecasting using raw inflow, and A3 of rainfall and inflow as ANN inputs
| Number of days | Input data | r (training) | r (validation) | r (testing) | r (all) |
|---|---|---|---|---|---|
| 1 | Raw inflow | 0.977 | 0.978 | 0.979 | 0.977 |
| 2 | Raw inflow | 0.956 | 0.951 | 0.946 | 0.953 |
| 3 | Raw inflow | 0.925 | 0.925 | 0.909 | 0.922 |
| 4 | Raw inflow | 0.896 | 0.887 | 0.872 | 0.891 |
| 5 | Raw inflow | 0.864 | 0.866 | 0.836 | 0.860 |
| 6 | Raw inflow | 0.846 | 0.822 | 0.805 | 0.836 |
| 7 | Raw inflow | 0.822 | 0.792 | 0.832 | 0.819 |
| 1 | A3 of rainfall and A3 of inflow | 0.978 | 0.974 | 0.971 | 0.976 |
| 2 | A3 of rainfall and A3 of inflow | 0.982 | 0.973 | 0.968 | 0.978 |
| 3 | A3 of rainfall and A3 of inflow | 0.987 | 0.971 | 0.979 | 0.983 |
| 4 | A3 of rainfall and A3 of inflow | 0.991 | 0.976 | 0.972 | 0.985 |
| 5 | A3 of rainfall and A3 of inflow | 0.987 | 0.977 | 0.968 | 0.983 |
| 6 | A3 of rainfall and A3 of inflow | 0.988 | 0.973 | 0.969 | 0.983 |
| 7 | A3 of rainfall and A3 of inflow | 0.989 | 0.974 | 0.968 | 0.984 |
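Forecasting at different lead times, as in Table 6, amounts to pairing recent input values with the inflow observed a given number of days later. The sketch below shows one way to build such input-target pairs. The number of lagged inputs is not stated in this section, so n_lags=3 is purely an assumption, and the function name and series values are hypothetical.

```python
def make_samples(series, n_lags, lead):
    """Pair the n_lags values ending at day t with the value at day t + lead."""
    if n_lags < 1 or lead < 1:
        raise ValueError("n_lags and lead must be positive")
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - lead):
        inputs.append(list(series[t - n_lags + 1 : t + 1]))  # days t-n_lags+1 .. t
        targets.append(series[t + lead])                     # day t + lead
    return inputs, targets

# Made-up inflow series; one data set is built per lead time (1 to 7 days)
inflow = [float(v) for v in range(1, 21)]
for lead in range(1, 8):
    x, y = make_samples(inflow, n_lags=3, lead=lead)
    print(f"lead={lead}: {len(x)} samples, first pair {x[0]} -> {y[0]}")
```

Longer lead times leave fewer usable samples from the same record, because the last `lead` days have no target available.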
Fig. 9. Hydrograph for Três Marias reservoir of observed and forecasted inflows using model M6, which has as input A3 of rainfall and inflow data (1998–2012). (Inset) Detail of the simulation for the wet season from November 2011 to February 2012.

Conclusions

The discrete wavelet transform (DWT) was proposed here as a technique to remove the details (high-frequency components) from the raw signals of the TRMM rainfall data and the inflows to Três Marias reservoir, in the upper part of the São Francisco River, Brazil, to build a novel wavelet-artificial neural network hybrid model (WA-ANN). Six models (M1 to M6) were analyzed, and the following specific conclusions were drawn:
1. The DWT procedure removed the noise present in the selected time series used as inputs of an ANN model to forecast the inflows seven days ahead, and this procedure improved the forecasting results.
2. Using the raw rainfall data (M1), the regular ANN presented a correlation coefficient of 0.570 for the validation; when the raw inflow data were also introduced as input (M5), the correlation increased to 0.750. Thus, including this second variable (inflow) in the regular ANN input set improved the correlation coefficient by 32%.
3. With the filtering process up to level three (A3) using the Coiflet mother-wavelet (M6), the validation correlation improved further to 0.974, an improvement of 71% over M1.
4. According to the model performance indices (r, NASH, PBIAS, σ, RMSD, and MAPE) and the Taylor diagram, model M1 presented the worst efficiency, whereas models M2, M3, and M5 showed similar performance.
5. Model M4, which uses the denoised inflow time series, presented a good performance (NASH=0.907).
6. Model M6 (using denoised rainfall and inflow time series) was the best overall (NASH=0.968). Its standard deviation (σ=664.372 m³ s⁻¹) is also close to the standard deviation of the observed time series (σ=669.78 m³ s⁻¹).
Thus, the Coiflet DWT proved to be an efficient technique for removing noise from rainfall and inflow records, and the chosen approximation A3 shows that eliminating only the D1, D2, and D3 details was enough to improve the forecasting. For example, even the WA-ANN model M3, which uses only the A3 of rainfall data, performed better than the regular ANN model with raw inflow data (M2), which is the usual ANN configuration for short-term forecasting. This means that, for inflow forecasting, it is better to use filtered rainfall data than raw inflows as ANN inputs.

Acknowledgments

The financial support provided by the National Council for Scientific and Technological Development, Brazil (Grant Nos. 304213/2017-9 and 304540/2017-0), the re-naturalized daily inflows from the National System Operator, Brazil (ONS), and the Tropical Rainfall Measuring Mission (TRMM) rainfall estimates are gratefully acknowledged. This study was also financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.

References

Adamowski, J., and K. Sun. 2010. “Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds.” J. Hydrol. 390 (1–2): 85–91. https://doi.org/10.1016/j.jhydrol.2010.06.033.
Akrami, S. A., A. El-Shafie, M. Naseri, and C. A. G. Santos. 2014. “Rainfall data analyzing using moving average (MA) model and wavelet multi-resolution intelligent model for noise evaluation to improve the forecasting accuracy.” Neural Comput. Appl. 25 (7–8): 1853–1861. https://doi.org/10.1007/s00521-014-1675-0.
Bennett, J. C., D. E. Robertson, P. G. D. Ward, H. A. P. Hapuarachchi, and Q. J. Wang. 2016. “Calibrating hourly rainfall-runoff models with daily forcings for streamflow forecasting applications in meso-scale catchments.” Environ. Modell. Software 76: 20–36. https://doi.org/10.1016/j.envsoft.2015.11.006.
Bertone, E., R. A. Stewart, H. Zhang, M. Bartkow, and C. Hacker. 2015. “An autonomous decision support system for manganese forecasting in subtropical water reservoirs.” Environ. Modell. Software 73: 133–147. https://doi.org/10.1016/j.envsoft.2015.08.008.
Boots, B. 1999. “Spatial tessellation.” In Vol. 1 of Geographic information systems: Principles and technical issues, edited by P. Longley, M. F. Goodchild, D. Maguire, and D. Rhind, 2nd ed., 503–526. New York: Wiley.
Budu, K. 2014. “Comparison of wavelet-based ANN and regression models for reservoir inflow forecasting.” J. Hydrol. Eng. 19 (7): 1385–1400. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000892.
Cattell, R. B. 1943. “The description of personality: Basic traits resolved into clusters.” J. Abnormal Soc. Psychol. 38 (4): 476–506. https://doi.org/10.1037/h0054116.
Cheng, C., K. Chau, Y. Sun, and J. Lin. 2005. “Long-term prediction of discharges in Manwan reservoir using artificial neural network models.” In Vol. 3498 of Advances in neural networks: Lecture notes in computer science, edited by J. Wang, X. F. Liao, and Z. Yi, 1040–1045. Berlin: Springer.
Dawson, C. W., and R. L. Wilby. 2001. “Hydrological modelling using artificial neural networks.” Prog. Phys. Geog. 25 (1): 80–108. https://doi.org/10.1177/030913330102500104.
Farias, C. A. S., C. A. G. Santos, and A. B. Celeste. 2011. “Daily reservoir operating rules by implicit stochastic optimization and artificial neural networks in a semi-arid land of Brazil.” In Vol. 347 of Risk in water resources management, 191–197. Wallingford, UK: IAHS Publication.
Grossmann, A., and J. Morlet. 1984. “Decomposition of Hardy functions into square integrable wavelets of constant shape.” SIAM J. Math. Anal. 15 (4): 723–736. https://doi.org/10.1137/0515056.
Hidalgo, I., D. Fontane, M. Arabi, J. Lopes, J. Andrade, and L. Ribeiro. 2012. “Evaluation of optimization algorithms to adjust efficiency curves for hydroelectric generating units.” J. Energy Eng. 138 (4): 172–178. https://doi.org/10.1061/(ASCE)EY.1943-7897.0000074.
Jain, A., and S. Srinivasulu. 2004. “Development of effective and efficient rainfall–runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques.” Water Resour. Res. 40: W04302. https://doi.org/10.1029/2003WR002355.
Karunanithi, N., W. J. Grenney, D. Whitley, and K. Bovee. 1994. “Neural networks for river flow prediction.” J. Comput. Civ. Eng. 8 (2): 201–220. https://doi.org/10.1061/(ASCE)0887-3801(1994)8:2(201).
Kim, Y., H. S. Shin, and J. Plummer. 2014. “A wavelet-based autoregressive fuzzy model for forecasting algal blooms.” Environ. Modell. Software 62: 1–10. https://doi.org/10.1016/j.envsoft.2014.08.014.
Kisi, O. 2007. “Streamflow forecasting using different artificial neural network algorithms.” J. Hydrol. Eng. 12 (5): 532–539. https://doi.org/10.1061/(ASCE)1084-0699(2007)12:5(532).
Krishna, B., Y. R. S. Rao, and P. C. Nayak. 2012. “Wavelet neural network model for river flow time series.” Proc. ICE–Water Manage. 165 (8): 425–439. https://doi.org/10.1680/wama.10.00092.
Lian, Q., L. Shen, Y. Xu, and L. Yang. 2011. “Filters of wavelets on invariant sets for image denoising.” Appl. Anal.: Int. J. 90 (8): 1299–1322. https://doi.org/10.1080/00036811.2010.490524.
Moriasi, D. N., J. G. Arnold, M. W. Van Liew, R. L. Bingner, R. D. Harmel, and T. L. Veith. 2007. “Model evaluation guidelines for systematic quantification of accuracy in watershed simulations.” Trans. Am. Soc. Agric. Biol. Eng. 50 (3): 885–900. https://doi.org/10.13031/2013.23153.
Nourani, V., A. H. Baghanam, J. Adamowski, and O. Kisi. 2014a. “Applications of hybrid wavelet–Artificial intelligence models in hydrology: A review.” J. Hydrol. 514 (6): 358–377. https://doi.org/10.1016/j.jhydrol.2014.03.057.
Nourani, V., A. H. Baghanam, A. Y. Rahimi, and F. H. Nejad. 2014b. “Evaluation of wavelet-based de-noising approach in hydrological models linked to artificial neural networks.” In Computational intelligence techniques in earth and environmental sciences, edited by T. Islam, P. K. Srivastava, M. Gupta, X. Zhu, and S. Mukherjee. Dordrecht, Netherlands: Springer.
Plouffe, C. C. F., C. Robertson, and L. Chandrapala. 2015. “Comparing interpolation techniques for monthly rainfall mapping using multiple evaluation criteria and auxiliary data sources: A case study of Sri Lanka.” Environ. Modell. Software 67: 57–71. https://doi.org/10.1016/j.envsoft.2015.01.011.
Quiroz, R., C. Yarlequé, A. Posadas, V. Mares, and W. W. Immerzeel. 2011. “Improving daily rainfall estimation from NDVI using a wavelet transform.” Environ. Modell. Software 26 (2): 201–209. https://doi.org/10.1016/j.envsoft.2010.07.006.
Rajurkar, M. P., U. C. Kothyari, and U. C. Chaube. 2002. “Artificial neural networks for daily rainfall–runoff modelling.” Hydrol. Sci. J. 47 (6): 865–877. https://doi.org/10.1080/02626660209492996.
Renno, C., F. Petito, and A. Gatto. 2015. “Artificial neural network models for predicting the solar radiation as input of a concentrating photovoltaic system.” Energy Convers. Manage. 106: 999–1012. https://doi.org/10.1016/j.enconman.2015.10.033.
Santos, C. A. G., P. K. M. M. Freire, G. B. L. Silva, and R. M. Silva. 2014. “Discrete wavelet transform coupled with ANN for daily discharge forecasting into Três Marias reservoir.” Proc. Int. Assoc. Hydrol. Sci. 364: 100–105. https://doi.org/10.5194/piahs-364-100-2014.
Santos, C. A. G., P. K. M. M. Freire, and C. Torrence. 2013. “A transformada wavelet e sua aplicação na análise de séries hidrológicas [The wavelet transform and its application for hydrological time series analysis].” Revista Brasileira de Recursos Hídricos 18 (3): 271–280. https://doi.org/10.21168/rbrh.v18n3.p271-280.
Santos, C. A. G., C. O. Galvão, and R. M. Trigo. 2003. Vol. 278 of Rainfall data analysis using wavelet transform, 195–201. Wallingford, UK: IAHS Publication.
Santos, C. A. G., O. Kisi, R. M. Silva, and M. Zounemat-Kermani. 2018. “Wavelet-based variability on streamflow at 40-year timescale in the Black Sea Region of Turkey.” Arabian J. Geosci. 11 (8): 169. https://doi.org/10.1007/s12517-018-3514-6.
Santos, C. A. G., and B. S. Morais. 2013. “Identification of precipitation zones within São Francisco River basin (Brazil) by global wavelet power spectra.” Hydrol. Sci. J. 58 (4): 789–796. https://doi.org/10.1080/02626667.2013.778412.
Santos, C. A. G., B. S. Morais, and G. B. L. Silva. 2009. Vol. 333 of Drought forecast using artificial neural network for three hydrological zones in San Francisco river basin, 302–312. Wallingford, UK: IAHS Publication.
Santos, C. A. G., and G. B. L. Silva. 2014. “Daily streamflow forecasting using a wavelet transform and artificial neural network hybrid models.” Hydrol. Sci. J. 59 (2): 312–324. https://doi.org/10.1080/02626667.2013.800944.
Seo, Y., S. Kim, O. Kisi, and V. P. Singh. 2015. “Daily water level forecasting using wavelet decomposition and artificial intelligence techniques.” J. Hydrol. 520: 224–243. https://doi.org/10.1016/j.jhydrol.2014.11.050.
Székely, G. J., and M. L. Rizzo. 2014. “Partial distance correlation with methods for dissimilarities.” Ann. Stat. 42 (6): 2382–2412. https://doi.org/10.1214/14-AOS1255.
Taylor, K. E. 2001. “Summarizing multiple aspects of model performance in a single diagram.” J. Geophys. Res. 106 (D7): 7183–7192. https://doi.org/10.1029/2000JD900719.
Teng, H., R. A. V. Rossel, Z. Shi, T. Behrens, A. Chappell, and E. Bui. 2016. “Assimilating satellite imagery and visible–near infrared spectroscopy to model and map soil loss by water erosion in Australia.” Environ. Modell. Software 77: 156–167. https://doi.org/10.1016/j.envsoft.2015.11.024.
Yonaba, H., F. Anctil, and V. Fortin. 2010. “Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting.” J. Hydrol. Eng. 15 (4): 275–283. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000188.

Published In

Journal of Hydrologic Engineering
Volume 24, Issue 2, February 2019

History

Received: Nov 17, 2017
Accepted: Jul 10, 2018
Published online: Nov 21, 2018
Published in print: Feb 1, 2019
Discussion open until: Apr 21, 2019

Authors

Affiliations

Celso A. G. Santos
Professor, Dept. of Civil and Environmental Engineering, Federal Univ. of Paraíba, 58051-900 João Pessoa, Paraíba, Brazil (corresponding author).
Paula K. M. M. Freire
Ph.D. Student, Dept. of Civil and Environmental Engineering, Federal Univ. of Paraíba, 58051-900 João Pessoa, Paraíba, Brazil.
Richarde M. da Silva
Associate Professor, Dept. of Geosciences, Federal Univ. of Paraíba, 58051-900 João Pessoa, Paraíba, Brazil.
Seyed A. Akrami, Ph.D.
Head of Research Division, Tabriz Heritage Enterprise, Hentian Kajang, Jalan Reko, 43000 Kajang, Selangor, Malaysia; Dept. of Civil and Structural Engineering, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia.
