Cyberattack Diagnosis in Water Distribution Networks Combining Data-Driven and Structural Analysis Methods

Rodríguez-Martínez, Claudia; Quiñones-Grueiro, Marcos; Llanes-Santiago, Orestes

doi:10.1061/JWRMD5.WRENG-5302

Open access

Technical Papers

Feb 25, 2023

Authors: Claudia Rodríguez-Martínez, Marcos Quiñones-Grueiro, and Orestes Llanes-Santiago (https://orcid.org/0000-0002-6864-9629)

Publication: Journal of Water Resources Planning and Management

Volume 149, Issue 5

https://doi.org/10.1061/JWRMD5.WRENG-5302

Abstract

Most scientific contributions addressing cybersecurity issues in water distribution networks (WDNs) propose detection systems without considering the location problem. A methodology for detection and location of cyberattacks in WDNs is proposed in this paper. Structural analysis and neural networks are effectively combined with the control chart adaptive exponential weighted moving average (AEWMA). The proposed detection and location framework requires only data from normal operating conditions and knowledge about the behavioral model of the system. The validity of the methodology was demonstrated with the widely known case study Battle of the Attack Detection Algorithms (BATADAL). The detection method detected all the attacks with a false positive rate (false alarm rate) below 5% and true positive rate (TPR) (i.e., the detection rate) higher than 95%. The location method presents consistent diagnosis results while guaranteeing that the district metering area under attack always is identified.

Practical Applications

Water distribution networks are critical infrastructure for a country because they ensure the distribution of a basic element for life. The current development of electronics and communication technologies has made it possible to achieve automated management of water distribution networks, which has led to better rates of efficiency in water management and control. However, that technological development and the fact that WDNs are critical infrastructures make water distribution networks central targets for cyberattacks that seek to interrupt this basic service, temporarily or permanently affecting production and services. In this paper, a methodology is proposed for the detection and localization of cyberattacks on water distribution networks using computational intelligence tools that do not require new technological investments. The proposed methodology works with the data that the supervision and control system obtains from the real process. The results of the application of the proposed methodology will allow managers to make appropriate decisions to avoid the effects that cyberattacks can produce.

Introduction

Meaningful advances in information technologies and industrial computing in the last 2 decades have produced significant changes in the management of the water distribution networks (WDNs). Traditional management of physical infrastructure such as pipes, valves, and pumps in urban WDNs has evolved into cyber–physical systems which combine physical devices and processes with communication networks, smart devices, and software applications for supervision and control. This is the industrial internet of things (IIoT), or Industry 4.0, applied in the exploitation of hydraulic infrastructures. The main goal of this change is to improve the service quality to the users while minimizing water losses and negative impacts on the environment (World Bank 2016; Adedeji and Hamam 2020).

Cyber–physical systems improve the service of urban WDNs but expose them to potential threats of cybernetic attacks (Adepu et al. 2020; Rasekh et al. 2016; Taormina et al. 2017). In the last few years, several water distribution and supply systems have experienced cyberattacks (Clark et al. 2017; Berglund et al. 2020; Hassanzadeh et al. 2020; Tuptuk et al. 2021). This has motivated the creation of international associations and cybersecurity agencies for protecting and defending water distribution networks.

In 2016, with the aim of promoting the development of new strategies and computational tools for cyberattack detection and location in WDNs, an international competition named Battle of the Attack Detection Algorithms (BATADAL) was created. The main objective of the competition was to compare the performance of different algorithms in the detection of cyberattacks. The C-Town WDN was designed as a case study. It is a real-scale WDN of medium size which is operated using programmable logic controllers (PLCs) and a supervisory control and data acquisition (SCADA) system. The performance of the computational algorithms presented in the competition was evaluated in terms of the latency (time for detection) and the percentage of correct detection, among other performance metrics (Taormina et al. 2018a).

A bibliographic review of cybersecurity in WDNs showed that most research focuses on cyberattack detection (Abokifa et al. 2019; Ahmed et al. 2017; Chandy et al. 2019; Housh and Ohar 2018; Quiñones Grueiro et al. 2019; Ramotsoela et al. 2019; Taormina et al. 2018a). Three recently published articles which provided extensive and detailed reviews of issues related to cybersecurity in WDNs confirm the preceding statement (Berglund et al. 2020; Shapira et al. 2021; Tuptuk et al. 2021). Nowadays, cyberattack detection is based on the identification of anomalies in the behavior of the measured variables. Advanced methods use signal spectral analysis or the comparison of the ideal behavior of the WDN (using a parameterized physical model) with its current state (Ahmed et al. 2017; Housh and Ohar 2018; Quiñones Grueiro et al. 2019; Taormina et al. 2018a). Most approaches to cyberattack detection involve techniques used in the fault diagnosis field. In contrast, few works are concerned with the location of the attacks in WDNs. Amin et al. (2013) proposed the use of a bank of delay-differential observer systems for detection and isolation of attacks on a WDN. Taormina and Galelli (2018) proposed a scheme based on autoencoders (AEs) trained with data of normal operating conditions to detect anomalous patterns. The localization process uses the autoencoder reconstruction error, but part of this process is not automatic and requires expert analysis.

This paper proposes a novel methodology for detection and location of cyberattacks in WDNs combining tools of computational intelligence and structural analysis, which represents the main contribution of the paper. In the context of WDNs, the problem of detecting cyberattacks has been extensively studied in the last 5 years. However, little attention has been paid to the problem of finding the location or area where the attack is taking place. In this sense, the proposed methodology is applicable to real-world systems because it requires only (1) nominal or normal measurement data for training the autoencoders and adjusting the parameters of the adaptive exponential weighted moving average (AEWMA) statistical test (widely available from most water companies), and (2) an understanding of the physical configuration of the district metered areas (DMAs) with the sensors and actuators therein to define the equations to be used for structural analysis, which form a behavioral model of the network (not requiring a parameterized physical model). For the validation of the proposal, the C-Town and E-Town case studies were used.

The organization of the paper is as follows. Section “Materials and Methods” presents the principal characteristics of the computational tools used and the structural analysis. Section “Methodology for Detection and Location of Cyberattacks” describes the proposed methodology. Sections “Application of the Proposed Methodology to the C-Town Case Study” and “Application of the Proposed Methodology to the E-Town Case Study” apply the proposed methodology to the C-Town and E-Town case studies, respectively. The obtained results are analyzed and discussed. Finally, the conclusions and recommendations for future works are presented.

Materials and Methods

The computational tools used in the methodology, the metrics used to evaluate its performance, and the case study are presented in this section.

Autoencoders

Autoencoders are deep neural networks capable of learning a compressed representation of the input space and minimizing the loss of information, expressed as a reconstruction error or distortion function. AEs work in two steps: (1) an encoding function maps the input space onto a reduced feature representation; and (2) a decoding function reconstructs the original input space using the learned feature space as inputs (Baldi 2012). AEs intrinsically learn a set of nonlinear relationships among the input variables mapped onto the reduced space. Therefore, they can be used to determine when these relationships are not consistent for new input data.

AEs were defined formally by Baldi (2012) based on the following elements: (1) a set of $m$ vectors $X = \{x_{1}, \dots, x_{m}\}$ representing an input pattern, where $x_{i} \in \mathbb{R}^{n}$; (2) a function $A : \mathbb{R}^{n} \to \mathbb{R}^{p}$ that represents the encoder, where $n > p$; (3) a function $B : \mathbb{R}^{p} \to \mathbb{R}^{n}$ that represents the decoder; and (4) a distortion function $\Delta$ (e.g., the L2 norm) defined in $\mathbb{R}^{n}$, which measures the distance between the output pattern of the decoder and the input pattern.

AEs transform an input vector $x_{i} \in \mathbb{R}^{n}$ into an output vector $A \circ B (x_{i}) \in \mathbb{R}^{n}$. Training an AE implies learning the weights of the encoding and decoding functions by minimizing a distortion function, formalized through the following general optimization problem:

$$\min E(A, B) = \min_{A, B} \sum_{i = 1}^{m} \Delta \left( A \circ B (x_{i}), x_{i} \right)$$

(1)

where $\circ$ is the composition operator indicating that vectors of the input space are transformed by functions $A$ (encoder) and $B$ (decoder).

The most common distortion function considered in Eq. (1) is the squared prediction error calculated between the input and the output vectors across all samples of the training data set. In addition, different modifications of this optimization problem have been proposed in recent years to enforce certain constraints on the learned functions. Therefore, two regularization terms are included as follows:

$$J = \frac{1}{n} \sum_{j = 1}^{n} \sum_{i = 1}^{k} (x_{ij} - \hat{x}_{ij})^{2} + \lambda \times \Omega_{weights} + \beta \times \Omega_{sparsity}$$

(2)

where $\Omega_{weights}$ = L2 weight decay penalty term; $\Omega_{sparsity}$ = sparsity penalty term; $\lambda$ = weight decay parameter, which controls the relative importance of the second term of the cost function; and $\beta$ = parameter that controls the weight of the sparsity penalty term. More details about how these penalty terms are calculated were given by Ng (2010) and Makhzani and Frey (2016).

Given a data set of observations obtained during nominal conditions, the training process for an AE is as follows:

• Tune the hyperparameters: the dimension of the feature space ($p$), the regularization parameters ($\lambda$, $\beta$), and the activation functions for the encoder and decoder. Different methods can be employed for this purpose. A grid-search experiment was conducted in this work for this task.

• Minimize the cost function in Eq. (2) to obtain the encoding and decoding functions using a stochastic gradient-based optimization method.

After the encoding and decoding functions are learned, they can be used to calculate a reconstruction error for new observations. This error then can be used as an indicator of anomalous behavior, as is described in the next subsection.
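The training and scoring steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single hidden layer with a sigmoid encoder and a linear decoder, keeps only the L2 weight decay term of Eq. (2) (the sparsity term is omitted), uses full-batch gradient descent instead of a stochastic optimizer, and runs on synthetic data in which the third variable is the mean of the first two. All function names and parameter values are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, p, lam=1e-4, lr=0.1, epochs=2000, seed=0):
    """Fit a sigmoid encoder and a linear decoder by full-batch gradient descent.

    Cost: mean squared reconstruction error plus lam * (||W1||^2 + ||W2||^2).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(scale=0.1, size=(n, p)); b1 = np.zeros(p)
    W2 = rng.normal(scale=0.1, size=(p, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encoder output (feature space)
        R = H @ W2 + b2 - X               # reconstruction residual
        dR = 2.0 * R / m                  # gradient of the mean squared error
        gW2 = H.T @ dR + 2.0 * lam * W2
        gb2 = dR.sum(axis=0)
        dZ = (dR @ W2.T) * H * (1.0 - H)  # backpropagate through the sigmoid
        gW1 = X.T @ dZ + 2.0 * lam * W1
        gb1 = dZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def squared_prediction_error(X, params):
    """Squared reconstruction error of each row of X."""
    W1, b1, W2, b2 = params
    X_hat = sigmoid(X @ W1 + b1) @ W2 + b2
    return np.sum((X - X_hat) ** 2, axis=1)

# Synthetic nominal data: the third variable depends on the first two.
rng = np.random.default_rng(1)
x1 = rng.uniform(size=500)
x2 = rng.uniform(size=500)
X_nominal = np.column_stack([x1, x2, 0.5 * (x1 + x2)])

untrained = train_autoencoder(X_nominal, p=2, epochs=0)
trained = train_autoencoder(X_nominal, p=2)

spe_nominal = squared_prediction_error(X_nominal, trained)
# An observation that breaks the nominal relationship among the variables.
spe_attack = squared_prediction_error(np.array([[0.1, 0.1, 0.9]]), trained)[0]
```

In this toy setting the observation that violates the learned relationship receives a larger squared prediction error than a typical nominal sample, which is the signal that the control chart described next monitors.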

Adaptive Exponential Weighted Moving Average Chart

The adaptive exponential weighted moving average chart is an advanced version of the univariate control chart called exponentially weighted moving average (Roberts 1959). Control charts are used to determine when a signal presents atypical deviations from its regular values. Therefore, AEWMA was used in this work to determine when the squared prediction error that an AE outputs for a new observation does not conform with nominal operating conditions.

EWMA is a univariate control chart first used for detecting deviations in the mean of a signal. For statistical testing, it provides better results than the Shewhart chart (Wheeler 2000) in the detection of small shifts because it takes into consideration not only the immediate observation but also the previous values. The EWMA chart differs from the cumulative sum (CUSUM) chart (Hawkins and Olwell 1998) because it considers a weighting factor $\gamma \in [0, 1]$ which allows an easy adjustment of the shift sensitivity to be detected (Montgomery 2013). The EWMA of a signal $x(t) \in \mathbb{R}$ is defined as

$$z(t) = \gamma \bar{x}(t) + (1 - \gamma) z(t - 1)$$

(3)

where $\bar{x}$ = mean value of signal $x$.

The starting value is the mean target value $z(1) = \mu$. For $\gamma = 1$, EWMA is similar to a Shewhart control chart. For larger values of $\gamma$, the recent observation is more important, and therefore the chart is more sensitive to large abrupt shifts. Small values of the weighting factor give more importance to older observations, which allows for better detection of small shifts, but with an increase of false alarms. The EWMA tool can detect either small or large shifts; however, it cannot detect both satisfactorily at the same time. The adaptive EWMA solves this difficulty because it adapts the weight of the past observations by taking into account the magnitude of the error $e(t) = \bar{x}(t) - z(t - 1)$ such that $\gamma(e(t)) = \phi[e(t)] / e(t)$ (Capizzi and Masarotto 2003).

In this paper, the following score function was used (Capizzi and Masarotto 2003; Aly et al. 2015):

$$\phi(e) = \begin{cases} e + (1 - \gamma) k & \text{if } e < -k \\ \gamma e & \text{if } |e| \leq k \\ e - (1 - \gamma) k & \text{if } e > k \end{cases}$$

(4)

where $0 \leq \gamma \leq 1$ and $k \geq 0$ are constants.
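Substituting $\gamma(e(t)) = \phi[e(t)] / e(t)$ into Eq. (3) gives the recursion $z(t) = z(t - 1) + \phi[e(t)]$. The short Python sketch below implements that recursion with the score function of Eq. (4). The function names and the values $\gamma = 0.1$ and $k = 3$ are illustrative assumptions, and the input is taken to be the monitored value at each sample.

```python
import numpy as np

def score(e, gamma, k):
    """Score function of Eq. (4)."""
    if e < -k:
        return e + (1.0 - gamma) * k
    if e > k:
        return e - (1.0 - gamma) * k
    return gamma * e

def aewma(x, mu, gamma, k):
    """AEWMA statistic z(t) = z(t-1) + phi(e(t)), started at the target mean mu."""
    z = np.empty(len(x))
    prev = mu
    for t, xt in enumerate(x):
        prev = prev + score(xt - prev, gamma, k)
        z[t] = prev
    return z

# A signal that jumps abruptly from 0 to 10 at the fourth sample.
z = aewma([0.0, 0.0, 0.0, 10.0, 10.0], mu=0.0, gamma=0.1, k=3.0)
```

With these values the statistic moves to 7.3 on the first shifted sample, whereas a plain EWMA with the same $\gamma$ would only reach 1.0. For errors smaller than $k$ in magnitude, the update coincides with the plain EWMA, which preserves its sensitivity to small shifts.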

An anomaly is detected when $z(t)$ exceeds the control limits $\mu \pm h \sigma$, where $\mu$ and $\sigma$ are the mean and the standard deviation parameters of a reference signal, and $h$ is chosen to achieve a desired performance. To avoid false alarms, the average run length (ARL) of the control chart is used as a performance measure in the design of the AEWMA control chart (Aly et al. 2015). It indicates the number of samples taken until the control chart presents a false alarm. Several methods can be used to determine the design parameters $\gamma$ and $k$ given a shift interval $[\delta_{\min}, \delta_{\max}]$ to be detected in the mean of the variable. In this paper, those parameters were calculated as in Aly et al. (2015):

$$\gamma = \ln \left( 1.2219 - 0.04697 \times \ln(ARL) + 0.45985 \times \sqrt{\delta_{\min}} - 0.02701 \times \sqrt{\delta_{\max}} \right)$$

(5)

$$k = \sqrt{4.846 + 1.5852 \times \ln(ARL) - 2.8679 \times \sqrt{\delta_{\min}} - 1.7198 \times \sqrt{\delta_{\max}}}$$

(6)

Structural Analysis

Structural analysis is a model-based methodology for fault diagnosis of industrial processes. The structure model of a system represents an abstraction of its behavioral model that permits establishing analytical redundancy relationships (ARRs) with the goal of detecting and locating faults. The number of ARRs that can be defined depends on the measured variables (known variables) and the structure of a system. ARRs represent a set of constraints or rules evaluated during system operation by using the variables calculated from the model and the measurements obtained from the system. A single ARR is defined by a subset of variables whose relationship must remain consistent if the system behaves normally. Therefore, if a fault occurs, one or several ARRs will not be consistent. The advantage of this methodology lies in the possibility of analyzing the detectability and isolability of different faults that can be present in a system, without requiring knowledge about all the parameters of its analytical model (Blanke et al. 2006; Düstegör et al. 2006).

In this paper, structural analysis was used in the methodology to locate cyberattacks in WDNs. In the design of a fault diagnosis system in large complex processes, it is a common practice to divide it into subprocesses or subsystems. In this case, ARRs are established for each subsystem by using the observable variables affected by the presence of a cyberattack. Large WDNs commonly are divided into district metered areas (DMAs) (Quiñones Grueiro et al. 2019; Kadosh et al. 2020; Quiñones Grueiro et al. 2021); therefore, a set of ARRs can be established for each DMA in order to distinguish which one is under attack. If the set of ARRs corresponding to a DMA is not consistent during network operation, then a cyberattack is considered to be present in the respective DMA.

The main theoretical foundation of structural analysis applied to detection and location of cyberattacks is now presented. The behavior model of a system can be defined by a pair $(C, V)$ where $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ represents a set of variables, and $C = \{c_{1}, c_{2}, \dots, c_{m}\}$ represents a set of constraints. The set of variables is $V = K \cup X$, where $K$ represents the subset of measurable or known variables and $X$ represents the subset of nonmeasurable or unknown variables.

Definition 3.1: Structural model

The structural model of the system $(C, V)$ is a bipartite graph $(C, V, E)$, where $E \subset C \times V$ is a set of edges defined by (Blanke et al. 2006)

$$e_{ij} \in E = \begin{cases} (c_{i}, v_{j}) & \text{if variable } v_{j} \text{ appears in constraint } c_{i} \\ 0 & \text{in other cases} \end{cases}$$

(7)

A bipartite graph has an associated incidence matrix. The rows and columns of that matrix represent the set of constraints and variables, respectively. To represent each edge $e_{ij} \in E$ in the incidence matrix, a symbol • should be placed in the intersection of row $c_{i}$ and column $v_{j}$.
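As a small illustration of Definition 3.1, the Python sketch below builds the incidence matrix for three of the constraints listed later in Table 1 ($e_{2}$, $e_{29}$, and $e_{30}$). The helper name and the Boolean matrix representation are assumptions made for this example; the toolbox used in the paper may store the structure differently.

```python
import numpy as np

# Constraint name -> variables that appear in it (taken from Table 1).
constraints = {
    "e2": {"q1", "p1a", "p1d"},
    "e29": {"yp1a", "p1a"},
    "e30": {"yp1d", "p1d"},
}

def incidence_matrix(constraints):
    """Return (M, variables); M[i, j] is True if variable j appears in constraint i."""
    variables = sorted(set().union(*constraints.values()))
    M = np.zeros((len(constraints), len(variables)), dtype=bool)
    for i, vars_in_constraint in enumerate(constraints.values()):
        for v in vars_in_constraint:
            M[i, variables.index(v)] = True
    return M, variables

M, variables = incidence_matrix(constraints)

# Print the matrix with a bullet wherever an edge exists.
for name, row in zip(constraints, M):
    print(name.ljust(4), " ".join("•" if cell else "." for cell in row))
```

Each True entry corresponds to one edge $e_{ij}$ of the bipartite graph, so the row for $e_{2}$ has three entries and the rows for the two sensor equations have two each.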

Definition 3.2: Subsystem

A subsystem is defined by a subset of constraints $C_{l} \subset C$ together with the set of variables $V_{l} \subseteq V$ that are related to these constraints, where $l = 1, 2, \dots, p$ and $p$ is the number of subsystems.

In structural analysis, the possibility of establishing ARRs implies that the graph has more constraints than unknown variables (Krysander et al. 2008). By applying the canonical Dulmage–Mendelsohn decomposition (Krysander and Frisk 2008) to the graph ($\mathcal{M}$), the graph can be divided into three parts: the structurally overconstrained part $\mathcal{M}^{+}$, which has more constraints than unknown variables; the just-determined part $\mathcal{M}^{0}$, which has the same number of constraints and unknown variables; and the structurally underconstrained part $\mathcal{M}^{-}$, which has fewer constraints than unknown variables. Fig. 1 presents a generic incidence matrix after a Dulmage–Mendelsohn decomposition is applied. The cyberattacks that affect the constraints belonging to $\mathcal{M}^{0}$ and $\mathcal{M}^{-}$ are not detectable.

Fig. 1. Generic incidence matrix after Dulmage–Mendelsohn decomposition is applied.

To obtain the minimal overconstrained subsystems, the algorithm presented by Krysander et al. (2008) was used in this work.

Blanke et al. (2006) provided detailed examples, including an example of a tank system in which the inflow is controlled based on a level sensor and an electric pump, and outflow is realized through an output pipe.

Methodology for Detection and Location of Cyberattacks

Experience in WDN management indicates that the way to deal with cyberattacks should be similar to that used to solve leaks. That is, the service in the DMA should be interrupted to avoid other possible consequences of the cyberattack, while maintaining the service in the rest of the DMAs if possible. This implies that the most important objective is to locate the DMA under attack, and for this reason, each DMA should be identified as an area of interest (AOI). However, it is possible that experts can decide to define more areas of interest than DMAs, taking into account their knowledge in the management of the WDN as well as the distribution and extension of each DMA.

The methodology proposed for the detection and location of cyberattacks in this paper is shown in Fig. 2. The proposal comprises two stages: offline and online.

Fig. 2. Methodology proposed for detection and location of cyberattacks.

Offline Stage

In the offline stage, the following steps describe the methodology:

• Detection calibration: A single AE is trained following the steps described in Section “Materials and Methods” considering as input all measured variables obtained by the SCADA under nominal or normal operating conditions. The squared prediction error then is calculated for the nominal data, and the parameters of the AEWMA control chart are tuned based on the desired low false-positive rate, aiming to have a small number of false alarms.

• Localization calibration: The AOIs for the WDN are defined. The set of relationships among variables of each AOI in the WDN is used as input for the structural analysis method to determine the respective ARRs that characterize each AOI. As mentioned previously, each ARR comprises a subset of variables whose relationship must remain consistent under normal conditions. Therefore, for each ARR, a single AE is trained, and an AEWMA control chart is calibrated to determine when the ARR is not consistent, i.e., the relationship among the variables is not normal. The steps described in Section “Materials and Methods” are followed, considering the variables that characterize the ARR as inputs. Then the squared prediction error is calculated for the nominal data corresponding to these variables, and the parameters of the respective AEWMA control chart are tuned.

If new sensors are added to the WDN, or if the network is expanded with new branches, the steps described in this section should be repeated.
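One simple way to picture the calibration of a control limit from nominal data is sketched below. It is an assumption-laden illustration rather than the procedure used in the paper, which designs the chart from the average run length: here the reference mean and standard deviation are estimated from the nominal squared prediction errors, a plain EWMA (not the adaptive variant) is run over them, and the upper limit $\mu + h \sigma$ is placed at the empirical quantile matching a target false-alarm rate. The nominal errors are synthetic and all names are hypothetical.

```python
import numpy as np

def ewma(x, gamma, z0):
    """Plain EWMA of the sequence x, started at z0."""
    z = np.empty(len(x))
    prev = z0
    for t, xt in enumerate(x):
        prev = gamma * xt + (1.0 - gamma) * prev
        z[t] = prev
    return z

def calibrate_upper_limit(spe_nominal, gamma, target_far):
    """Estimate mu and sigma, then pick h so that roughly target_far of the
    nominal chart values fall above mu + h * sigma (empirical quantile rule)."""
    mu = spe_nominal.mean()
    sigma = spe_nominal.std(ddof=1)
    z = ewma(spe_nominal, gamma, mu)
    limit = np.quantile(z, 1.0 - target_far)
    h = (limit - mu) / sigma
    return mu, sigma, h

# Synthetic stand-in for squared prediction errors under nominal operation.
rng = np.random.default_rng(0)
spe_nominal = rng.chisquare(3, size=2000)

mu, sigma, h = calibrate_upper_limit(spe_nominal, gamma=0.1, target_far=0.05)
z_nominal = ewma(spe_nominal, 0.1, mu)
nominal_far = np.mean(z_nominal > mu + h * sigma)
```

The same routine would be repeated once for the detection autoencoder and once for each ARR autoencoder, each with its own nominal error sequence.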

Online Stage

In the online stage, when a new observation is obtained by the SCADA system, the following steps describe the methodology:

• Attack detection: Every new vector of measured variables is used as input for the AE previously trained with all variables. The squared prediction error is calculated and used as input for the AEWMA control chart to determine whether there is an attack in the whole WDN.

• Attack localization: When an attack is detected, it can be characterized by the inconsistency of one or several ARRs simultaneously. Thus, each area of interest is analyzed separately to assess whether it is under attack. For this purpose, the consistency of each ARR corresponding to the AOI is verified. If all ARRs are inconsistent for a DMA (that is, the AEWMA control chart of each AE, with one AE per ARR, evaluates each input observation as anomalous), then it is considered to be under attack. To avoid false alarms due to outlier observations, the number of consecutive observations that should violate the normal operating condition and the time interval in which they must be obtained in order to establish the presence of a cyberattack should be defined by experts, as suggested by Kadosh et al. (2020).
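The localization rule can be sketched in a few lines of Python. The sketch assumes that each ARR already has a Boolean flag from its AEWMA chart, and it uses the two ARR combinations that the C-Town application reports (AOI 2 with ARR 4 and ARR 5, AOI 3 with ARR 3 and ARR 5). The function names and the persistence length of three observations are hypothetical; the paper leaves that choice to experts.

```python
def persistent(flags, n):
    """True at time t only if the last n flags, including t, are all True."""
    out, run = [], 0
    for flag in flags:
        run = run + 1 if flag else 0
        out.append(run >= n)
    return out

def locate(arr_inconsistent, aoi_to_arrs):
    """Return the AOIs whose ARRs are all flagged as inconsistent."""
    return [
        aoi
        for aoi, arrs in aoi_to_arrs.items()
        if all(arr_inconsistent[arr] for arr in arrs)
    ]

# Partial mapping taken from the C-Town example; other AOIs are omitted.
aoi_to_arrs = {
    "AOI 2": ["ARR 4", "ARR 5"],
    "AOI 3": ["ARR 3", "ARR 5"],
}

# Chart outputs for one observation: ARR 4 and ARR 5 are inconsistent.
flags_now = {"ARR 3": False, "ARR 4": True, "ARR 5": True}
attacked = locate(flags_now, aoi_to_arrs)

# Persistence filter over a sequence of alarms for one AOI.
confirmed = persistent([True, True, False, True, True, True], n=3)
```

Requiring all ARRs of an area to be inconsistent, and requiring the condition to persist, trades a slightly later alarm for fewer spurious localizations caused by isolated outliers.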

Application of the Proposed Methodology to the C-Town Case Study

In this section, the methodology proposed in the section “Methodology for Detection and Location of Cyberattacks” was applied to the C-Town case study. Two experiments were developed. The aim of the first experiment was to evaluate the performance of the detection module under nominal conditions as well as under the effect of different magnitudes of background noise affecting the measurements. The goal of the second experiment was to evaluate the performance of the location module. The experiments used the toolbox presented by Frisk et al. (2017).

Case Study: C-Town WDN and BATADAL Data Sets

C-Town is a medium-sized network designed to resemble the operations of real-world WDNs. The network consists of a single reservoir, 7 storage tanks, 429 pipes, 11 pumps distributed across 5 pumping stations (S1–S5), 388 junctions, and 5 valves. Pumps, valves, and level sensors of the tanks are connected to nine programmable logic controllers, which form a cybernetwork together with a central computer in which a SCADA system coordinates the operations through the PLCs (Taormina et al. 2018a).

BATADAL data sets were introduced for the Battle of the Attack Detection Algorithms: Disclosing Cyber Attacks on Water Distribution Networks (Taormina et al. 2018a). Three data sets were generated using the simulation package EPANET 2 including information of 43 variables sampled at fixed hourly intervals. These variables were the measurements obtained from level sensors corresponding to each storage tank (7 variables), inlet and outlet pressure for 1 actuated valve and 5 pumping stations (12 variables), as well as the water flow through them and their on or off status (24 variables). Of these variables, 31 are continuous and 12 are binary, corresponding to the status of the valve and pumps. Data sets 1 and 2 can be used for training the detection algorithms. Data set 1 was generated by simulating the operation of the C-Town WDN during 365 days without the presence of cyberattacks. This data set allows studying the operations of the WDN under nominal conditions. Data set 2 contains information of 7 attacks produced during a period of 87 days during which the WDN was attacked for about 492 h. Data set 3 contains information of 7 additional attacks produced during a period of 71 days during which the WDN was attacked for about 407 h, and it should be used to test the performance of the detection algorithms after training. A complete characterization of Data sets 2 and 3 was given by Taormina et al. (2018a).

Detection and Localization in the Offline Stage

Data representing nominal or normal operating conditions and comprising 8,753 observations were used to train the autoencoder for the detection module. The maximum number of epochs in the training was 3,000. The root-mean-square error (RMSE) function was used as a performance or cost metric. After a grid search experiment, the parameters that obtained the best performance were $p = 0.99$, $\lambda = 0.0001$, and $\beta = 1$; a latent space dimension of 23 variables; and sigmoid and linear activation functions for the encoder and decoder, respectively. To adjust the hyperparameters of the AEWMA anomaly detection chart, the analysis presented by Aly et al. (2015) was considered. The parameters were selected to ensure a low false-alarm rate and the detection of incipient and abrupt anomalies ($ARL = 400$, $\delta_{\min} = 0.5$, and $\delta_{\max} = 5$). Furthermore, by evaluating Eqs. (5) and (6), $\gamma = 0.102013$ and $k = 2.984267$ were obtained. Further details about AEWMA hyperparameter setting were given by Quiñones Grueiro et al. (2019).

The first step for attack location is to establish the set of areas of interest based on the experience in the WDN management. The C-Town WDN has 5 DMAs, and the first idea was to identify each DMA with an AOI. However, DMA 1 contains two very important subareas: Pumping Station 1; and Valve 2, which controls the distribution of water to DMA 2 and DMA 3. Therefore, two AOIs were established for DMA 1. Fig. 3(a) shows the five DMAs in the C-Town case study and the associated AOIs, and Fig. 3(b) shows the two AOIs defined in DMA 1.

Fig. 3. Areas of interest (AOIs) defined in the C-Town case study: (a) five DMAs defined in C-Town; and (b) two AOIs defined in DMA 1.

In the second step, structural analysis is used to determine the set of ARRs corresponding to each AOI to locate attacks. The following variables were available: the level ($h$) of each tank ($t$), the flow rate ($q$), the inlet and outlet pressure ($p$), the status of each pump ($p$), and the flow and status of Valve ($v$) 2. In the development of the structural analysis, three subsets of variables were defined: attacks (AOI $n$, where $n = 1, 2, \dots, 6$), observable variables (designated with the letter y preceding the name), and system variables, the actual values of which are unknown. The set of equations that describe the behavior of the system is given in Table 1. The Appendix presents the identification of the variables used in the structural analysis.

Table 1. Equations defined for structural analysis

Equation No.	Variables involved
$e_{1}$	$\{qp1, qp2, q1, qt1, h1, AOI1\}$
$e_{2}$	$\{q1, p1a, p1d\}$
$e_{3}$	$\{q1, qt1, q2, q5\}$
$e_{4}$	$\{q5, q6, q7\}$
$e_{5}$	$\{q2, pva, pvd, qt2, h2, AOI6\}$
$e_{6}$	$\{q2, qt2, q3, q4\}$
$e_{7}$	$\{q3, p2a, p2d, qt4, h4, AOI2\}$
$e_{8}$	$\{q3, qt4\}$
$e_{9}$	$\{q4, p3a, p3d, qt3, h3, AOI3\}$
$e_{10}$	$\{q4, qt3\}$
$e_{11}$	$\{q6, p4a, p4d, qt5, h5, AOI4\}$
$e_{12}$	$\{q6, qt5\}$
$e_{13}$	$\{q7, p5a, p5d, h6, h7, AOI5\}$
$e_{14}$	$\{q7, qt6, qt7\}$
$e_{15}$	$\{yqp1, qp1\}$
$e_{16}$	$\{yqp2, qp2\}$
$e_{17}$	$\{yq2, q2\}$
$e_{18}$	$\{yq3, q3\}$
$e_{19}$	$\{yq4, q4\}$
$e_{20}$	$\{yq6, q6\}$
$e_{21}$	$\{yq7, q7\}$
$e_{22}$	$\{yh1, h1\}$
$e_{23}$	$\{yh2, h2\}$
$e_{24}$	$\{yh3, h3\}$
$e_{25}$	$\{yh4, h4\}$
$e_{26}$	$\{yh5, h5\}$
$e_{27}$	$\{yh6, h6\}$
$e_{28}$	$\{yh7, h7\}$
$e_{29}$	$\{yp1a, p1a\}$
$e_{30}$	$\{yp1d, p1d\}$
$e_{31}$	$\{ypva, pva\}$
$e_{32}$	$\{ypvd, pvd\}$
$e_{33}$	$\{yp2a, p2a\}$
$e_{34}$	$\{yp2d, p2d\}$
$e_{35}$	$\{yp3a, p3a\}$
$e_{36}$	$\{yp3d, p3d\}$
$e_{37}$	$\{yp4a, p4a\}$
$e_{38}$	$\{yp4d, p4d\}$
$e_{39}$	$\{yp5a, p5a\}$
$e_{40}$	$\{yp5d, p5d\}$

The obtained isolability matrix of the structural model with performance in mixed causality showed that it is possible to isolate all defined attacks in a unique way. Structural analysis allows designing the residual generators for each attack by considering a set of overconstrained equations. In this case, seven analytical redundancy relationships (ARRs) were established (Fig. 4). These ARRs were combined for the detection of cyberattacks in each of the six established AOIs. For example, to locate a cyberattack in AOI 2, it is necessary that ARR 4 and ARR 5 become inconsistent at the same time. To detect a cyberattack in AOI 3, ARR 3 and ARR 5 should become inconsistent at the same time.

For each ARR, an autoencoder was trained. In the training process, 8,753 samples of normal operation of the WDN were used, but only with the variables present in each ARR. Furthermore, the parameters of the AEWMA control charts associated with each autoencoder were established using grid search. A set of 2,089 observations of Data set 3 of the C-Town case study was used in the analysis of the performance of the AEWMA control charts because it contained data of attacks on each AOI except AOI 4.

Performance Assessment

Two important characteristics must be satisfied by a system for detection of cyberattacks: (1) the early and reliable detection of an attack; and (2) its correct classification. To evaluate the early detection of an attack, index $S_{TTD}$ is defined

S_{TTD} = 1 - \frac{1}{n_{a}} \sum_{i = 1}^{n_{a}} \frac{{TTD}_{i}}{\Delta t_{i}}

(8)

where $n_{a}$ = number of attacks contained in the data set; ${TTD}_{i}$ = time to detection of the $i$th attack; and $\Delta t_{i}$ = corresponding duration time. The index to evaluate the performance in the classification process is defined as follows:

S_{CLF} = \frac{TPR + TNR}{2}

(9)

where TPR and TNR = true positive rate and true negative rate, respectively

TPR = \frac{TP}{TP + FN}, \quad TNR = \frac{TN}{TN + FP}

(10)

where TP = true positive alarms; FN = false negative alarms; TN = true negative alarms; and FP = false positive alarms. Both indexes are integrated into a global performance index

S_{GP} = \varsigma \times S_{TTD} + (1 - \varsigma) \times S_{CLF}

(11)

where $\varsigma$ determines the relative importance of indexes $S_{TTD}$ and $S_{CLF}$ in the general index $S_{GP}$. This paper took $\varsigma = 0.5$ to give the same weight to early detection and correct classification. Finally, the false alarm rate (FAR) index is calculated as $FAR = 1 - TNR$.
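Eqs. (8)–(11) translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and the timing values and confusion counts in the example are made up rather than taken from the case studies.

```python
def s_ttd(ttd, durations):
    """Eq. (8): one minus the mean ratio of time to detection to attack duration."""
    if not ttd or len(ttd) != len(durations):
        raise ValueError("ttd and durations must be non-empty and equally long")
    return 1.0 - sum(t / d for t, d in zip(ttd, durations)) / len(ttd)


def rates(tp, fn, tn, fp):
    """Eq. (10): true positive rate and true negative rate."""
    return tp / (tp + fn), tn / (tn + fp)


def s_clf(tpr, tnr):
    """Eq. (9): mean of TPR and TNR."""
    return (tpr + tnr) / 2.0


def s_gp(s_ttd_value, s_clf_value, weight=0.5):
    """Eq. (11): weighted combination; weight is the relative-importance factor."""
    return weight * s_ttd_value + (1.0 - weight) * s_clf_value


# Hypothetical example: three attacks and made-up confusion counts.
timing = s_ttd(ttd=[2, 5, 1], durations=[50, 100, 20])
tpr, tnr = rates(tp=90, fn=10, tn=180, fp=20)
overall = s_gp(timing, s_clf(tpr, tnr))
far = 1.0 - tnr  # false alarm rate
print(round(timing, 4), round(overall, 4), round(far, 4))
```

Keeping the weight as a parameter makes it easy to examine how the ranking of methods would change if early detection were valued more or less than classification accuracy.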

Cyberattack Detection

The performance of the detection system was evaluated using 2,089 observations from Data set 3 of the C-Town case study, which contains information about seven cyberattacks. The results of the detection system are presented in Fig. 5, which shows the behavior of the $z(t)$ variable corresponding to the AEWMA control chart of the detection system, and the limit of the AEWMA control chart for the normal operation of the WDN ($\mu + h\sigma$). Vertical lines indicate the start and end times of each cyberattack. All seven attacks were detected because the limit of the AEWMA control chart was violated within the time interval of every attack. Fig. 5 also indicates the period in which false alarms occurred.

Fig. 5. Detection of cyberattacks for Data set 3.
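The behavior of $z(t)$ against the limit $\mu + h\sigma$ can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the Huber-type score function of Capizzi and Masarotto (2003), a one-sided upper limit, a threshold `k` expressed in units of `sigma`, and illustrative parameter values rather than those found by grid search. All function names are hypothetical.

```python
def huber_score(error, lam, k):
    """Huber-type score: small errors are smoothed like an EWMA, large ones pass almost fully."""
    if error < -k:
        return error + (1.0 - lam) * k
    if error > k:
        return error - (1.0 - lam) * k
    return lam * error


def aewma_chart(x, mu, sigma, lam=0.1, k=3.0, h=0.7):
    """Return the chart statistic z(t) and a flag per sample for z(t) > mu + h * sigma."""
    z = mu                      # start the statistic at the nominal mean
    limit = mu + h * sigma      # upper control limit
    z_values, alarms = [], []
    for sample in x:
        z = z + huber_score(sample - z, lam, k * sigma)
        z_values.append(z)
        alarms.append(z > limit)
    return z_values, alarms


# Toy series: nominal level, a sustained shift, then a return to nominal.
series = [0.0] * 5 + [5.0] * 5 + [0.0] * 5
z_values, alarms = aewma_chart(series, mu=0.0, sigma=1.0)
print(alarms)
```

In this toy series the alarm is raised at the first shifted sample and remains raised for the samples that follow the end of the shift, because $z(t)$ decays only gradually toward the nominal level.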

Index $S_{TTD} = 0.9451$ was calculated using Eq. (8). Most false positive observations occurred after an attack had finished because of the inertia problem that is characteristic of the AEWMA control chart (Fig. 5). Furthermore, only one persistent false attack, with a duration of 11 observations, was detected. The observations of the data set were classified as follows: TP = 390, TN = 1,612, FP = 70, and FN = 17. Using these values and the expressions of TPR, TNR, and FAR, the following values were obtained: $TNR = 0.9584$, $FAR = 0.0416$, $TPR = 0.9582$, $S_{CLF} = 0.9583$, and $S_{GP} = 0.9517$.

Ramotsoela et al. (2019) compared 14 schemes for cyberattack detection that were based on machine learning using the information of the BATADAL data set. Seven of these schemes were the strategies presented in the BATADAL competition (Housh and Ohar 2017; Abokifa et al. 2017; Giacomoni et al. 2017; Brentan et al. 2017; Chandy et al. 2017; Pasha et al. 2017; Aghashahi et al. 2017). They are identified herein as B1–B7, where the number indicates the position in the ranking in the BATADAL competition. Of the remaining seven strategies, the local outlier factor (LOF) (Breuning et al. 2000) and subspace outlier degree (SOD) (Kriegel et al. 2009) are density-based algorithms; the strategy based on Mahalanobis distance (MD) (Leys et al. 2018) is a parametric method; and discriminant analysis strategy (Shmueli et al. 2017) in its two variants [linear (LDA) and quadratic (QDA)], the one-class support vector machine (OSVM) strategy (Khan and Madden 2014), and the ensemble technique strategy (which combines both SOD and LOF using QDA) are classification algorithms. Table 2 presents the performance of these 14 detection strategies and the proposal made in this paper (AE-SA) in descending order.

Table 2. Cyberattack detection ranking

Rank	Name	$S_{TTD}$	$S_{CLF}$	$S_{GP}$
1	B1	0.9650	0.9752	0.9701
2	AE-SA	0.9451	0.9583	0.9517
3	QDA	0.9584	0.9422	0.9503
4	B2	0.9580	0.9402	0.9491
5	Ensemble	0.9400	0.9529	0.9464
6	MD	0.9297	0.9387	0.9342
7	B3	0.9360	0.9174	0.9267
8	LOF	0.9229	0.9286	0.9258
9	SOD	0.9091	0.9223	0.9157
10	B4	0.8570	0.9313	0.8942
11	B5	0.8350	0.7679	0.8015
12	LDA	0.7959	0.7532	0.7745
13	B6	0.8850	0.6605	0.7727
14	OSVM	0.7383	0.8060	0.7721
15	B7	0.4290	0.6398	0.5344

These results indicate the effectiveness of the proposed detection system, which ranked second. This position is more relevant considering that the proposal requires only data from nominal operating conditions, whereas the other algorithms need a parameterized physical model or data from the attacks (Ramotsoela et al. 2019). Although detecting an attack can take hours, system inertia means that an attack also needs many hours to perturb a water distribution network significantly (depending on the number of consumers that it serves). For example, in the case of the network used for BATADAL, it can take about 10 h for an attack to drive a reservoir beyond its normal regime, and almost 2 days are needed to cause the tank to overflow (Taormina et al. 2016).

Kadosh et al. (2020) presented comparable detection performance, with $S_{GP} = 0.954$, as well as an experiment demonstrating the robustness of their method to five levels of background noise. The noise added to each sensor variable was normally distributed around zero with different values of the coefficient of variation (COV). The standard deviation of the noise for signal $v$ was calculated as $\sigma = COV \times \mu$, where $\mu$ is the mean of the measured variable in the attack-free data set. For each level of noise, 20 random samples were generated, and the average performance obtained with the approach proposed in this work is shown in Fig. 6. The performance deteriorated significantly for high levels of background noise. The main reason is that, unlike the approach presented by Kadosh et al. (2020), the proposal presented in this work does not use attack data to adjust the hyperparameters of the detection methods. For most real WDNs, only nominal data are available. Therefore, the proposed approach remains valid.

Fig. 6. $S_{GP}$ performance index for different background noise levels in the measurements.
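The noise model used in this experiment can be sketched as follows. This is a minimal Python illustration under stated assumptions: the signal values, the COV level, and the function name `add_background_noise` are hypothetical, and taking the absolute value of the mean is a safeguard added here so that the standard deviation is never negative.

```python
import random
import statistics


def add_background_noise(signal, nominal_mean, cov, rng):
    """Add zero-mean Gaussian noise with standard deviation cov * |nominal_mean|."""
    sigma = cov * abs(nominal_mean)  # abs() is an added safeguard, not from the paper
    return [value + rng.gauss(0.0, sigma) for value in signal]


rng = random.Random(0)                          # fixed seed for repeatability
nominal = [4.0, 4.2, 3.9, 4.1, 4.0, 3.8]        # hypothetical attack-free measurements
mu = statistics.fmean(nominal)                  # mean of the attack-free signal

# 20 noisy realizations for one noise level, as in the experiment described above.
noisy_runs = [add_background_noise(nominal, mu, cov=0.05, rng=rng) for _ in range(20)]
print(len(noisy_runs), len(noisy_runs[0]))
```

Each realization would then be passed through the detection system, and the resulting $S_{GP}$ values averaged per noise level.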

Cyberattack Location

Table 3 presents the elements of the WDN and the area of interest affected by each of the seven attacks.

Table 3. Elements and area of interest affected in each attack

Attack No.	Elements affected	AOI number
1	Level of Tank 3 (h3), Flow of Pumps 4 and 5 (q4)	3
2	Level of Tank 2 (h2), Flow of Valve 2 (q2)	6
3	Flow of Pump 3 (qp2)	1
4	Flow of Pump 3 (qp2)	1
5	Level of Tank 2 (h2), Flow of Valve 2 (q2), Pressure in Valve 2 (pv)	6
6	Level of Tank 7 (h7), Flow in Pumps 10 and 11 (q7)	5
7	Level of Tank 6 (h6)	2

To locate an attack satisfactorily, the observations during the attack should render the ARRs that characterize the area nonconsistent. However, four different options can occur when an observation detected as an attack is analyzed: (1) only the real AOI is activated; (2) several AOIs are activated, and the real AOI is included in them; (3) one or several AOIs are activated and the real AOI is not included; and (4) no AOI is activated. In this case study, none of the 70 observations classified as FP in the detection experiment made any ARR nonconsistent. Table 4 presents the other results obtained by the cyberattack location system. The first column (A) presents the attack number. The second column (B) presents the number of observations during the presence of each attack. The third column (C) presents the number of observations that activate at least one AOI. Columns D, E, F, and G present the number of observations that satisfy Options 1, 2, 3, and 4, respectively. Column LR presents the location rate, i.e., the percentage of observations that activate at least one AOI with respect to the total number of observations during the presence of the attack [$(C / B) \times 100$]. Column ExLR shows the exact location rate, i.e., the percentage of observations that activate only the real AOI with respect to the number of observations that activate at least one AOI during an attack [$(D / C) \times 100$]. Finally, Column EfLR presents the effective location rate, i.e., the percentage of observations that activate the real AOI with respect to the number of observations that activate at least one AOI [$((D + E) / C) \times 100$]. The most relevant aspects of the analysis of the results in Table 4 are the following:

•

The LR index was 65.11%. Cyberattack 6 was the most problematic in terms of location rate. This suggests the need to incorporate new measured variables in this AOI to obtain greater sensitivity in the detection of attacks. For example, flow sensors at the output of the tanks could help to improve the location of attacks in this area, although adding more sensors increases the potential number of false alarms, as well as the cost of maintenance and calibration for the network.

•

Option 3 never occurred. This indicates that if one or some AOIs are activated, the real AOI always is included.

•

The ExLR index was 85.28%. In this case, the worst performance was obtained for Attack 5, in which several AOIs were activated simultaneously during 23 observations.

•

The EfLR index was 100%, which is relevant because it indicates that the real AOI always is activated for all observations that activate at least one AOI. From the point of view of the operators, this result is very important because if the location process indicates several AOIs, the operators can be sure that at least one of them is the real AOI. If operators simultaneously analyze which AOI has been activated more consistently during the attack, they can identify the real AOI under attack. Fig. 7 shows the activation of the AOIs during Attack 5. The real AOI attacked was AOI 6, and that was the area that the location system indicated to be the real attack area during most of the duration of the attack.
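The operator-side rule mentioned in the last item, choosing the AOI that is activated most consistently during an attack, can be sketched as a simple count. This Python snippet is only an illustration: the function name `most_consistent_aoi` and the activation sequence are hypothetical, and ties are not handled in any special way.

```python
from collections import Counter


def most_consistent_aoi(activations):
    """Return the AOI activated in the most observations, or None if none was activated."""
    counts = Counter(aoi for observation in activations for aoi in observation)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sequence: each set holds the AOIs activated by one observation.
observations = [{6}, {2, 6}, set(), {5, 6}, {6}]
print(most_consistent_aoi(observations))  # 6
```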

Table 4. Results of cyberattack location process for C-Town

A	B	C	D	E	G	$LR = (C / B) \times 100$	$ExLR = (D / C) \times 100$	$EfLR = ((D + E) / C) \times 100$
1	70	44	41	3	26	62.85	93.18	100
2	65	48	41	7	17	73.84	85.41	100
3	31	29	29	0	2	93.54	100	100
4	31	31	29	2	0	100	93.54	100
5	100	74	51	23	26	74	68.91	100
6	80	23	19	4	57	28.75	82.60	100
7	30	16	16	0	14	53.33	100	100
Total	407	265	226	39	142	65.11	85.28	100

Note: A = attack number; B = number of observations during each attack; C = number of observations that activate at least one AOI; D = number of observations that satisfy Option 1; E = number of observations that satisfy Option 2; F = number of observations that satisfy Option 3; G = number of observations that satisfy Option 4; LR = location rate, i.e., percentage of observations that activate at least one AOI with respect to total number of observations during an attack [ $(C / B) \times 100$ ]; ExLR = exact location rate, i.e., percentage of observations that activate only the real AOI with respect to number of observations that activate at least one AOI during an attack [ $(D / C) \times 100$ ]; and EfLR = effective location rate, i.e., percentage of observations that activate the real AOI with respect to the number of observations that activate at least one AOI [ $((D + E) / C) \times 100$ ].
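The three location indexes can be recomputed from the observation counts. The Python sketch below uses a hypothetical function name, `location_rates`, and takes LR as $(C / B) \times 100$, which is the form given in the Table 4 column header and the one that reproduces the tabulated values. The example uses the Total row of Table 4.

```python
def location_rates(b, c, d, e):
    """Return (LR, ExLR, EfLR) in percent from the counts B, C, D, and E."""
    if b <= 0:
        raise ValueError("b must be positive")
    lr = 100.0 * c / b
    if c == 0:
        # No observation activated an AOI, so ExLR and EfLR are undefined.
        return lr, None, None
    exlr = 100.0 * d / c
    eflr = 100.0 * (d + e) / c
    return lr, exlr, eflr


# Total row of Table 4: B = 407, C = 265, D = 226, E = 39.
lr, exlr, eflr = location_rates(b=407, c=265, d=226, e=39)
print(f"{lr:.2f} {exlr:.2f} {eflr:.2f}")  # 65.11 85.28 100.00
```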

Fig. 7. Results of the cyberattack location system during Attack 5.

Application of the Proposed Methodology to the E-Town Case Study

In this section, the methodology proposed in the section “Methodology for Detection and Location of Cyberattacks” was applied to the E-Town case study. Similar to the previous case study, attack detection and location experiments were conducted. The experiments used the toolbox presented by Frisk et al. (2017).

Case Study: E-Town WDN and Attack Data Set

E-Town was inspired by a real WDN located in a city in Colombia. The model was presented in the competition called Battle of Water Networks District Metered Areas (BWNDMA), the sixth competition developed as part of the International Conference on Water Distribution System Analysis (WDSA) in 2016 (Saldarriga et al. 2019). E-Town is a large-scale and complex WDN with 5 water sources, 17 tanks, 14 control valves, 3 pump stations, 11,107 nodes, and 13,920 pipes. The goal of the competition was to define a new distribution for the DMAs. The proposal presented by Salomons et al. (2017) in the competition suggested dividing the network into 23 DMAs and adding a set of valves, making a total of 41 valves (Fig. 8).

Fig. 8. E-Town with DMA distribution. (Adapted from Salomons et al. 2017.)

Attack detection and location analysis requires a data set generated from the WDN that covers nominal operation and different attacks. For this purpose, Kadosh et al. (2020) used the epanetCPA toolbox (Taormina et al. 2018b). Flow sensors in the valves and the pumps were considered. Pressure sensors were assumed to be installed before and after each pump and valve. Pressure sensors associated with each tank also were considered. Sensors were associated with programmable logic controllers based on proximity. In this work, the data sets generated by Kadosh et al. (2020), comprising 8,761 observations under nominal conditions and 8,762 observations of 20 attacks in different DMAs, were used for evaluation purposes.

Detection and Location in the Offline Stage

Unlike for C-Town, it is not feasible to define a single autoencoder for attack detection in the entire E-Town network. Instead, an autoencoder was created for each of the three zones in Fig. 8. Zone 1 included DMAs 1, 2, 3, 4, 5, and 6; Zone 2 included DMAs 8, 9, 10, and 11; and Zone 3 included DMAs 7, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23ab, and 23cd. DMA 23 is divided into two parts (23ab and 23cd). Grid search was used to tune the hyperparameters of each autoencoder and AEWMA chart.
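The zone partition described above can be written down as a small lookup structure, which is convenient when routing each AOI to the autoencoder of its zone. This Python sketch is only illustrative: the names `ZONE_AOIS`, `AOI_TO_ZONE`, and `zone_for` are hypothetical, and the DMA labels are stored as strings so that the two halves of DMA 23 can be represented.

```python
# Zone membership as listed in the text; labels are strings because of 23ab and 23cd.
ZONE_AOIS = {
    1: ["1", "2", "3", "4", "5", "6"],
    2: ["8", "9", "10", "11"],
    3: ["7"] + [str(n) for n in range(12, 23)] + ["23ab", "23cd"],
}

# Reverse lookup: which zone (and therefore which autoencoder) covers a given AOI.
AOI_TO_ZONE = {aoi: zone for zone, aois in ZONE_AOIS.items() for aoi in aois}


def zone_for(aoi):
    """Return the zone number that contains the given DMA/AOI label."""
    return AOI_TO_ZONE[str(aoi)]


print(len(AOI_TO_ZONE))  # 24
print(zone_for("23cd"))  # 3
```

The count of 24 entries reflects the 23 DMAs with DMA 23 split into two parts.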

For location purposes, structural analysis was applied to obtain a set of ARRs associated with each AOI. For E-Town, an AOI was defined for each DMA except DMA 23, for which two AOIs were considered. The respective ARRs obtained are shown in Figs. 9–11.

Fig. 9. Attack sensitivity matrix of Zone 1.

Fig. 10. Attack sensitivity matrix of Zone 2.

Fig. 11. Attack sensitivity matrix of Zone 3.

Cyberattack Detection and Location

Using Eq. (8), $S_{TTD} = 0.9443$ was calculated. The performance indicators obtained were $TNR = 0.9762$, $TPR = 0.947$, and $S_{CLF} = 0.9616$. The performance was similar to that for C-Town, with a slight decrease of the TPR, probably because a larger number of attacks was considered. Moreover, most false positives were generated after each attack. These results demonstrate that the proposed detection methodology scales well to large networks. Table 5 presents the results obtained by the cyberattack location system.

Table 5. Results of cyberattack location process for E-Town

A	B	C	D	E	G	$LR$	$ExLR$	$EfLR$
1	31	30	30	0	1	96.77	100	100
2	41	39	39	0	2	95.12	100	100
3	101	100	93	7	1	92.80	93	100
4	101	98	98	0	3	97.03	100	100
5	101	89	13	76	12	88.12	14.61	100
6	101	89	0	76	12	88.12	0	85.39
7	101	97	61	36	4	96.04	62.89	100
8	101	101	0	79	0	100	0	78.22
9	101	101	53	48	0	100	52.48	100
10	51	49	49	0	2	96.08	100	100
11	101	95	21	64	6	94.06	22.11	89.47
12	51	49	48	0	2	96.08	97.96	97.96
13	21	16	13	2	5	76.19	81.25	93.75
14	151	148	147	0	3	98.01	99.32	99.32
15	51	49	49	0	2	96.08	100	100
16	51	51	50	0	0	100	98.04	98.04
17	151	137	11	83	14	90.73	8.03	68.61
18	71	67	43	22	4	94.37	64.18	97.02
19	81	79	0	79	2	97.53	0	100
20	91	89	0	89	2	97.80	0	100
Total	1,650	1,573	818	661	77	95.33	52.00	94.02

Note: A = attack number; B = number of observations during each attack; C = number of observations that activate at least one AOI; D = number of observations that satisfy Option 1; E = number of observations that satisfy Option 2; F = number of observations that satisfy Option 3; G = number of observations that satisfy Option 4; LR = location rate, i.e., percentage of observations that activate at least one AOI with respect to total number of observations during an attack [ $(C / B) \times 100$ ]; ExLR = exact location rate, i.e., percentage of observations that activate only the real AOI with respect to number of observations that activate at least one AOI during an attack [ $(D / C) \times 100$ ]; and EfLR = effective location rate, i.e., percentage of observations that activate the real AOI with respect to the number of observations that activate at least one AOI [ $((D + E) / C) \times 100$ ].

Analysis of the data in Table 5 led to several conclusions. Attacks 1, 2, 4, 10, 12, 14, 15, and 16 were detected uniquely and correctly. Attack 3 was detected in the correct DMA for 94% of its duration, and in four different DMAs simultaneously (22, 6, 3, and 1) for the remaining 6%. Attacks 5 and 6 occurred simultaneously, and they were correctly located in DMAs 21 and 20. Attacks 7 and 8 overlapped for some time. When they did not overlap in time, they were correctly located in DMAs 18 and 16. However, they were located in DMA 17 when they occurred simultaneously. Attack 11 was located in both DMA 22 and DMA 20, although it affected only DMA 22. Similarly, Attacks 13 and 17 were located in two DMAs at the same time. The performance for Attack 18 was the worst in terms of location, because it was located simultaneously in five DMAs. Although some attacks were located in more than one DMA at the same time, the true DMA almost always was included among the group of DMAs identified.

Conclusions

This paper presents a methodology for detection and location of cyberattacks in a water distribution network. The proposal is based on the combination of structural analysis and two computational intelligence tools: autoencoders and AEWMA control charts. The first advantage of the proposed methodology is that it needs only data corresponding to the nominal operation of the WDN for training. The second advantage is that in addition to detecting an attack, it can determine the affected area. To evaluate the detection process, the indexes of early detection $S_{TTD}$, performance in classification $S_{CLF}$, and global performance $S_{GP}$ were defined. The results obtained for the C-Town and E-Town case studies demonstrated the attack detection capability of the first part of the proposed methodology. With respect to the location process, satisfactory results were obtained for the indexes exact location rate and effective location rate. In C-Town, the real attack area always was among the areas proposed by the location methodology, and in E-Town it was included for 94.02% of the located observations. In general, these results confirm the validity of the proposal.

For future work, it is necessary to investigate data processing and assimilation techniques to improve the robustness of the proposed approach to noise, outliers, and operational changes not caused by cyberattacks. Moreover, it would be useful to define a systematic approach to determine additional variables that should be measured to obtain better attack detection and location results. Extending the proposed approach to guarantee the location of simultaneous attacks also is worth researching. Finally, it is worth studying the development of methods to distinguish cyberattacks from other operational condition changes which may alter the relationships among variables, such as component degradation or faults.

Appendix. Structural Analysis Variables

Structural analysis variable	Description	BATADAL variable
p1a	Pressure before inlet pumps to DMA 1	P1 (J820)
p1d	Pressure after inlet pumps to DMA 1	P2 (J269)
p2a	Pressure before inlet pumps to DMA 2	P5(J289)
p2d	Pressure after inlet pumps to DMA 2	P6(J415)
p3a	Pressure before inlet pumps to DMA 3	P3(J300)
p3d	Pressure after inlet pumps to DMA 3	P4(J256)
p4a	Pressure before inlet pumps to DMA 4	P8(302)
p4d	Pressure after inlet pumps to DMA 4	P9(206)
p5a	Pressure before inlet pumps to DMA 5	P10(207)
p5d	Pressure after inlet pumps to DMA 5	P11(317)
pv	Pressure in the valve	PV(J422)
h1	Water level, Tank 1	L_T1
h2	Water level, Tank 2	L_T2
h3	Water level, Tank 3	L_T3
h4	Water level, Tank 4	L_T4
h5	Water level, Tank 5	L_T5
h6	Water level, Tank 6	L_T6
h7	Water level, Tank 7	L_T7
qp1	Flow through Pump PU1	F_PU1
qp2	Flow through Pump PU2	F_PU2
q1	Flow through Node J269	—
q2	Flow that passes through Valve 2 and influences the level in Tank 1	F_V2
q3	Flow through Pumps 6 and 7 (Pumps 6 and 7 alternate: when one is on, the other one is off), and this influences the level in Tank 4	F_PU6 + F_PU7
q4	Flow through Pumps 4 and 5 (Pumps 4 and 5 alternate: when one is on, the other one is off), and this influences the level in Tank 3	F_PU4 + F_PU5
q5	Flow that feeds Flows 6 and 7	—
q6	Flow through Pumps 8 and 9 (Pumps 8 and 9 alternate: when one is on, the other one is off), and this influences the level in Tank 5	F_PU8 + F_PU9
q7	Flow passing through Pumps 10 and 11 (Pumps 10 and 11 alternate: when one is on, the other one is off), and this influences the level of Tanks 6 and 7	F_PU10 + F_PU11
pva	Pressure before the valve	—
pvd	Pressure after the valve	—
qtn	Flow out of tank $n$ , where $n = 1$ , 2, 3, 4, 5, 6, 7.	—
$AOI m$	Attack on Area of Interest $m$ , where $m = 1$ , 2, 3, 4, 5, 6	—

Note: It is necessary to define three types of variables for structural analysis: known variables, which correspond to the measured variables; unknown variables, which represent the actual values of the variables of interest for the structural analysis; and the attack variables.

The measured variables are identified with the letter y in the variable name.

Known Variables: yqp1, yqp2, yq2, yq3, yq4, yq6, yq7, yh1, yh2, yh3, yh4, yh5, yh6, yh7, yp1a, yp1d, ypva, ypvd, yp2a, yp2d, yp3a, yp3d, yp4a, yp4d, yp5a, yp5d.

Unknown Variables: qp1, qp2, q1, q2, q3, q4, q5, q6, q7, qt1, qt2, qt3, qt4, qt5, qt6, qt7, h1, h2, h3, h4, h5, h6, h7, p1a, p1d, pva, pvd, p2a, p2d, p3a, p3d, p4a, p4d, p5a, p5d.

Attack Variables: AOI1, AOI2, AOI3, AOI4, AOI5, AOI6

Data Availability Statement

The data that support the findings of this study in the C-Town WDN case study are openly available in the BATADAL repository at https://www.batadal.net/data.html. The E-Town WDN case study used the data sets generated by Kadosh et al. (2020). For the structural analysis, the library at https://faultdiagnosistoolbox.github.io/ was used with the parametrization indicated in the paper. For the autoencoders, the MATLAB function trainAutoencoder (https://www.mathworks.com/help/deeplearning/ref/trainAutoencoder.html) was used. A detailed guide to reproducing the experiments developed in the investigation and two zipped files with all MATLAB scripts and the functions used in them are available at https://github.com/rmclaudia/cyberattacks_wdn.git.

Reproducible Results

Reviewer Ayman Nassar was able to reproduce all figures and results presented in the article.

Acknowledgments

Authors Claudia Rodríguez-Martínez and Orestes Llanes-Santiago acknowledge the financial support provided by Project No. PN223LH004-023, National Program of Research and Innovation ARIA from the Ministry of Science, Technology and Environment (CITMA), Cuba.

References

Abokifa, A., K. Haddad, C. Lo, and P. Biswas. 2017. “Detection of cyber physical attacks on water distribution systems via principal component analysis and artificial neural networks.” In World environmental and water resources congress, 676–691. Reston, VA: ASCE.

Abokifa, A., K. Haddad, C. Lo, and P. Biswas. 2019. “Real-time identification of cyber-physical attacks on water distribution systems via machine learning–based anomaly detection techniques.” J. Water Resour. Plann. Manage. 145 (1): 04018089. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001023.

Adedeji, K., and Y. Hamam. 2020. “Cyber-physical systems for water supply network management: Basics, challenges, and roadmap.” Sustainability 12 (22): 9555. https://doi.org/10.3390/su12229555.

Adepu, A., V. Palleti, G. Mishra, and A. Mathur. 2020. “Investigation of cyber attacks on a water distribution system.” In Vol. 12418 of Proc., Applied Cryptography and Network Security Workshops. ACNS 2020, 274–291. Zürich, Switzerland: Springer.

Aghashahi, M., R. Sundararajan, M. Pourahmadi, and M. Banks. 2017. “Water distribution systems analysis symposium– BATtle of the Attack Detection ALgorithms (BATADAL).” In World environmental and water resources congress, 101–108. Reston, VA: ASCE.

Ahmed, C., C. Murgia, and J. Ruths. 2017. “Model-based attack detection scheme for smart water distribution networks.” In Proc., 2017 ACM on Asia Conf. on Computer and Communications Security, 101–113. Abu Dhabi, United Arab Emirates: Association for Computing Machinery.

Aly, A., R. Hamed, and M. Mahmoud. 2015. “Optimal design of the adaptive exponentially weighted moving average control chart over a range of mean shifts.” Commun. Stat. - Simul. Comput. 46 (2): 890–902. https://doi.org/10.1080/03610918.2014.983650.

Amin, S., X. Litrico, S. Sastry, and A. Bayen. 2013. “Cyber security of water SCADA systems—Part II: Attack detection using enhanced hydrodynamic models.” IEEE Trans. Control Syst. Technol. 21 (5): 1679–1693. https://doi.org/10.1109/TCST.2012.2211874.

Baldi, P. 2012. “Autoencoders, unsupervised learning, and deep architectures.” In Proc., ICML Workshop on Unsupervised and Transfer Learning 2012 June 27, 37–49. Bellevue, WA: JMLR Workshop and Conference Proceedings.

Berglund, E., J. Pesantez, A. Rasekh, M. Shafiee, L. Sela, and T. Haxton. 2020. “Review of modeling methodologies for managing water distribution security.” J. Water Resour. Plann. Manage. 146 (8): 03120001. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001265.

Blanke, M., M. Kinnaert, J. Lunze, and M. Staroswiecki. 2006. Diagnosis and fault-tolerant control. Singapore: Springer.

Brentan, B., E. Campbell, G. Lima, D. Manzi, D. Ayala-Cabrera, M. Herrera, I. Montalvo, J. Izquierdo, and L. E. Jr. 2017. “On-line cyber attack detection in water networks through state forecasting and control by pattern recognition.” In World environmental and water resources congress, 583–592. Reston, VA: ASCE.

Breuning, M., H.-P. Kriegel, R. Ng, and J. Sander. 2000. “Lof: Identifying density-based local outliers.” ACM SIGMOD Rec. 29 (2): 93–104. https://doi.org/10.1145/342009.335388.

Capizzi, G., and G. Masarotto. 2003. “An adaptive exponentially weighted moving average control chart.” Technometrics 45 (3): 199–207. https://doi.org/10.1198/004017003000000023.

Chandy, S., A. Rasekh, Z. Barker, B. Campbell, and M. Shafiee. 2017. “Detection of cyber-attacks to water systems through machine-learning-based anomaly detection in scada data.” In World environmental and water resources congress, 611–616. Reston, VA: ASCE. https://doi.org/10.1061/9780784480625.057.

Chandy, S., A. Rasekh, Z. Barker, and M. Shafiee. 2019. “Cyberattack detection using deep generative models with variational inference.” J. Water Resour. Plann. Manage. 145 (2): 04018093. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001007.

Clark, R., S. Panguliri, T. Nelson, and R. Wyman. 2017. “Protecting drinking water utilities from cyberthreats.” J. AWWA 109 (2): 50–58. https://doi.org/10.5942/jawwa.2017.109.0021.

Düstegör, D., E. Frisk, V. Cocquempot, M. Krysander, and M. Staroswiecki. 2006. “Structural analysis of fault isolability in the DAMADICS benchmark.” Control Eng. Pract. 14 (6): 597–608.

Frisk, E., M. Krysander, and D. Jung. 2017. “A Toolbox for analysis and design of model based diagnosis system for large scale models.” IFAC-PapersOnLine 50 (1): 3287–3293. https://doi.org/10.1016/j.ifacol.2017.08.504.

Giacomoni, M., N. Gatsis, and A. Taha. 2017. “Identification of cyber attacks on water distribution systems by unveiling low-dimensionality in the sensory data.” In World environmental and water resources congress, 660–675. Reston, VA: ASCE.

Hassanzadeh, A., A. Rasekh, S. Galelli, M. Aghashahi, R. Taormina, A. Ostfeld, and M. Banks. 2020. “A review of cybersecurity incidents in the water sector.” J. Environ. Eng. 146 (5): 03120003. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001686.

Hawkins, D. M., and D. H. Olwell. 1998. Cumulative sum charts and charting for quality improvement. New York: Springer.

Housh, M., and Z. Ohar. 2017. “Model-based approach for cyber-physical attack detection in water distribution systems.” In World environmental and water resources congress, 727–736. Reston, VA: ASCE.

Housh, M., and Z. Ohar. 2018. “Model-based approach for cyber-physical attack detection in water distribution systems.” Water Res. 139 (Aug): 132–143. https://doi.org/10.1016/j.watres.2018.03.039.

Kadosh, N., A. Frid, and M. Housh. 2020. “Detecting cyber-physical attacks in water distribution systems: One-class classifier approach.” J. Water Resour. Plann. Manage. 146 (8): 04020060. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001259.

Khan, S., and M. Madden. 2014. “One-class classification: Taxonomy of study and review of techniques.” Knowl. Eng. Rev. 29 (3): 345–374. https://doi.org/10.1017/S026988891300043X.

Kriegel, H.-P., P. Kröeger, E. Schubert, and A. Zimek. 2009. “Outlier detection in axis-parallel subspaces of high dimensional data.” In Vol. 5476 of Proc., Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 831–838. Berlin: Springer.

Krysander, M., J. Aslund, and M. Nyberg. 2008. “An efficient algorithm for finding minimal overconstrained subsystems for model based diagnosis.” IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 38 (1): 197–206. https://doi.org/10.1109/TSMCA.2007.909555.

Krysander, M., and E. Frisk. 2008. “Sensor placement for fault diagnosis.” IEEE Trans. Syst. Man Cybern. Part A Syst. Humans 38 (6): 1398–1410. https://doi.org/10.1109/TSMCA.2008.2003968.

Leys, C., O. Klein, Y. Dominicy, and C. Ley. 2018. “Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance.” J. Exp. Social Psychol. 74 (Jan): 150–156. https://doi.org/10.1016/j.jesp.2017.09.011.

Makhzani, A., and B. Frey. 2016. “k-sparse autoencoders.” Preprint, submitted December 19, 2013. https://arxiv.org/abs/1312.5663v2.

Montgomery, D. 2013. Introduction to statistical quality control. Hoboken, NJ: Wiley.

Ng, A. 2010. “Sparse Autoencoder.” Accessed February 14, 2023. https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf.

Pasha, M., B. Kc, and S. Somasundaram. 2017. “An approach to detect the cyber-physical attack on water distribution system.” In World environmental and water resources congress, 703–711. Reston, VA: ASCE.

Quiñones Grueiro, M., M. J. Ares-Milián, M. Sánchez Rivero, A. J. Silva Neto, and O. Llanes-Santiago. 2021. “Robust leak localization in water distribution networks using computational intelligence.” Neurocomputing 438 (May): 195–208.

Quiñones Grueiro, M., O. Llanes-Santiago, A. Prieto Moreno, and C. Verde. 2019. “Decision support system for cyber attack diagnosis in smart water networks.” IFAC-PapersOnLine 51 (34): 329–334.

Ramotsoela, D., G. Hancke, and A. Abu-Mahfouz. 2019. “Attack detection in water distribution systems using machine learning.” Hum.-centric Comput. Inf. Sci. 9 (13): 1–22. https://doi.org/10.1186/s13673-019-0175-8.

Rasekh, A., A. Hassanzadeh, S. Mulchandani, S. Modi, and M. K. Banks. 2016. “Smart water networks and cyber security.” J. Water Resour. Plann. Manage. 142 (7): 01816004. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000646.

Roberts, S. 1959. “Control chart tests based on geometric moving averages.” Technometrics 42 (1): 239–250. https://doi.org/10.1080/00401706.1959.10489860.

Saldarriga, J., J. Bohorquez, D. Cleita, L. Vega, D. Paez, D. Savic, G. Dandy, Y. Filion, W. Grayman, and Z. Kapelan. 2019. “Battle of the water networks district metered areas.” J. Water Resour. Plann. Manage. 145 (4): 04019002. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001035.

Salomons, E., O. Skulovich, and A. Ostfeld. 2017. “Battle of water networks DMAs: Multistage design approach.” J. Water Resour. Plann. Manage. 143 (10): 04017059. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000830.

Shapira, N., O. Ayalon, A. Ostfeld, Y. Farber, and M. Housh. 2021. “Cybersecurity in water sector: Stakeholders perspective.” J. Water Resour. Plann. Manage. 147 (8): 05021008. https://doi.org/10.1061/(ASCE)WR.1943-5452.0001400.

Shmueli, G., P. Bruce, I. Yahav, N. Patel, and K. J. Lichtendahl. 2017. Data mining for business analytics: Concepts, techniques, and applications. Hoboken, NJ: Wiley.

Taormina, R., et al. 2018a. “The battle of the attack detection algorithms: Disclosing cyber attacks on water distribution networks.” J. Water Resour. Plann. Manage. 144 (8): 04018048. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000969.

Taormina, R., and S. Galelli. 2018. “Deep-learning approach to the detection and localization of cyber-physical attacks on water distribution systems.” J. Water Resour. Plann. Manage. 144 (10): 04018065. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000983.

Taormina, R., S. Galelli, H. C. Douglas, N. O. Tippenhauer, E. Salomons, and A. Ostfeld. 2018b. “A toolbox for assessing the impacts of cyber-physical attacks on water distribution systems.” Environ. Modell. Software 112 (Feb): 46–51. https://doi.org/10.1016/j.envsoft.2018.11.008.

Taormina, R., S. Galelli, N. Tippenhauer, A. Ostfeld, and E. Salomons. 2016. “Assessing the effect of cyber-physical attacks on water distribution systems.” In World environmental and water resources congress 2016, 436–442. Reston, VA: ASCE.

Taormina, R., S. Galelli, N. Tippenhauer, E. Salomons, and A. Ostfeld. 2017. “Characterizing cyber-physical attacks on water distribution systems.” J. Water Resour. Plann. Manage. 143 (5): 04017009. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000749.

Tuptuk, N., P. Hazell, J. Watson, and S. Hailes. 2021. “A systematic review of the state of cyber-security in water systems.” Water 13 (1): 81.

Wheeler, D. J. 2000. Understanding variation: The key to managing chaos. Knoxville, TN: SPC Press.

World Bank. 2016. “The World Bank and the International Water Association to establish a partnership to reduce water losses.” Accessed February 14, 2023. https://www.worldbank.org/en/news/press-release/2016/09/01/the-world-bank-and-the-international-water-association-to-establish-a-partnership-to-reduce-water-losses.

Information & Authors

Information

Published In

Journal of Water Resources Planning and Management

Volume 149 • Issue 5 • May 2023

Copyright

This work is made available under the terms of the Creative Commons Attribution 4.0 International license, https://creativecommons.org/licenses/by/4.0/.

History

Received: Mar 11, 2021

Accepted: Dec 4, 2022

Published online: Feb 25, 2023

Published in print: May 1, 2023

Discussion open until: Jul 25, 2023

Authors

Affiliations

Claudia Rodríguez-Martínez

Professor, Study Center of Mathematics, Universidad Tecnológica de La Habana José Antonio Echeverría, CUJAE, Marianao, La Habana CP 19390, Cuba.

Marcos Quiñones-Grueiro

Research Scientist, Institute for Software Integrated Systems, Vanderbilt Univ., Nashville, TN 37235.

Orestes Llanes-Santiago

Professor, Dept. of Automation, Universidad Tecnológica de La Habana José Antonio Echeverría, CUJAE, Marianao, La Habana CP 19390, Cuba (corresponding author). ORCID: https://orcid.org/0000-0002-6864-9629.

Cited by

James H. Stagge, David E. Rosenberg, Anthony M. Castronova, Avi Ostfeld, Amber Spackman Jones, Journal of Water Resources Planning and Management’s Reproducibility Review Program: Accomplishments, Lessons, and Next Steps, Journal of Water Resources Planning and Management, 10.1061/JWRMD5.WRENG-6559, 150, 8, (2024).