Open access
Technical Papers
Nov 26, 2019

Predicting Environmental Impact of Hazardous Liquid Pipeline Accidents: Application of Intelligent Systems

Publication: Journal of Environmental Engineering
Volume 146, Issue 2

Abstract

In case of failure, hazardous liquid pipelines can have adverse environmental consequences. This study presents a method to predict the occurrence of certain environmental impacts resulting from hazardous liquid pipeline accidents. Explanatory variables, including pipe diameter, commodity transported, and incident area type, are used to train an adaptive neuro-fuzzy inference system (ANFIS). Three impact types are analyzed: water contamination, soil contamination, and impact on wildlife. Results show that the model can accurately predict whether a pipeline segment with given design characteristics could lead to adverse environmental impacts due to failure (14%, 6%, and 3% error for soil and water contamination and impact on wildlife, respectively). This model can be used in pipeline design and risk management planning to minimize the potential for environmental consequences. However, more comprehensive and robust reporting requirements beyond simple occurrence would improve our ability to prioritize these mitigative actions.

Introduction

In the United States (US), the hazardous liquid pipeline network has grown by 15% in miles and 30% by operator count since 2010 (PHMSA 2018a). The Pipeline and Hazardous Material Safety Administration (PHMSA) regulates approximately $3.47 \times 10^{6}$ km of hazardous liquid pipelines and a total of 529 hazardous liquid pipeline operators.
Pipelines have the lowest failure rates among fuel transportation methods, on the order of $10^{-3}$ and $10^{-4}$ accidents per km per year for onshore and offshore pipelines, respectively (Belvederesi 2017; Green and Jackson 2015; Singleton 2013). However, the receiving environments, communities, and oil companies can be significantly impacted by a single accident (Belvederesi et al. 2018; TransCanada 2017; PHMSA 2018b; EPA 2015). The objective of this study is to predict environmental consequences due to hazardous liquid pipeline accidents based on pipeline explanatory variables, such as commodity transported, accident location, nominal diameter, nominal wall thickness, and material-release method. These explanatory variables were selected following a thorough statistical analysis of the relationships between pipeline design and failure (Belvederesi et al. 2018), which informed the assessment of data availability and significance. Moreover, this set of explanatory variables represents the basic pipeline design and the information most commonly gathered by regulatory agencies after pipeline accidents worldwide (Belvederesi 2017), which helps model reproducibility in other countries. An adaptive neuro-fuzzy inference system (ANFIS) was developed to predict water and soil contamination and adverse impact on wildlife based on the aforementioned explanatory variables. Conventional approaches to system modeling, which include probabilistic analysis, are often unsuitable for complex and uncertain systems, especially in environmental analysis (Suparta and Alhasa 2016; PreventionWeb 2018). By using a fuzzy inference system (FIS) and artificial neural networks (ANN) together, it is possible to control and overcome the complexity and uncertainty of these systems (Jang 1991, 1993).
However, when using FIS, there are no mathematical methods to transform the experience and knowledge of human experts into the required if–then rule-based approach, and the lack of adaptability of the learning algorithm in tuning membership functions to minimize the error often limits the applicability of FIS. To address these limitations, ANFIS was employed in this study. ANFIS is classified as artificial intelligence (AI) because its capability for processing nonlinear and complex information is similar to the human brain. ANFIS uses fuzzy conditional statements (if–then rules) to capture the imprecise modes of reasoning behind the human ability to make decisions in an environment governed by uncertainty and imprecision. The set of if–then rules is based on data, which represent human knowledge derived from past experience. Membership functions must be developed from the available data on which the model is based (fuzzification). While membership functions used in FIS are generated by human expertise, membership functions for ANFIS are generated by adaptive neural networks (NN). In addition to ranges, ANFIS also generates probability distribution types and parameters.
The literature offers some examples where ANFIS or similar approaches were adopted as intelligent systems in different fields of application. Alizadeh et al. (2018) demonstrated how machine learning can assist environmental management and monitoring tools by providing accurate predictions for water quality parameters. Wijayasekara and Manic (2014) studied how to develop membership functions using classical statistical methods. They investigated data set coverage, complementarity, and relative dissymmetry, and used this information as metrics for understandability in the generated membership functions (Wijayasekara and Manic 2014). Liu and Zhang (2015) developed an ANFIS model-based data-driven approach to control the welding process. They concluded that, compared with linear models, to control welding processes—welding speed and weld pool characteristics parameters—an iterative local ANFIS model using k-means clustering provides better results for modeling performance (Liu and Zhang 2015). Sharkawy et al. (2014) compared three algorithms to predict the surface roughness of end-milling process: a radial basis function neural network (RBFN), ANFIS, and a genetically evolved fuzzy inference system (G-FIS). Among these techniques, they found that RBFN is the most successful at performing surface roughness assessment (Sharkawy et al. 2014). Dastorani et al. (2010) studied how to predict missing flow data of gauging stations using data from nearby hydrometric stations to train an ANN and an ANFIS model. In this study, the researchers concluded that ANFIS is superior for estimating missing data, although ANN produces results with a high level of accuracy (Dastorani et al. 2010). Badde et al. (2015) compared FIS and ANFIS to predict compressive strength of ready-mix concrete (RMC). ANFIS with Gaussian membership functions could predict the 28-day compression strength of ready-mix concrete with satisfactory performance, according to the authors (Badde et al. 2015). 
ANFIS was also applied in the oil and gas industry to assess and optimize pipeline system performance, predict corrosion rates of steel pipelines, and reduce energy consumption for gas transportation. For example, Baghmolaei et al. (2014) developed both FIS and ANFIS models to minimize fuel consumption of gas turbines, and compared these two techniques with ANN. The results of modeling by ANN, FIS, and ANFIS showed that ANN with the genetic algorithm code has the best performance (Baghmolaei et al. 2014). He et al. (2012) studied how RBF and ANFIS could predict the corrosion rate of underground steel pipelines. Results showed that ANFIS can more accurately predict corrosion rate considering changing corrosion factors (He et al. 2012). Baghban et al. (2017) developed an ANFIS-based model to predict the breakthrough curves for rhamnolipid adsorption over activated carbon. The models’ accuracy (absolute average deviation of 1.98%) shows that ANFIS would be a more reliable method to predict breakthrough curves compared with ANN and the group method of data handling (GMDH). Shahrak et al. (2018) applied ANFIS to model water vapor uptakes in different porous metal–organic framework materials, showing a high coefficient of determination ($R^{2} = 0.90$). A multilevel adaptive neuro-fuzzy inference system was developed by Nabavi-Pelesaraei et al. (2019) to predict various environmental, energy, and economic indices of large-scale food production systems, showing high accuracy with coefficients of determination between 0.91 and 0.98.
Predicting environmental consequences due to hazardous pipeline failures can be challenging. To develop a predictive model, it is necessary to have a robust, complete, and informative data set containing information gathered from past accidents. Belvederesi et al. (2017, 2018) investigated the limitations that derive from inadequate data and their implications for accident response actions, and conducted a thorough statistical analysis of trends in hazardous liquid pipeline accidents in relation to the pipeline design characteristics. Major findings led to the development of the model presented in this study.

Methods

Data Source

The Pipeline and Hazardous Material Safety Administration (PHMSA), US Department of Transportation (DOT), regulates approximately 3/4 of the country’s inter- and intrastate pipelines (Belvederesi et al. 2017) and gathers information about oil and gas pipeline accidents that occur in the US. Although PHMSA has collected and made available information about hazardous liquid pipelines since 1986, the data available are temporally inconsistent in terms of reporting criteria. The definition of an accident has changed over time and, for this reason, several pre-2010 accidents were not included in the data set because they did not meet the reporting criteria. The quality and quantity of information provided by PHMSA have increased over time. To ensure consistency and significance for model training and, therefore, predictions, this study focuses on information provided by PHMSA between January 1, 2010, and October 31, 2018, because it was collected under the same requirements and reporting criteria, enabling a more robust analysis of hazardous liquid pipeline failures in the US (Belvederesi et al. 2018). Moreover, this study considers both offshore and onshore gathering and transmission hazardous liquid pipelines regulated by PHMSA.
Details regarding the environmental consequences of pipeline accidents are collected by PHMSA in 21 descriptive database fields, including wildlife impact (i.e., fish, birds, and terrestrials), soil contamination and remediation, and water contamination (i.e., surface water, groundwater, drinking water, and public water). However, the database reports information in the form of a Yes/No statement, and details regarding the number and species of animals impacted, the volume of soil contaminated, and the concentration of chemicals and contaminants are missing. To quantitatively predict the environmental consequences of pipeline accidents, input data regarding the magnitude of the impact must be provided to the model. However, there are no data available to quantify the degree of severity of environmental impacts of pipeline accidents (e.g., area affected by the spill or number of animals involved); therefore, the information collected about whether the accidents’ impacts on wildlife, water, and soil were due to pipeline failures can be used only as descriptive input to the model.
Supporting details regarding PHMSA’s hazardous liquid scope, definitions, reporting criteria, and other general information can be found in title 49, subtitle B, chapter 1 (subchapter D), part 195 of the Code of Federal Regulations [C.F.R. (2019)].
The analysis presented in this paper considers five input variables used to predict the environmental outputs:
1. Commodity transported
   a. Biofuels (biodiesel and fuel-grade ethanol);
   b. Carbon dioxide;
   c. Crude oil;
   d. High-vapor liquids (HVL) and other flammable or toxic fluids that are a gas at ambient conditions, such as liquefied petroleum gas (LPG) and natural gas liquid (NGL), anhydrous ammonia, natural gasoline, refinery-grade propylene (RGP), etc.; and
   e. Refined and/or petroleum product (non-HVL) that is a liquid at ambient conditions (diesel, jet fuel, kerosene, crude condensate, etc.).
2. Location
   a. Aboveground;
   b. Tank, including attached appurtenances;
   c. Transition area (soil/air interface); and
   d. Underground.
3. Nominal diameter over nominal wall thickness (D/t)
4. Maximum operating pressure (MOP)
5. Release mode
   a. Leak (seal or packing, connection failure, crack, pinhole, etc.);
   b. Mechanical puncture;
   c. Other (details given in the database, such as “Pump malfunction due to oiler not operating correctly” or “Release occurred from a damaged pipe tee connection on the 2” sump discharge line which connects to the return header and the Transmix line”);
   d. Overfill or overflow; and
   e. Rupture.
In this study, the output is environmental impact, divided into three categories: (1) adverse effect on wildlife, (2) water contamination, and (3) soil contamination. The output type for the three models is presented in the same form provided by PHMSA (Yes/No statement on impact on wildlife, soil, and water) because no information regarding the magnitude of the consequences is provided. For calculation purposes, the outcome “Yes” is given as 1, and the outcome “No” is given as 2.
Fig. 1 presents an example of ANFIS reasoning used to train the model, while Fig. 2 shows the details for the input variables used in this study.
Fig. 1. Example of ANFIS reasoning used to train the model.

ANFIS

ANFIS can be described as a feedforward neural network with multilayer perceptron (Suparta and Alhasa 2016; Jang 1993). This means that it does not have a feedback link within its architecture, and data and incoming signals are allowed to move in one direction only. The learning algorithm plays the important role of modifying the parameters in the network to adapt to its environment. Two types of learning processes have been widely adopted in the literature: supervised and unsupervised. In supervised learning, a set of input variables is entered into the model as a sample pattern that has been marked or labeled. Each incoming signal to the single neuron spreads along the network until it reaches the end layer of neurons in the output layer. In the final layer, the output is generated and compared with the output pattern. Conversely, unsupervised learning does not have guidelines or target output in the learning process. The network simply receives many samples of inputs and then associates the sample set randomly to some classes or categories. In other words, the output will have some sort of similar characteristics to the input stimulus.
This study considers ANFIS using a grid partitioning and classification method (supervised learning algorithm) and adopting the Takagi-Sugeno type inference system. A hybrid algorithm that combines least-squares estimator and the gradient descent method is adopted. This means that, during the training process, a forward and backward propagation algorithm from Layer 1 to Layer 5 and vice versa (Fig. 3) serves to correct the parameters of the membership functions (in this study, membership functions are set as Gaussian and triangular distributions). At the same time, the gradient descent method is used to find the nonlinear function minimum, resulting from the weights generated by the fuzzy rules. Fig. 3 shows the structure of the fuzzy reasoning mechanism for this study, where it is possible to see where the forward–backward propagation applies from Layers 1 to 5 and back to Layer 1. Fig. 4 shows the application of the gradient descent method, where the minimum error represents the bottom of the curve.
Fig. 2. Schematic overview of the input–output set for ANFIS.
Fig. 3. ANFIS structure for the presented model. V1, V2, V3, V4, and V5 = input variables; A, B, C, D, and E = membership functions for each input variable; Π = firing strength of the fuzzy logic rules; N = ratio between the ith rule’s firing strength and the sum of all firing strengths; and Σ = summation of all the incoming signals from the previous node.
Fig. 4. Graphical representation of the gradient descent algorithm.
In Layer 1, for each input variable there is a set of membership functions that adapts to function parameters. The output from each node is a degree of membership value that is given by the input of the membership functions. V1, V2, and V5 are triangular membership functions [Eq. (2)] because of the categorical nature of the input variables (categorical variables can only take one of a limited and fixed set of values in a group) and because it leads to shorter computational times, and V3 and V4 are Gaussian membership functions [Eq. (1)] because data for these input variables are ordinal
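$$\mu(x) = \exp\left[-\left(\frac{x-c}{2a}\right)^{2}\right] \tag{1}$$

$$f(x; a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & c \le x \end{cases} \tag{2}$$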
where μ = degree of membership functions for the given fuzzy set; x = one of the input variables; and a, b, and c = parameters of a membership function that can change the shape of the membership function.
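The two membership function shapes can be sketched in a few lines of Python. This is a minimal illustration, assuming Eq. (1) is a Gaussian of the form exp[-((x - c)/(2a))^2] with center c and width parameter a, and Eq. (2) is a triangle with feet at a and c and peak at b. The function names and the sample parameter values are hypothetical and are not taken from the study.

```python
import math


def gaussian_mf(x: float, a: float, c: float) -> float:
    """Gaussian membership, assumed form of Eq. (1): exp(-((x - c) / (2a))^2)."""
    return math.exp(-(((x - c) / (2.0 * a)) ** 2))


def triangular_mf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership, Eq. (2): feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0                      # outside the support of the triangle
    if x <= b:
        return (x - a) / (b - a)        # rising edge
    return (c - x) / (c - b)            # falling edge


# Hypothetical parameters, for illustration only.
peak_gauss = gaussian_mf(45.0, a=10.0, c=45.0)        # input at the center
half_tri = triangular_mf(1.5, a=1.0, b=2.0, c=3.0)    # halfway up the rising edge
print(peak_gauss, half_tri)
```

Each call returns a degree of membership between 0 and 1, which is the Layer 1 output passed on to the rule nodes.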
In Layer 2, every node is fixed (nonadaptive), and the circle node is labeled as Π. The output node results from the multiplication of incoming signals and is delivered to the next node. It represents the firing strength for each rule. The T-norm operator with general performance (AND) is applied to obtain the output, because all the explanatory variables occur simultaneously
$$w_j = f_{V_{ij}} \ast f_{V_{ij}}, \qquad i = 1, \ldots, 5 \ \text{and} \ j = 1, \ldots, n \tag{3}$$
where $w_j$ = output that represents the firing strength of each rule; and $n$ = number of membership functions per variable.
In Layer 3, every node is fixed (nonadaptive) and the circle node is labeled as N. Each node calculates the ratio between the jth rule’s firing strength and the sum of all firing strengths. This ratio is also called the normalized firing strength
$$\bar{w}_j = \frac{w_j}{\sum_j w_j}, \qquad j = 1, \ldots, n \tag{4}$$
In Layer 4, every node is an adaptive node to an output, with a node function defined as
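$$\bar{w}_j f_j = \bar{w}_j \left(p_j x + q_j y + r_j\right), \qquad j = 1, \ldots, n \tag{5}$$
where $\bar{w}_j$ = normalized firing strength from the previous layer; and $(p_j x + q_j y + r_j)$ = a first-order polynomial whose coefficients $p_j$, $q_j$, and $r_j$ are parameters in the node. The parameters in this layer are referred to as consequent parameters.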
In Layer 5, the single node is a fixed (nonadaptive) node that computes the overall output as the summation of all the incoming signals from the previous node. This circle node is labeled as Σ.
$$\sum_j \bar{w}_j f_j = \frac{\sum_j w_j f_j}{\sum_j w_j} \tag{6}$$
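The five layers can be traced end to end with a small forward pass. The sketch below is illustrative only: it assumes a two-input system (x and y, as in the node function of Layer 4) with two Gaussian membership functions per input and grid partitioning, which gives four rules. The function names, the membership parameters, and the consequent coefficients are hypothetical; the model in this study has five inputs.

```python
import math
from itertools import product


def gaussian_mf(x, a, c):
    """Gaussian membership with center c and width parameter a (assumed form)."""
    return math.exp(-(((x - c) / (2.0 * a)) ** 2))


def anfis_forward(x, y, mfs_x, mfs_y, consequents):
    """Forward pass of a two-input, first-order Takagi-Sugeno ANFIS."""
    # Layer 1: degree of membership of each input in each of its fuzzy sets.
    mu_x = [gaussian_mf(x, a, c) for a, c in mfs_x]
    mu_y = [gaussian_mf(y, a, c) for a, c in mfs_y]
    # Layer 2: firing strength of each rule (product of incoming signals).
    # Grid partitioning: one rule per combination of membership functions.
    w = [mx * my for mx, my in product(mu_x, mu_y)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wj / total for wj in w]
    # Layer 4: each rule's linear consequent, weighted by its normalized strength.
    weighted = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output is the sum of the incoming signals.
    return sum(weighted)


# Hypothetical parameters: (a, c) per membership function, (p, q, r) per rule.
mfs_x = [(1.0, 0.0), (1.0, 4.0)]
mfs_y = [(2.0, 1.0), (2.0, 5.0)]
consequents = [(0.0, 0.0, 1.5)] * 4   # constant rules, so the output must be 1.5
output = anfis_forward(2.0, 3.0, mfs_x, mfs_y, consequents)
print(output)
```

Because the normalized firing strengths sum to 1, identical constant consequents return that constant, which is a convenient sanity check on Layers 3 to 5.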
The first and fourth layers contain the parameters that can be modified over time. The first layer contains a nonlinear set of premise parameters, while the fourth layer includes linear consequent parameters. To update both parameter types, a learning algorithm is necessary so that they can adapt to the model’s environment. A hybrid algorithm is used in this study.
The hybrid learning algorithm can be divided into two parts: forward propagation and backward propagation. During forward propagation, the premise parameters (a, b, and c) in the first layer are held fixed, and a recursive least-squares (RLS) estimator is applied to update the consequent parameters in the fourth layer. Because the consequent parameters are linear, the RLS estimator accelerates the convergence rate of the hybrid learning process. After the consequent parameters are obtained, the input data are passed through the adaptive network, and backward propagation compares the generated output with the actual output. The error identified in this comparison is propagated back to the first layer, where the premise parameters are updated using gradient descent. One complete forward and backward pass of hybrid learning is called an epoch. With the hybrid learning algorithm, which combines RLS estimation and the gradient descent method, convergence can be reached faster than with the backpropagation algorithm alone, because the dimension of the search space is reduced. Further details regarding the hybrid learning algorithm can be found in Suparta and Alhasa (2016).
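The forward half of the hybrid algorithm can be illustrated with a small recursive least-squares routine. The sketch below is not the implementation used in this study: it assumes a hypothetical one-input model with two Gaussian membership functions whose premise parameters are held fixed, and it estimates only the linear consequent parameters. The gradient descent update of the premise parameters is not shown, and all names and values are assumptions.

```python
import math


def gaussian_mf(x, a, c):
    """Gaussian membership with center c and width parameter a (assumed form)."""
    return math.exp(-(((x - c) / (2.0 * a)) ** 2))


def regressor(x, premise):
    """Row [w1*x, w1, w2*x, w2, ...] built from normalized firing strengths."""
    w = [gaussian_mf(x, a, c) for a, c in premise]
    total = sum(w)
    row = []
    for wj in w:
        w_bar = wj / total
        row.extend([w_bar * x, w_bar])
    return row


def rls_fit(rows, targets, gamma=1e6):
    """Recursive least squares: one update of theta and P per sample."""
    n = len(rows[0])
    theta = [0.0] * n
    # P starts as a large multiple of the identity (little prior confidence).
    P = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, y in zip(rows, targets):
        Pa = [sum(P[i][k] * a[k] for k in range(n)) for i in range(n)]
        denom = 1.0 + sum(a[i] * Pa[i] for i in range(n))
        err = y - sum(a[i] * theta[i] for i in range(n))
        # theta <- theta + (P a / (1 + a^T P a)) * prediction error
        theta = [theta[i] + Pa[i] / denom * err for i in range(n)]
        # P <- P - (P a a^T P) / (1 + a^T P a); P is symmetric.
        P = [[P[i][j] - Pa[i] * Pa[j] / denom for j in range(n)]
             for i in range(n)]
    return theta


# Hypothetical fixed premise parameters (a, c) and synthetic targets.
premise = [(1.5, 2.5), (1.5, 7.5)]
true_theta = [1.0, 0.0, -0.5, 3.0]
xs = [0.5 * k for k in range(21)]
rows = [regressor(x, premise) for x in xs]
targets = [sum(r * t for r, t in zip(row, true_theta)) for row in rows]

theta = rls_fit(rows, targets)
max_residual = max(abs(sum(r * t for r, t in zip(row, theta)) - y)
                   for row, y in zip(rows, targets))
print(max_residual)
```

Because the output is linear in the consequent parameters once the premise parameters are fixed, this step needs no iterative search, which is why the hybrid scheme converges faster than backpropagation alone.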
To ensure the best performance, reduce calculation errors, and avoid model overfitting, two data sets are entered to train and validate ANFIS, and one additional data set is used to test the model. The training data set contains the sample of data used to fit the model. The model sees and learns from these data. The validating data set, which in this study differs from the testing data set, includes a sample of data used to provide an unbiased evaluation of a model fit on the training data set while tuning model parameters [a, b, and c in Eqs. (1) and (2)]. The evaluation becomes more biased when the validation data set is incorporated into the model configuration. In other words, the model does not use this data set to train itself, but uses this information to consider uncertainty in the input–output relationship and avoid overfitting. In this study, training and validating data sets have the same sample size to obtain a more thorough representation of the training data and improve the model’s capability to deal with input–output variability and uncertainty. Hence, each accident record in the training set (i.e., pipe D/t, commodity, location, pressure, and release mode) has a corresponding accident record in the validating set to guard against overfitting, because not every pipeline with a given set of design characteristics always leads to a certain type of consequence in case of failure.
The testing data set, which differs from the validating data set, includes the sample of data used to provide an unbiased evaluation of a final model fit on the training data set. It contains data that span the various classes that the model would face when used in the real world to make predictions. Table 1 provides the general assumptions for this model and, in particular, the sample size for each ANFIS model and the model settings adopted for all three impact types analyzed (water and soil contamination and impact on wildlife). The ANFIS settings in Table 1 were selected because their combination led to the highest model accuracy and, therefore, the most reliable predictions over the testing data set.
Table 1. ANFIS model assumptions and settings for each of the analysis types: soil contamination, water contamination, and impact on wildlife
| Model assumption/setting | Soil contamination | Water contamination | Impact on wildlife |
| --- | --- | --- | --- |
| Number of samples for training | 357 | 358 | 356 |
| Number of samples for validating | 357 | 357 | 356 |
| Number of samples for testing | 100 | 100 | 100 |
| Membership functions type and number: | | | |
| Commodity type | Triangular, 4 | Triangular, 4 | Triangular, 4 |
| Accident location | Triangular, 4 | Triangular, 4 | Triangular, 4 |
| D/t | Gauss, 5 | Gauss, 5 | Gauss, 5 |
| MOP | Gauss, 5 | Gauss, 5 | Gauss, 5 |
| Release mode | Triangular, 4 | Triangular, 4 | Triangular, 4 |
| Number of rule nodes | 1,600 | 1,600 | 1,600 |
| Number of epochs | 10 | 10 | 10 |
| Output type | Linear | Linear | Linear |
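The number of rule nodes in Table 1 follows from grid partitioning, which creates one rule for every combination of membership functions across the inputs. A short check, using the membership function counts from Table 1 (the variable names are illustrative):

```python
from math import prod

# Number of membership functions per input variable, as listed in Table 1.
mf_counts = {
    "commodity type": 4,
    "accident location": 4,
    "D/t": 5,
    "MOP": 5,
    "release mode": 4,
}

# Grid partitioning: one rule per combination of membership functions.
n_rules = prod(mf_counts.values())
print(n_rules)  # 1600
```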
The model error for each environmental impact type is calculated using the average invalidity percent (AIP) and the average validity percent (AVP) as follows:
$$\mathrm{AIP} = \left\{\sum_{i=1}^{n} \left|1 - \left(\frac{E_i}{C_i}\right)\right|\right\} \times \frac{100}{n} \tag{7}$$
$$\mathrm{AVP} = 100 - \mathrm{AIP} \tag{8}$$
where $E_i$ = predicted value; $C_i$ = actual value; and $n$ = sample size of the testing data set or number of input–output relationships. The average validity/invalidity percent provides an accurate tool to estimate the model performance in case of categorical output (i.e., Yes/No fashion), as other model performance tests, such as the coefficient of determination or relative error, generally pertain to ordinal output variables.
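As a worked illustration of Eqs. (7) and (8), the sketch below assumes that Eq. (7) reads AIP = {Σ|1 − (E_i/C_i)|} × 100/n and that outputs use the coding given earlier (1 for Yes, 2 for No). The function names and the four-record data set are hypothetical and are not drawn from the testing data.

```python
def aip(predicted, actual):
    """Average invalidity percent, assumed form of Eq. (7)."""
    n = len(actual)
    total = sum(abs(1.0 - e / c) for e, c in zip(predicted, actual))
    return total * 100.0 / n


def avp(predicted, actual):
    """Average validity percent, Eq. (8)."""
    return 100.0 - aip(predicted, actual)


# Hypothetical example: four accidents that all had an impact (coded 1),
# with the first one wrongly predicted as "No" (coded 2).
actual = [1, 1, 1, 1]
predicted = [2, 1, 1, 1]
print(aip(predicted, actual), avp(predicted, actual))  # 25.0 75.0
```

Under this coding and this reading of Eq. (7), a matching prediction contributes 0 to the sum, a "Yes" predicted as "No" contributes |1 − 2/1| = 1, and a "No" predicted as "Yes" contributes |1 − 1/2| = 0.5.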

Results

In this section, results are reported for each output type: soil contamination, water contamination, and impact on wildlife. Table 2 summarizes AIP and AVP for each model. Similarly, Table 3 offers a detailed review of ANFIS performance in predicting whether the environmental contamination occurred according to actual data. These results are discussed in the next section.
Table 2. Average invalidity and validity percent for the three environmental impact types
| Model identification | AIP (%) | AVP (%) |
| --- | --- | --- |
| Soil contamination | 14 | 86 |
| Water contamination | 6 | 94 |
| Impact on wildlife | 3 | 97 |
Table 3. Summary of the ANFIS predictions for the three environmental impact types
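| Outcome | Soil contamination (%) | Water contamination (%) | Impact on wildlife (%) |
| --- | --- | --- | --- |
| Actual “No” | 26 | 76 | 88 |
| Actual “Yes” | 74 | 24 | 12 |
| Predicted “No” | 18 | 72 | 85 |
| Predicted “Yes” | 82 | 28 | 15 |
| Wrongly predicted “Yes” | 11 | 5 | 3 |
| Wrongly predicted “No” | 3 | 1 | 0 |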
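The entries of Table 3 can be cross-checked against Table 2 with simple arithmetic. The sketch below uses the values as read from Table 3 (percent of each 100-sample testing set); the dictionary layout and key names are illustrative.

```python
# Values as read from Table 3 (percent of the 100-sample testing sets).
table3 = {
    "soil contamination":  {"actual_no": 26, "pred_no": 18, "false_yes": 11, "false_no": 3},
    "water contamination": {"actual_no": 76, "pred_no": 72, "false_yes": 5,  "false_no": 1},
    "impact on wildlife":  {"actual_no": 88, "pred_no": 85, "false_yes": 3,  "false_no": 0},
}

accuracy = {}
for name, row in table3.items():
    # Predicted "No" = actual "No" - wrongly predicted "Yes" + wrongly predicted "No".
    assert row["pred_no"] == row["actual_no"] - row["false_yes"] + row["false_no"]
    # Share of correct predictions = 100 - all wrong predictions.
    accuracy[name] = 100 - (row["false_yes"] + row["false_no"])

print(accuracy)
# {'soil contamination': 86, 'water contamination': 94, 'impact on wildlife': 97}
```

The resulting shares of correct predictions (86%, 94%, and 97%) coincide with the AVP values in Table 2.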
Linguistic variables (biofuels, crude oil, HVL, refined petroleum products, carbon dioxide, aboveground, tank, transition area, underground, leak, mechanical puncture, other, rupture, and overfill or overflow) are converted to categorical variables for calculation purposes. The following legend of these variable names helps in interpreting the results reported in this section:
1. Commodity type
   a. Biofuels
   b. Crude oil
   c. HVL
   d. Refined petroleum product
   e. Carbon dioxide
2. Location
   a. Aboveground
   b. Tank
   c. Transition area
   d. Underground
3. Release mode
   a. Leak
   b. Mechanical puncture
   c. Other
   d. Rupture
   e. Overfill or overflow
4. Output
   a. Yes
   b. No
Tables 4–6 provide the testing data set used to verify the model’s performance for each analysis in the sections below. Each line contained in Tables 4–6 represents a hazardous liquid pipeline failure. Each line, or failure, provides commodity transported, location where the accident occurred, diameter-to-wall-thickness ratio, operating pressure, and mode with which the material was released. The predicted output is compared with the actual output and reported as a green tick in case of matching outcomes and a red X in case of mismatch (1 stands for Yes and 2 stands for No).
Table 4. Testing data set and actual versus predicted output for soil contamination
YearCommodity typeLocationD/tMOP (psig)Release modeActual outputPredicted outputPredicted matches with actual?
20104421.431,453111Yes
20103445.881,440222Yes
20103463.93863121No
20103423.941,440222Yes
20103445.881,440222Yes
20103442.471,424222Yes
20103426.51,440122Yes
20103455.291,198311Yes
20104429.45720211Yes
20102442.18800411Yes
20112459.36285111Yes
20112445.88400111Yes
20112455.17810111Yes
20112443275111Yes
20113459.861,000211Yes
20112127.17150111Yes
20114164285111Yes
20112416.88275111Yes
20114435.841,296211Yes
20114440.861,195211Yes
20124442.471,200411Yes
20122451.28725211Yes
20124435.241,200211Yes
201234511,218111Yes
20122496285122Yes
20123435.241,573122Yes
20124434.51,349111Yes
20122421.43285111Yes
20132480936121No
20132134.5500111Yes
20132445.371,480111Yes
201324641,058111Yes
20134439.381,865411Yes
20132430.251,390111Yes
20132434.5508111Yes
20133473.061,025221No
20132421.43500221No
20134430.07250211Yes
20134446.54273111Yes
20132429.45366211Yes
20144453.33690121No
20142425.64800211Yes
20142464275111Yes
201424480121No
20142448285111Yes
20144435.241,440111Yes
20142124.84275111Yes
20142428.88308121No
20144428.88720111Yes
20142450720111Yes
20142432275111Yes
20143424.841,198111Yes
20142480936411Yes
201424641,315122Yes
20152424.84397111Yes
20153435.241,440122Yes
20152432275111Yes
20152428.671,440112No
20154445.881,252111Yes
20152464275111Yes
20152425.51,200121No
20152457.891,016111Yes
201544431,076111Yes
20152438.461,360211Yes
201544113.88657112No
20153446.871,440111Yes
20152448400111Yes
201644128.11584111Yes
20164442.67175211Yes
20162424.84397111Yes
20162451.281,440121No
201624481,440211Yes
20163442.701,440121No
20163459.921,440122Yes
20162418.26250111Yes
20164431.911,102111Yes
20163435.231,440122Yes
20162180285111Yes
20162464285111Yes
20163442.551,361122Yes
20164426.781,300311Yes
20162464.101,212122Yes
20174416.88275122Yes
201724106.76701111Yes
201734321,090111Yes
20173467.991,480111Yes
20172436.531,464111Yes
20172436.531,000211Yes
201744511,468111Yes
20173423.941,440122Yes
20174444.871,162111Yes
20174431.91720211Yes
20172436.531,480211Yes
20172453.33575111Yes
20182453.19780111Yes
20184463.83880111Yes
201824120.99275112No
20183424.841,080421No
20182428.88640111Yes
20183442.551,440111Yes
Table 5. Testing data set and actual versus predicted output for water contamination
YearCommodity typeLocationD/tMOP (psig)Release modeActual outputPredicted outputPredicted matches with actual?
20102492.53809422Yes
20102253.33285122Yes
20104427.4275122Yes
20103445.881,440222Yes
20104456.940311Yes
20104462.811,050111Yes
20102451492122Yes
201044641,035111Yes
20102448200122Yes
20102451275122Yes
20102426.78852122Yes
20102442.18800422Yes
20112464.06262122Yes
20114442.471,200211Yes
201144511,186111Yes
20113431.911,130122Yes
20112442.551,000222Yes
20114135.841,147111Yes
20112412.99600122Yes
20112176.92125111Yes
20112416.88275122Yes
20112445.881,000222Yes
20112433.75500122Yes
20122423.66408222Yes
20123442.471,440122Yes
20122473.181,378422Yes
20123442.471,335122Yes
20124135.24462112No
20124436.36200111Yes
20122464794222Yes
20124451900122Yes
20134437.33250111Yes
20133473.061,025211Yes
20131451.621,953111Yes
20132464913122Yes
20133453.191,307422Yes
20133442.471,440122Yes
20132423.66720122Yes
20133123.66720311Yes
2013443275311Yes
20132464824122Yes
20132434720122Yes
20132464945222Yes
20132442.551,440111Yes
20142157.69275122Yes
20144424.84485122Yes
20142440285122Yes
20144424.840122Yes
201424431,172122Yes
20142428.88308122Yes
20142443790122Yes
20142480809122Yes
20142440.86285122Yes
20144435.241,440122Yes
20142423.66700122Yes
20152422.96568411Yes
201544431,076122Yes
201544401,142411Yes
20152448784122Yes
201544113.88657111Yes
20152427.3910122Yes
20152416.88275122Yes
20152127.39366122Yes
201544431,076122Yes
20152440427122Yes
20151439.202,220122Yes
20152480526121No
20154442.55615221No
20152464275122Yes
20163451.281,440122Yes
20164431.911,102121No
20161423.941,765122Yes
20162468.57275122Yes
20163451.281,198121No
20162442.67275122Yes
20162471.17794122Yes
20163457.181,300422Yes
201624960321No
20162464.10720122Yes
20161434.51,671122Yes
201624641,160122Yes
20163442.701,440122Yes
20162438.46361122Yes
201721481,440122Yes
20174440275322Yes
20172448285122Yes
20174448990122Yes
20172436.531,480211Yes
20172190.671,400122Yes
20172483.33799122Yes
20172424.84640111Yes
20172464610211Yes
20173112.291,450111Yes
20174416275122Yes
20172424275122Yes
20172424.841,012122Yes
201834641,440122Yes
20184436.531,150111Yes
20182464.260111Yes
20184432950122Yes
Table 6. Testing data set and actual versus predicted output for adverse impact on wildlife
YearCommodity typeLocationD/tMOP (psig)Release modeActual outputPredicted outputPredicted matches with actual?
20102445.881,440222Yes
20103455.291,345422Yes
20102453500222Yes
20103455.291,198311Yes
20104429.45720211Yes
20102451275122Yes
20103442.471,440122Yes
20104445.88960222Yes
20104426.78625122Yes
20102453.33275122Yes
20102470.511,050421No
20112438.53780122Yes
20113440672122Yes
20112176.92125121No
20113434.51,440322Yes
20112464990122Yes
20112423.66464122Yes
20113434.51,440122Yes
20112496300122Yes
20113426.51,050222Yes
20114460.6125122Yes
20114442.491,150211Yes
20122428.88560122Yes
20124435.24720122Yes
20124449.261,342422Yes
20124451900122Yes
201244321,440122Yes
201224122.09150122Yes
20122470.511,061411Yes
201234481,440122Yes
20122418.99275122Yes
20132445.88546122Yes
20134448720122Yes
20134445.881,632422Yes
201344128.11275122Yes
20132448285122Yes
20132451795122Yes
20132434.5508122Yes
20132448365122Yes
20132464920122Yes
20133455.291,198122Yes
201344511,298211Yes
20134449.261,176122Yes
20142123.64425311Yes
20144428.88720122Yes
20142453.33275311Yes
20144424.840122Yes
20143424.841,198122Yes
20142428.85714122Yes
20144452.95952122Yes
20144464275122Yes
20144421.43125122Yes
20143451.281,198122Yes
20143451.281,336122Yes
20154442.55326122Yes
20152128.37225122Yes
201544401,142422Yes
20152434875122Yes
2015449.17960122Yes
20152428.88500122Yes
20154445.880221No
20152448400122Yes
20152121.43285122Yes
20151439.202,220122Yes
20152432600122Yes
20154480275122Yes
20152432250122Yes
20152426.78751122Yes
20153429.45911122Yes
201644128.11584122Yes
20162477.721,337122Yes
20162120275122Yes
2016240.05784122Yes
20162464285122Yes
20162434.5304122Yes
20163430.251,220122Yes
20164459.111,156122Yes
20162164285122Yes
20162428.671,440122Yes
20162427.4275122Yes
20162434.5305311Yes
201644128.11541311Yes
20174421.43750122Yes
20172211.940111Yes
20172448492122Yes
20173436.531,440222Yes
201734561,440222Yes
20172462.5285122Yes
20172419.23275122Yes
201714361,628111Yes
201724106.76701122Yes
20172439.41750222Yes
20173431.911,250122Yes
20172431.91535122Yes
20172142.67275122Yes
20184438.461,191211Yes
20182428.88640122Yes
20182432408122Yes
20182472275122Yes
201824120.99275122Yes

Prediction of Soil Contamination

The model used for predicting soil contamination due to hazardous liquid pipeline accidents can forecast environmental impact with 86% accuracy against testing data. Table 4 reports the input–output set used to test the model along with predicted and actual values.

Prediction of Water Contamination

The model used for predicting water contamination in case of hazardous liquid pipeline accidents can forecast environmental impact with 94% accuracy against testing data. Table 5 reports the input–output set used to test the model along with predicted and actual values.

Prediction of Adverse Impact on Wildlife

The model used for predicting adverse effects on wildlife in case of hazardous liquid pipeline accidents can forecast environmental impact with 97% accuracy against testing data. Table 6 reports the input–output set used to test the model along with predicted and actual values.

Conclusion and Discussion

A predictive model was developed to evaluate whether an environmental impact is likely to occur in case of hazardous liquid pipeline failures. Explanatory variables, including commodity transported, accident location, diameter-to-wall-thickness ratio, maximum operating pressure, and material release mode, were used to train and validate an ANFIS model. Three different outcomes were investigated: soil contamination, water contamination, and adverse effects on wildlife.
Results show that the model is capable of accurately predicting outcomes for the three impact types, as follows:
1. Soil contamination is predicted with 86% accuracy (14 AIP). The model incorrectly predicted soil contamination for 11% of the total number of failures, and did not predict soil contamination 3% of the time when contamination did occur.
2. Water contamination is predicted with 94% accuracy (6 AIP). The model predicted water contamination 5% of the time when no actual contamination was recorded, and did not predict water contamination 1% of the time when contamination did occur.
3. Adverse impact on wildlife is predicted with 97% accuracy (3 AIP). The model predicted adverse effects on wildlife 3% of the time when no actual adverse effect was recorded. It did not miss any case in which an impact on wildlife occurred, so those cases were predicted with 100% accuracy.
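The relationship between each accuracy figure and its two error types can be checked with a short calculation. The sketch below is illustrative only: the function name `summarize_predictions` and the label coding (1 = impact recorded, 2 = no impact) are assumptions, and the example counts are constructed to reproduce the soil contamination percentages reported above for 100 test cases. They are not copied from the test table.

```python
from collections import Counter

IMPACT, NO_IMPACT = 1, 2  # assumed label coding, not confirmed by the source


def summarize_predictions(actual, predicted):
    """Return accuracy and error shares as percentages of all test cases."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    counts = Counter(zip(actual, predicted))
    total = len(actual)
    false_pos = counts[(NO_IMPACT, IMPACT)]  # impact predicted, none recorded
    false_neg = counts[(IMPACT, NO_IMPACT)]  # impact recorded, none predicted
    correct = counts[(IMPACT, IMPACT)] + counts[(NO_IMPACT, NO_IMPACT)]
    return {
        "accuracy_pct": 100.0 * correct / total,
        "false_positive_pct": 100.0 * false_pos / total,
        "false_negative_pct": 100.0 * false_neg / total,
    }


# Constructed example: 65 cases with contamination (3 missed by the model)
# and 35 cases without contamination (11 wrongly flagged by the model).
actual = [IMPACT] * 65 + [NO_IMPACT] * 35
predicted = [IMPACT] * 62 + [NO_IMPACT] * 3 + [IMPACT] * 11 + [NO_IMPACT] * 24
summary = summarize_predictions(actual, predicted)
print(summary)
```

Under these assumptions the two error shares (11% and 3%) add up to the 14% of inaccurate predictions, which leaves the 86% accuracy reported for soil contamination.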
Data used to test the model show that approximately 65% of hazardous liquid pipeline accidents resulted in soil contamination, 32% resulted in water contamination, and 15% resulted in adverse impacts on wildlife. For soil contamination, 57% of incorrectly predicted outcomes occurred on underground pipelines transporting crude oil. This combination of commodity and location is also the most frequent in the training database (52% of accidents). Similarly, 79% of incorrectly predicted soil contamination cases involved underground pipelines with leak as the release mode, which is the most frequent combination of location and release mode in the training data set (73%). This means that the model has a poor capacity to predict rare outcomes: when a set of explanatory variables only rarely leads to a given outcome, the model cannot accurately predict that rare event. This limitation could be further investigated by focusing the analysis on those sets of explanatory variables that rarely lead to soil contamination.
The results for the water contamination model do not show any particular trend or commonality similar to those of the soil contamination and wildlife impact models. In fact, the sets of variables that lead the model to an incorrect outcome are always diverse.
Finally, the low number of accidents in the training data set that had adverse effects on wildlife (4.8%) could explain why the model for predicting adverse impacts on wildlife was 100% accurate. This should be further investigated by using a different set of accident data to test the model, although PHMSA reports few cases of impact on wildlife during the past 10 years.
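The comparison described above, between how often a parameter combination appears among the misclassified cases and how often it appears in the training data, can be sketched as follows. The record layout, the field names (`commodity`, `location`), the helper `combination_share`, and the toy records are all hypothetical and serve only to illustrate the calculation; they are not the PHMSA data.

```python
def combination_share(records, **criteria):
    """Return the percentage of records matching every key=value criterion."""
    if not records:
        raise ValueError("records must not be empty")
    hits = sum(
        1 for rec in records
        if all(rec.get(key) == value for key, value in criteria.items())
    )
    return 100.0 * hits / len(records)


# Hypothetical toy records (not PHMSA data).
training = (
    [{"commodity": "crude oil", "location": "underground"}] * 5
    + [{"commodity": "refined product", "location": "underground"}] * 3
    + [{"commodity": "crude oil", "location": "aboveground"}] * 2
)
misclassified = (
    [{"commodity": "crude oil", "location": "underground"}] * 3
    + [{"commodity": "refined product", "location": "aboveground"}]
)

criteria = {"commodity": "crude oil", "location": "underground"}
share_in_errors = combination_share(misclassified, **criteria)
share_in_training = combination_share(training, **criteria)
print(f"share among misclassified cases: {share_in_errors:.0f}%")
print(f"share in training data: {share_in_training:.0f}%")
```

Comparing the two shares indicates whether errors are over-represented for a combination relative to how often the model encountered it during training.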
The selected set of explanatory variables is the combination that yielded the most accurate results in this study. However, accident location and release mode are less informative than the other variables because more than 80% of accidents involved pipelines that failed underground, and more than 50% of the pipelines failed by leaking. Such low variance suggests that these variables contribute little information to the model; nevertheless, the model performs more accurately when they are included.
Predicting adverse environmental impacts of hazardous liquid pipeline accidents is challenging in the absence of an adequate, complete, and informative data set that gathers information from past accidents. Because the model presented in this study is data driven, its major limitation lies in the training and validation data set: missing data and incorrect reports introduce conflicting scenarios that the model cannot predict accurately, especially in the case of rare events. Regulators currently require comprehensive design information, including pipeline material, to be collected after pipeline accidents; despite this, the large number of missing data points makes it challenging to include certain design variables as inputs for the model. Belvederesi et al. (2018) show that, over time, regulators are becoming stricter with regard to reporting and the quality and quantity of information collected after an accident. For this reason, future iterations of this model should include more explanatory variables, such as pipeline material, as they become available, and the model should be updated with newly collected data so that predictions remain accurate while the model stays computationally efficient.
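One way to judge whether a design variable is complete enough to serve as a model input is to measure the share of missing entries per field. The sketch below assumes accident reports are stored as dictionaries in which an unreported value is `None`; the field names, the toy reports, and the 20% threshold are hypothetical choices for illustration, not values from the study.

```python
def missing_share(records, field):
    """Return the percentage of records in which `field` is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for rec in records if rec.get(field) is None)
    return 100.0 * missing / len(records)


def usable_fields(records, fields, max_missing_pct=20.0):
    """Return the fields whose missing share does not exceed the threshold."""
    return [f for f in fields if missing_share(records, f) <= max_missing_pct]


# Hypothetical accident reports; None marks a value that was not reported.
reports = [
    {"diameter_in": 12.0, "wall_thickness_in": 0.25, "pipe_material": "steel"},
    {"diameter_in": 8.0, "wall_thickness_in": None, "pipe_material": None},
    {"diameter_in": 16.0, "wall_thickness_in": 0.312, "pipe_material": None},
    {"diameter_in": 10.0, "wall_thickness_in": 0.25, "pipe_material": "steel"},
    {"diameter_in": 20.0, "wall_thickness_in": 0.375, "pipe_material": None},
]
fields = ["diameter_in", "wall_thickness_in", "pipe_material"]
for name in fields:
    print(name, f"{missing_share(reports, name):.0f}% missing")
selected = usable_fields(reports, fields)
print(selected)
```

In this toy example the material field would be excluded until reporting improves, which mirrors the reason given above for leaving pipeline material out of the current inputs.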
During pipeline planning and design, this model could provide an alternative method to predict the hazards and, consequently, the level of risk that a pipeline would pose to the surrounding environment. In those instances where the failure of hazardous liquid pipelines would be likely to lead to adverse environmental outcomes, additional protective actions should be adopted for both planned and existing pipelines. Protective measures can include installing casings on water-crossing pipelines and improving or reinforcing leak detection systems. Additionally, in the case of planned pipelines, an alternative combination of design characteristics should be evaluated, especially in environmentally sensitive areas.

Data Availability Statement

All data, models, or code generated or used during the study are available in a PHMSA repository online in accordance with funder data retention policies. They are available at https://www.phmsa.dot.gov/data-and-statistics/pipeline/gas-distribution-gas-gathering-gas-transmission-hazardous-liquids.

Acknowledgments

The authors acknowledge the contribution of Dr. Petr E. Komers, president of MSES, Inc., for his ongoing emotional, professional, and financial support.

References

Alizadeh, M. J., M. R. Kavianpour, M. Danesh, J. Adolf, S. Shamshirband, and K. W. Chau. 2018. “Effect of river flow on the quality of estuarine and coastal waters using machine learning models.” Eng. Appl. Comput. Fluid Mech. 12 (1): 810–823. https://doi.org/10.1080/19942060.2018.1528480.
Badde, D. S., A. K. Gupta, and V. K. Patki. 2015. “Comparison of fuzzy logic and ANFIS for prediction of compressive strength of RMC.” IOSR J. Mech. Civ. Eng. 3: 7–15.
Baghban, A., J. Sasanipour, P. Haratipour, M. Alizad, and M. Vafaee Ayouri. 2017. “ANFIS modeling of rhamnolipid breakthrough curves on activated carbon.” Chem. Eng. Res. Des. 126 (Oct): 67–75. https://doi.org/10.1016/j.cherd.2017.08.007.
Baghmolaei, M. M., M. Mahmoudy, D. Jafari, R. M. Baghmolaei, and F. Tabkhi. 2014. “Assessing and optimization of pipeline system performance using intelligent systems.” J. Nat. Gas Sci. Eng. 18 (May): 64–76. https://doi.org/10.1016/j.jngse.2014.01.017.
Belvederesi, C. 2017. Statistical analysis of oil and gas pipeline accidents with a focus on the relationship between pipeline design and accident consequences. Calgary, AB: Univ. of Calgary.
Belvederesi, C., M. S. Thompson, and P. E. Komers. 2017. “Canada’s federal database is inadequate for the assessment of environmental consequences of oil and gas pipeline failures.” Environ. Rev. 25 (4): 415–422. https://doi.org/10.1139/er-2017-0003.
Belvederesi, C., M. S. Thompson, and P. E. Komers. 2018. “Statistical analysis of environmental consequences of hazardous liquid pipeline accidents.” Heliyon 4 (11): e00901. https://doi.org/10.1016/j.heliyon.2018.e00901.
Dastorani, M. T., A. Moghadamnia, J. Piri, and M. Rico-Ramirez. 2010. “Application of ANN and ANFIS models for reconstructing missing flow data.” Environ. Monit. Assess. 166 (1–4): 421–434. https://doi.org/10.1007/s10661-009-1012-8.
EPA. 2015. “ExxonMobil Mayflower clean water settlement.” Accessed November 22, 2018. https://www.epa.gov/enforcement/exxonmobil-mayflower-clean-water-settlement.
Green, K. P., and T. Jackson. 2015. “Safety in the transportation of oil and gas: Pipelines or rail?” Accessed November 22, 2018. https://www.fraserinstitute.org/research/safety-transportation-oil-and-gas-pipelines-or-rail.
He, S., Y. Zou, D. Quan, and H. Wang. 2012. “Application of RBF neural network and ANFIS on the prediction of corrosion rate of pipeline steel in soil.” In Vol. 124 of Recent advances in computer science and information engineering. Lecture notes in electrical engineering. Edited by Z. Qian, L. Cao, W. Su, T. Wang, and H. Yang. Berlin: Springer.
Jang, J. S. 1991. “Fuzzy modeling using generalized neural networks and Kalman filter algorithm.” In Vol. 2 of Proc., 9th National Conf. on Artificial Intelligence (AAAI’91), 762–767. Palo Alto, CA: AAAI Press.
Jang, J. S. 1993. “ANFIS: Adaptive-network-based fuzzy inference system.” IEEE Trans. Syst. Man Cybern. 23 (3): 665–685. https://doi.org/10.1109/21.256541.
Liu, Y., and Y. Zhang. 2015. “Iterative local ANFIS-based human welder intelligence modeling and control in pipe GTAW process: A data-driven approach.” IEEE/ASME Trans. Mechatron. 20 (3): 1079–1088. https://doi.org/10.1109/TMECH.2014.2363050.
Nabavi-Pelesaraei, A., S. Rafiee, S. S. Mohtasebi, H. Hosseinzadeh-Bandbafha, and K. Chau. 2019. “Comprehensive model of energy, environmental impacts and economic in rice milling factories by coupling adaptive neuro-fuzzy inference system and life cycle assessment.” J. Cleaner Prod. 217 (Apr): 742–756. https://doi.org/10.1016/j.jclepro.2019.01.228.
PHMSA (Pipeline and Hazardous Materials Safety Administration). 2018a. “Pipeline mileage and facilities.” Accessed November 22, 2018. https://www.phmsa.dot.gov/data-and-statistics/pipeline/pipeline-mileage-and-facilities.
PHMSA (Pipeline and Hazardous Materials Safety Administration). 2018b. “Updates on ExxonMobil pipeline incident in Mayflower, Arkansas.” Accessed November 22, 2018. https://www.phmsa.dot.gov/foia/updates-exxonmobil-pipeline-incident-mayflower-arkansas.
PreventionWeb. 2018. “Deterministic and probabilistic risk.” Accessed November 22, 2018. https://www.preventionweb.net/risk/deterministic-probabilistic-risk.
Shahrak, M. N., M. Esfandyari, and M. Karimi. 2018. “Efficient prediction of water vapor adsorption capacity in porous metal–organic framework materials: ANN and ANFIS modeling.” J. Iran. Chem. Soc. 16 (1): 11–20. https://doi.org/10.1007/s13738-018-1476-y.
Sharkawy, A., M. El-Sharief, and M. S. Soliman. 2014. “Surface roughness prediction in end milling process using intelligent systems.” Int. J. Mach. Learn. Cybern. 5 (1): 135–150. https://doi.org/10.1007/s13042-013-0155-7.
Singleton, M. 2013. “What’s the safest way to transport oil? US transportation and state departments won’t say.” Accessed November 22, 2018. http://www.ibtimes.com/whats-safest-way-transport-oil-us-transportation-state-departments-wont-say-1172847.
Suparta, W., and K. M. Alhasa. 2016. Modeling of tropospheric delays using ANFIS. New York: Springer.
TransCanada. 2017. “TransCanada responds to oil leak in Amherst, South Dakota.” Accessed November 22, 2018. https://www.transcanada.com/en/announcements/2017-11-16transcanada-responds-to-oil-leak-in-amherst-south-dakota/.
Wijayasekara, D., and M. Manic. 2014. “Data driven fuzzy membership function generation for increased understandability.” In Proc., 2014 IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE). Piscataway, NJ: IEEE.

Published In

Journal of Environmental Engineering
Volume 146, Issue 2, February 2020

History

Received: Mar 13, 2019
Accepted: Jun 4, 2019
Published online: Nov 26, 2019
Published in print: Feb 1, 2020
Discussion open until: Apr 26, 2020

Authors

Affiliations

Management and Solutions in Environmental Science Inc., 207 Edgebrook Close NW, Calgary, AB, Canada T3A 4W5 (corresponding author). ORCID: https://orcid.org/0000-0003-1866-9493.
Megan S. Thompson, Ph.D.
Management and Solutions in Environmental Science Inc., 207 Edgebrook Close NW, Calgary, AB, Canada T3A 4W5.
