Open access
Technical Papers
Nov 20, 2015

Quest for a New Solver for EPANET 2

Publication: Journal of Water Resources Planning and Management
Volume 142, Issue 3

Abstract

One of the best-known hydraulic water distribution modeling toolkits, used by both researchers and practitioners, is EPANET 2. Initially the authors aimed to speed up such simulations by utilizing modern multicore processors in the code implementation. The 30-year-old linear solver was replaced in steps by seven different modern multicore-capable solvers. Subsequently, speedup tests were carried out with small- (up to 1.6×10^3 nodes), medium- (up to 6.3×10^4 nodes), and large-sized (up to 6.3×10^5 nodes) test cases. None of the tested solvers was found to perform faster than the original solver for networks with a real-world character, although two solvers showed a speedup on medium and large water distribution networks. This is an example of how strategies to reduce computation time can produce promising results in a theoretical research environment but fail in practical engineering applications. Likewise, this paper highlights again the importance of considering realistic test cases during the implementation phase. Further, the authors point out why the different strategies tested in this work have not succeeded. Although two other issues (the implemented hash table and node reordering) were identified as potential improvements to the code, it was concluded that the original solver is still the fastest for practical system configurations.

Introduction

EPANET 2 (Rossman 2010) is the most widely applied software tool for simulating the hydraulics and water quality of water distribution networks. It was developed by the U.S. EPA and is, therefore, in the public domain. Accordingly, there is a very vibrant community of scientists and practitioners using EPANET 2 for various purposes. The popularity of the software has also resulted in various extensions and developments.
The key element of water distribution network modeling is the solution of a large set of nonlinear equations. In terms of single-application runtime, EPANET 2 already has a quite efficient solver for this numerical problem. Large networks are typically solved in seconds to minutes at most. But because of the vast field that EPANET 2 is applied to, the simulation software is not just run for single instances. In research scenarios, like network optimizations (Basupi and Kapelan 2015), rehabilitation planning (Roshani and Filion 2015), long-term effect simulations (Sitzenfrei and Rauch 2015), and sensitivity and vulnerability analysis, the model is run multiple times, potentially causing significant waiting periods for users to obtain their results (Diao and Rauch 2013; Sitzenfrei et al. 2011, 2013a). Additionally, some of the recently introduced pressure-dependent solvers follow an iterative approach, further increasing the computational costs (Tanyimboh and Templeman 2010; Ang and Jowitt 2006; Siew and Tanyimboh 2011, 2012; Giustolisi et al. 2008; Rossman 2007).
Because of the aforementioned reasons, it is still worthwhile to research a faster solver for EPANET 2 [or enhanced (e.g., Alvarruiz et al. 2015) or even other solving algorithms (e.g., Belova et al. 2014)]. The solver that is currently implemented in EPANET 2 is approximately 30 years old (George et al. 1994). Since then, computers have changed radically and advances in numerical analysis have theoretically found faster and better methods for solving the set of equations. However, it is unclear how these solvers perform in real world applications or when used to solve the specific set of equations created by the Global Gradient Algorithm implemented in EPANET 2.
This paper describes the attempts to speed up EPANET 2 by using modern solver architecture. Special attention was paid to utilizing the additional parallel computing resources found in modern desktop computers. The current solver of EPANET 2 is not optimized for such parallel processors; therefore, a major part of the computing resources found in modern processors is left unused.
EPANET 2 is based on the solution of a set of nonlinear equations using Newton iteration. In each iteration step, a matrix factorization is performed. These matrix operations are known to perform very well in a parallel environment, and several packages exist that support parallel linear algebra (Van de Geijn 1997; von zur Gathen 1993; Agullo et al. 2009). A parallel implementation of EPANET 2 for multicore computers was thought to be a fast and straightforward task. To reduce the development and testing time and to increase confidence in the implementation, a "replace sequential with parallel solver" procedure (with very minor code changes) was aimed for.
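In schematic form, each Newton step solves a sparse linear system (a generic sketch; the exact coefficient definitions are those of the gradient method of Todini and Pilati 1988, discussed later):

\[
\mathbf{A}^{(k)}\,\mathbf{H}^{(k+1)} = \mathbf{b}^{(k)},
\]

where \(\mathbf{A}^{(k)}\) is an n×n sparse, symmetric, positive-definite matrix (n being the number of junctions), \(\mathbf{H}^{(k+1)}\) holds the unknown nodal heads, and \(\mathbf{b}^{(k)}\) collects the flow- and demand-dependent terms of iteration k. Because \(\mathbf{A}^{(k)}\) changes in every iteration, a new factorization (or an iterative solve) is required per Newton step; this is precisely the operation the parallel linear algebra packages cited above accelerate.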
To prevent others from falling into the same traps of such a seemingly easy task, this paper describes not only the successes but rather the whole quest, i.e., which solvers do not work for common applications of EPANET 2. Several earlier attempts already point toward a problem that is very hard to solve.
Alonso et al. (2000) used parallel computing in a distributed fashion for solving the matrix factorization and other unspecified parts of EPANET 2. They achieved a small speedup on a system with approximately twice as many pipes as nodes. On all other systems, however, no speedup was achieved. Furthermore, it was not specified which part of the parallelization contributed to the speedup.
Crous et al. (2012) used a conjugate gradient method to solve the linear equations on a graphics processing unit (GPU). Although the conjugate gradient method performs very well on GPUs (Bolz et al. 2003), Crous et al. (2012) implemented a solver for very large, dense matrices, which are not found in typical applications of EPANET 2; it rather describes a completely different problem with a different algorithmic complexity. Furthermore, the authors' investigations showed that equations resulting from water distribution problems are badly or ill-conditioned and, therefore, unsolvable with an unpreconditioned iterative method. Recently, Zecchin et al. (2012) showed that iterative solvers preconditioned with an incomplete lower-upper (ILU) factorization or an algebraic multigrid method (AMG) are able to solve large artificial networks faster than the current solver, but they failed to show similar results for problems that are typical when applying EPANET 2.
Guidolin et al. (2010, 2011, 2013) found that updating of the coefficients of the linear system costs more than solving the system itself. Therefore, Guidolin et al. (2013) focused on improving this part of the code. In their study, they used the CWSNet library, which is compatible with EPANET 2, but using a different code base, and they achieved a moderate speedup in the performance. However, this approach involves huge code changes in many places and functions of EPANET 2 and not the improvement of the sparse matrix solver as aimed for in this paper.
Another method that has already been applied for solving the problem is domain decomposition (Diao et al. 2013). Whereas the method is particularly well suited for parallelization, information is needed from the underlying graph of sparse linear systems (Saad 2003). Employing such a solver typically involves significant code changes and conflicts with the idea of encouraging trust in the implementation by making only minimal alterations. In the work of Diao et al. (2013), the applied solver showed a speedup of eight on one tested system. Because the aim of this paper is to obtain a practical and easily usable solution for the main audience of EPANET 2 (practitioners and researchers), a solver on the basis of distributed computing parallelization [as used in Diao et al. (2013)] is, therefore, out of the scope of this work (Gaidamour et al. 2008).
To summarize, the starting point of this investigation is the enhancement of one of the most widely applied software tools in urban water management, EPANET 2. Significant attempts to improve the 30-year-old solution procedures have been described in the literature, but none of them showed overwhelming success. In this paper, the authors give an overview of this issue through rigorous code and performance analysis, in particular with a view to the latest computer architectures, and describe the successes but also the pitfalls of this quest for a new solver for EPANET 2. With code profiling, the most promising function for runtime optimization is identified, and the 30-year-old linear solver is replaced in steps by seven different modern multicore-capable solvers. None of the tested solvers is found to perform faster than the original solver for networks with a real-world character, although two solvers show a speedup on medium and large water distribution networks. Further, it is found that the node reordering algorithm in EPANET 2 is highly inefficient for large systems and that an improved implementation could boost the performance on medium to large systems dramatically.

Methods

The research behind this paper went through many steps paved with failures and setbacks. The authors chose not only to describe the successful results but to tell the full story of their efforts, thus saving other researchers from wasting precious time on repeated investigations of dead ends. Fig. 1 is the map that should guide the reader through this research. The two large boxes represent the "Methods" and "Results" sections of this paper, respectively. The small boxes represent the "Profiling" subsection and the three performed tests. All three tests have a dedicated subsection in "Methods" (for the test description) and in "Results" (for the results of the test). To understand the reasons behind the tests and their order in time, it is sometimes necessary to refer to both "Methods" and "Results." The following paragraph gives a brief overview of the temporal progress of the investigation and references the sections with detailed information.
Fig. 1. How to read this paper
The first step in the research was to profile EPANET 2 (see "Methods: Profiling"). Profiling the application led to the conclusion that the solver used in EPANET 2 consumes the greatest share of the runtime. The section "Methods: Solver Replacement" describes the first attempt to enhance the performance of EPANET 2 by replacing the old solver with a parallel solver. The first test revealed that a new parallel solver does not make EPANET 2 faster (see "Results: Solver Replacement"). It was then assumed that the tested systems were too small for the parallel solver. To test this hypothesis, a second test (described in "Methods: Feasibility of a Parallel Solver") was conducted to show at what sizes a parallel solver would be faster. The second test revealed that parallel solvers are only faster in system configurations that are unrealistic and unphysical (see "Results: Feasibility of a Parallel Solver"). This raised the suspicion that the very carefully selected solver simply does not work well for this type of problem. At the same time, results from the work of Zecchin et al. (2012) corroborated this suspicion, as they showed that a certain type of solver is capable of speeding up only certain types of networks (namely large ones). Therefore, a third test was conducted in which several solvers were benchmarked (see "Methods: Rigorous Solver Benchmark"). The same artificial networks as in Zecchin et al. (2012) were tested, but additionally also networks with real-world characteristics. The third test revealed that the simplest solver, one that is quite similar to the original solver, is the fastest for small artificial networks; a multicore parallel solver is the fastest for medium-sized networks; and for very large artificial systems, the solver suggested in Zecchin et al. (2012) is the fastest. Nevertheless, for all networks with real-world characteristics (including small networks), the simple solver is the fastest (see "Results: Rigorous Solver Benchmark").

Profiling

The underlying aim of this work is to make EPANET 2 faster. In the process of optimizing an application, the first step is always to find the algorithm (or part of one) that is the most computationally intensive, i.e., the part of the code that is slowest in terms of runtime. A profiler is used to give a graphical representation of the functions that were called and how much time was spent during their execution. Fig. 2 shows part of a typical profile of an extended-period simulation (large real-world network, 72 h simulation period, 1 h time step) of EPANET 2. Program profiles make clear where most of the application's time is spent. Each box represents a function in the program. The first line in the box is the name of the function, followed by the cumulative time spent in both the function and all subsequent function calls. The second line gives just the time spent in the function itself. An arrow from one box to another represents a function call (e.g., from netsolve, the functions linkstatus, linsolve, newcoeffs, and newflows are called). Lines ending with an "×" state how often a call was executed.
Fig. 2. Original EPANET 2 profile
The two most time-intensive functions for the specific tested network are linsolve and newcoeffs, which are both called from the netsolve function. The newcoeffs function updates the coefficients corresponding to the gradient method (Todini and Pilati 1988) and fills them into the matrix; linsolve solves the linear equation system previously filled by newcoeffs. Although newcoeffs uses 31% of the complete time, that runtime is distributed throughout a whole range of functions and is, thus, more difficult to address. In contrast, 29% of the overall runtime is spent in linsolve, which makes it a perfect target for optimization, especially because it is a general-purpose linear algebra solver that has been implemented several times in commercial and open-source products. This initial profiling is performed with one single network as a starting point for the optimization procedure. Profiling different networks with varying characteristics could lead to divergent results; relying on a single network, therefore, carries the risk of not representing the typical execution time of the application. Nevertheless, using just one network is a common approach for profiling and, as a starting point for this work, it is assumed to be sufficient.

Solver Replacement

After profiling, the work focused on replacing linsolve with a call to a highly optimized solver, preferably running on a GPU. Parallel implementations of LAPACK and BLAS (both standard code packages for numerical computations) are common and achieve very good performance on GPUs and multicore CPUs (Van de Geijn 1997; Volkov and Demmel 2008; von zur Gathen 1993; Humphrey et al. 2010). The problem is that equations based on network topologies (i.e., graphs) result in sparse linear systems (Davis 2006). Replacing linsolve with a dense linear solver has already been done (Crous et al. 2008, 2012). But dense solvers perform many more floating-point operations than sparse solvers, especially for the very sparse systems that arise from water distribution modeling. Equations resulting from graph-based problems have mostly zero entries; skipping these zeros has storage and algorithmic advantages, because one neither needs to store them in memory nor perform calculations on them.
In contrast to dense matrices, which are stored in either row- or column-major format, sparse matrices must record nonzero elements in a special format. Almost all packages have converged on the compressed sparse row (CSR) or column (CSC) format for storing sparse matrices (Davis 2006). EPANET 2 uses a slightly modified version of CSR/CSC in which the diagonal is represented as a separate dense array. This is possible because the underlying matrices should always be positive-definite, and positive-definite matrices always have nonzero diagonal elements [see George et al. (1994), p. 160, for more information on the exact format used in EPANET 2].
To keep true to the "replace the call to linsolve and do not touch anything else" strategy, the authors needed to convert from one sparse format to the other, solve the equations, and write the results back. The equations from EPANET 2 result in a symmetric positive-definite matrix. Some solvers take advantage of the symmetry and store only the upper or lower triangle of the matrix, something that must be taken into account in the conversion process.
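The following C sketch illustrates the two storage schemes and the conversion step (struct and function names are illustrative, not EPANET 2's actual identifiers; error handling is omitted):

    #include <stdlib.h>

    /* EPANET-2-style storage: off-diagonal nonzeros in CSR-like arrays plus
       the diagonal as a separate dense array (always nonzero for symmetric
       positive-definite matrices). */
    typedef struct {
        int     n;       /* matrix dimension (number of junctions)      */
        double *diag;    /* diag[i] = A[i][i], length n                 */
        double *offd;    /* off-diagonal nonzero values                 */
        int    *col;     /* column index of each off-diagonal nonzero   */
        int    *rowptr;  /* row i owns offd[rowptr[i] .. rowptr[i+1]-1] */
    } En2Matrix;

    /* Plain CSR, the format most solver libraries expect. */
    typedef struct {
        int n, nnz;
        double *val;
        int *col, *rowptr;
    } CsrMatrix;

    void en2_to_csr(const En2Matrix *a, CsrMatrix *c)
    {
        int i, j, k = 0;
        c->n      = a->n;
        c->nnz    = a->n + a->rowptr[a->n];     /* diagonal + off-diagonal */
        c->val    = malloc(c->nnz * sizeof *c->val);
        c->col    = malloc(c->nnz * sizeof *c->col);
        c->rowptr = malloc((c->n + 1) * sizeof *c->rowptr);
        for (i = 0; i < a->n; i++) {
            c->rowptr[i] = k;
            c->val[k] = a->diag[i];             /* merge the diagonal back */
            c->col[k++] = i;
            for (j = a->rowptr[i]; j < a->rowptr[i + 1]; j++) {
                c->val[k] = a->offd[j];
                c->col[k++] = a->col[j];
            }
            /* note: some libraries additionally require the column indices
               within each row to be sorted */
        }
        c->rowptr[c->n] = k;
    }

The right-hand side and the solution vector need no conversion at all, because both representations use plain dense arrays for them.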
For this first test, PARDISO was chosen as the replacement for linsolve. PARDISO is a parallel direct solver that uses standard parallelization frameworks [OpenMP and MPI (Dagum and Menon 1998; Snir et al. 1995)] (Schenk et al. 2000). A multicore solver was initially favored because it is easier to debug on the CPU than on a dedicated GPU, for which debugging methods are not as evolved. PARDISO was selected because it offers the best performance on a number of sparse problems (Gould et al. 2007) and is also used by Intel for its Math Kernel Library (MKL) (Intel Corporation 2013). The PARDISO solver is also used in the third test, in which a number of sparse solvers were benchmarked (see "Methods: Rigorous Solver Benchmark").
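For illustration, the shape of such a replacement call is sketched below against Intel MKL's PARDISO interface (a minimal sketch assuming a 0-based CSR matrix; error checks are trimmed, and a production version would keep the phase-11 analysis across Newton iterations because the sparsity structure does not change):

    #include <stdio.h>
    #include <mkl.h>

    /* Solve the SPD system A x = b once with MKL PARDISO. */
    void solve_with_pardiso(MKL_INT n, double *a, MKL_INT *ia, MKL_INT *ja,
                            double *b, double *x)
    {
        void   *pt[64] = {0};     /* opaque internal solver handle         */
        MKL_INT iparm[64], idum = 0;
        double  ddum = 0.0;
        MKL_INT mtype = 2;        /* real symmetric positive definite      */
        MKL_INT maxfct = 1, mnum = 1, nrhs = 1, msglvl = 0, error = 0;
        MKL_INT phase;

        pardisoinit(pt, &mtype, iparm);
        iparm[34] = 1;            /* 0-based (C-style) row/column indices  */

        phase = 11;               /* reordering and symbolic factorization */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        phase = 22;               /* numerical factorization               */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        phase = 33;               /* forward and backward substitution     */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                &idum, &nrhs, iparm, &msglvl, b, x, &error);
        phase = -1;               /* release internal solver memory        */
        pardiso(pt, &maxfct, &mnum, &mtype, &phase, &n, &ddum, ia, ja,
                &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
        if (error) fprintf(stderr, "PARDISO returned error %d\n", (int)error);
    }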
To experiment with other solvers and analyze the matrices in numerical computing environments, such as MATLAB, the authors implemented a function to dump the matrices from the EPANET 2 software to an output file just before the call to linsolve. To ensure that all solvers are tested under correct conditions and with valid matrices, all benchmarks were implemented to use the dumped matrices from the modified EPANET 2. Results can, therefore, be directly projected to an actual implementation in EPANET 2. It also allows for fast, easy, and repeatable testing of various solvers. The binary representations of the floating-point numbers representing the matrix coefficients were written directly into a file. This ensures that the floating-point numbers, with exponent and mantissa, are bit-wise equivalent in both the original EPANET 2 software and in the benchmark application. The authors used the native floating-point format of the solvers, which is always the double type (8 bytes), as modern CPUs are optimized for doubles. Only the compute unified device architecture (CUDA) solver uses single floating-point precision, as GPUs have more single-precision floating-point units. The time spent writing and reading those files was not included in the performance tests.
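A minimal sketch of such a dump routine is shown below (the exact file layout used by the authors is not specified in the paper; this one is illustrative):

    #include <stdio.h>

    /* Dump the linear system right before the call to linsolve so external
       benchmarks see bit-identical coefficients. */
    int dump_system(const char *path, int n, int nnz,
                    const int *rowptr, const int *col,
                    const double *val, const double *rhs)
    {
        FILE *f = fopen(path, "wb");
        if (f == NULL)
            return -1;
        fwrite(&n,   sizeof n,   1, f);
        fwrite(&nnz, sizeof nnz, 1, f);
        fwrite(rowptr, sizeof *rowptr, (size_t)n + 1, f);
        fwrite(col,    sizeof *col,    nnz, f);
        fwrite(val,    sizeof *val,    nnz, f);   /* raw IEEE-754 doubles:
                                                     exponent and mantissa are
                                                     preserved bit for bit */
        fwrite(rhs,    sizeof *rhs,    n,   f);
        fclose(f);
        return 0;
    }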

Feasibility of a Parallel Solver

After the initial implementation of PARDISO in EPANET 2 failed to provide any speedup, it was assumed that the matrices resulting from water distribution networks are too small and too sparse. The additional work done by a parallel solver to distribute and synchronize parallel tasks is called the parallel overhead. This parallel overhead typically does not grow with the size of the input; therefore, the ratio between parallel overhead and actual workload decreases with increasing input size (e.g., larger water distribution networks with more neighbors, as shown in Fig. 3). If this overhead is too high, e.g., for very small and sparse systems, the parallel solver is slower than a sequential solver. Such behavior is equally found in other systems and is discussed in detail in Burger et al. (2014).
Fig. 3. Configurations with (a) four neighbors; (b) eight neighbors
To figure out at what problem sizes the PARDISO solver is faster than the sequential solver, a test with varying problem sizes was performed. For this test, random EPANET 2 input files were generated with an increasing degree of neighbors (to decrease the sparsity of the system) and an increasing number of nodes (to increase the system size). Decreasing the sparsity and increasing the system size lowers the ratio of parallel overhead to workload.
The graphs for the input systems were generated with a stochastic process. First, n nodes were randomly sampled on the unit square, (x, y) ∈ [0,1) × [0,1). Each node n_i was then connected to its k nearest neighbors. Ten percent of the nodes were defined as reservoirs to fulfill the boundary conditions. The nodal heads, demands, pipe lengths, and pipe roughness coefficients were also randomly sampled. For the test, 882 graphs (with n ranging from 400 to 20,000 in 200 steps, and k ranging from 2 to 10) were generated. For each graph, an EPANET 2 input file was generated, and the runtimes of both the modified EPANET 2 (with the original solver replaced by the PARDISO solver) and the original EPANET 2 were measured.
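The following C sketch reproduces the essence of this generator and writes a minimal EPANET 2 input file (the value ranges, the brute-force k-nearest-neighbor search, and the section contents are illustrative and simplified; sections such as [OPTIONS] and [COORDINATES] are omitted):

    #include <stdio.h>
    #include <stdlib.h>

    #define N 400   /* number of nodes    */
    #define K 4     /* neighbors per node */

    static double urand(double lo, double hi)
    {
        return lo + (hi - lo) * rand() / ((double)RAND_MAX + 1.0);
    }

    int main(void)
    {
        static double x[N], y[N];
        int i, j, m, p, pipe = 1, nres = N / 10;   /* 10% reservoirs */
        FILE *f = fopen("random.inp", "w");

        srand(42);
        for (i = 0; i < N; i++) { x[i] = urand(0, 1); y[i] = urand(0, 1); }

        fprintf(f, "[RESERVOIRS]\n");              /* ;ID Head */
        for (i = 0; i < nres; i++)
            fprintf(f, " N%d %.1f\n", i, urand(100, 140));
        fprintf(f, "[JUNCTIONS]\n");               /* ;ID Elev Demand */
        for (i = nres; i < N; i++)
            fprintf(f, " N%d %.1f %.2f\n", i, urand(10, 20), urand(0, 10));

        fprintf(f, "[PIPES]\n");   /* ;ID Node1 Node2 Length Diam Roughness */
        for (i = 0; i < N; i++) {
            int picked[K];
            for (m = 0; m < K; m++) {              /* m-th nearest neighbor */
                int best = -1; double bestd = 1e30;
                for (j = 0; j < N; j++) {
                    double dx = x[i] - x[j], dy = y[i] - y[j];
                    double d = dx * dx + dy * dy;
                    int skip = (j == i);
                    for (p = 0; p < m; p++) skip |= (picked[p] == j);
                    if (!skip && d < bestd) { bestd = d; best = j; }
                }
                picked[m] = best;   /* duplicate i-j pairs are possible, as
                                       in any plain k-NN construction */
                fprintf(f, " P%d N%d N%d %.1f %.0f %.2f\n", pipe++, i, best,
                        urand(1, 100), urand(100, 200), urand(1, 4));
            }
        }
        fprintf(f, "[END]\n");
        fclose(f);
        return 0;
    }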

Rigorous Solver Benchmark

The third test was used to analyze the performance of various solvers on different ranges of networks. Two types of networks, artificially generated and real-world, were used to determine the fastest solver for EPANET 2. The artificial networks were used to provide a wide range of networks with different sizes. They follow the scheme of the networks generated in Zecchin et al. (2012) (see the "Artificial Networks" subsection). They are easy to generate but do not necessarily reflect the properties of real-world networks. That is why a second set of real-world networks was included in the tests (see the "Real-World Networks" subsection). To cover a wide range of different solvers, three iterative and four direct solvers were chosen: five parallel (CPU and GPU) and two sequential (see the "Solvers" subsection).

Artificial Networks

The artificial input systems were generated following the work in Zecchin et al. (2012). The nodes of the system are placed on an orthogonal grid. The number of nodes in the system is driven by the side length of the square; the side lengths were chosen such that the resulting node counts equal 1×10^a, where a ranges from 2 to 6 in 0.2 steps. For the links, configurations with four and eight neighbors were generated, as shown in Figs. 3(a and b), respectively. The eight-neighbor configuration was included to decrease the sparsity of the resulting matrix, which should favor parallel solvers.
The resulting graph is then converted to the EPANET 2 input file format; the nodes represent junctions and the links represent pipes. Ten percent of the junctions are randomly chosen to be reservoirs. The values for the elements were also chosen randomly. For the junctions, the elevation was chosen between 10 and 20 m, and the nodal demand varied between 0 and 10 L/s. For the pipes, the length was set between 1 and 100 m, the diameter between 100 and 200 mm, and the roughness between 1.0 and 4.0 mm (Darcy-Weisbach); the option "no minor losses" was set and all pipes were open. For the reservoirs, the elevation was defined between 100 and 140 m. The EPANET 2 simulations for the resulting systems did not become unbalanced, produced no warnings, and showed no negative pressures.

Real-World Networks

As described in the “Artificial Networks” section, the squared networks allow generating a wide range of artificial networks with different problem sizes. But these squared networks do not necessarily reflect system configurations in real-world networks. For example, the networks described in the previous section only included junctions, pipes, and reservoirs. Regulating elements such as valves, pumps, rules, or emitters are missing completely.
The real-world networks described in this paper differ in many ways from the generated squared networks described in the previous sections. The first difference is that they fall into the category of small systems, with node counts of up to 10^4. Real-world systems fall into this category because they are typically built manually or generated from geographical information system (GIS) data, which are in most cases also created manually. The number of nodes dictates the dimension (n) of the resulting matrix.
The second difference is that the artificially generated systems have four or eight neighbors, i.e., each node is connected with four or eight pipes. Water network infrastructures with such high neighbor counts are unrealistic because of their high costs and redundancy. The number of neighbors influences the number of nonzero elements (nnz) in the matrix. For the four- and eight-neighbor configurations, the ratio between nnz and n (fill ratio) is 3.0 and 5.0, respectively, whereas the ratio for real-world systems is approximately 2.0.
The size and fill ratio are parameters that influence the runtime of all solvers, direct and iterative ones. One parameter that influences only the iterative solvers is the condition number of a linear system (Ax = b). The condition number κ(A) indicates how well a system can be solved by iterative methods. An identity matrix has condition number one; it can, therefore, be solved very quickly (one iteration) with errors only at the level of machine accuracy. The higher the condition number, the slower the CG (conjugate gradient) method converges, if it converges at all (Saad 2003).
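For reference, the standard definitions are (Saad 2003)

\[
\kappa(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \ge 1,
\qquad
\|x_k - x^*\|_A \le 2\left(\frac{\sqrt{\kappa(A)}-1}{\sqrt{\kappa(A)}+1}\right)^{k}\|x_0 - x^*\|_A,
\]

where the second expression is the classical error bound for CG applied to a symmetric positive-definite system; the number of iterations needed for a fixed accuracy thus grows roughly with the square root of the condition number.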
Table 1 summarizes the sizes and system elements of the 13 real-world networks used for benchmarking the solvers in this section. The networks 01-towng2, 04-towng1, 05-townh, and 10-town are actual networks from urban areas in Austria. The networks 02-bwsn1 and 12-bwsn2 are from "The Battle of the Water Sensor Networks," examples 1 and 2 (Ostfeld et al. 2008). The 03-dtown network is from "The Battle of the Water Networks II" (Marchi et al. 2014). The 06-richmond network is the water distribution network of Richmond in the U.K. (Van Zyl et al. 2004). The 07-wolf network is from the Wolf-Cordera Ranch model (Lippai 2005). The 08-exnet network is the EXNET network from the University of Exeter (Farmani et al. 2004). The network 09-run1911 was generated using Sitzenfrei et al. (2013b) and Möderl et al. (2011). The networks 11-vibk and 13-vtirol are virtual networks, generated using Mair et al. (2014), that represent the actual networks of the city of Innsbruck in Austria and of the whole county of Tyrol in the west of Austria, respectively. As shown, not all of these networks are real-world case studies, but because all of them represent real-world network characteristics, they are listed in this section.
Table 1. Real-World System Information
Name | Junctions | Pipes | Pumps | Reservoirs | Tanks | Valves | Emitters
01-towng2 | 127 | 157 | 6 | 2 | 4 | 1 | 0
02-bwsn1 | 126 | 168 | 2 | 1 | 2 | 8 | 0
03-dtown | 399 | 443 | 11 | 1 | 7 | 5 | 0
04-towng1 | 453 | 603 | 5 | 6 | 11 | 10 | 0
05-townh | 798 | 881 | 1 | 3 | 3 | 16 | 0
06-richmond | 865 | 949 | 7 | 1 | 6 | 1 | 0
07-wolf | 1,782 | 1,985 | 6 | 0 | 4 | 4 | 0
08-exnet | 1,891 | 2,465 | 0 | 2 | 0 | 2 | 0
09-run1911 | 5,118 | 6,769 | 8 | 12 | 17 | 0 | 0
10-town | 7,128 | 7,599 | 0 | 0 | 4 | 3 | 0
11-vibk | 12,377 | 13,025 | 0 | 2 | 0 | 0 | 0
12-bwsn2 | 12,523 | 14,822 | 4 | 2 | 2 | 5 | 0
13-vtirol | 82,079 | 84,289 | 0 | 2 | 0 | 0 | 0
Table 2 gives a summary of the solvers. Name is the shortened identifier used in this paper. Product is the name under which the solver is published. Method is a short description of how the solver solves the linear system. Parallel describes whether the solver runs in parallel on the CPU, on the GPU, or not at all. All solvers are written in native languages (C/Fortran) and use OpenMP on the CPU to parallelize the algorithms.
Table 2. Summary of the Solvers
Name | Product | Algorithm | Method | Parallel
Pard | PARDISO | Left- and right-looking supernode technique | Direct | CPU
Hsl | HSL-MA87 | DAG-based | Direct | CPU
Chol | CHOLMOD | Left-looking supernodal | Direct | None
En2 | CXSparse | Up-looking | Direct | None
Samg | SAMG | AMG-CG | Iterative | CPU
Vcl | ViennaCL | ILU0-CG | Iterative | CPU
Cuda | CUDA | ILU0-CG | Iterative | GPU

Solvers

The pard solver (from now on, all solvers are identified by the acronyms shown in Table 2), also used in the first and second tests, was selected as the primary CPU-based parallel direct solver. The solver uses a combination of left- and right-looking supernode techniques (Schenk et al. 2000; Schenk 2013). To improve scalability, it exploits pipeline parallelism and dynamic two-level scheduling (Schenk and Gärtner 2002).
The hsl solver is from the HSL (formerly Harwell Subroutine Library) software package. The package MA87 provides a direct solver for sparse symmetric positive-definite systems using OpenMP. In contrast to all other solvers, this solver builds a directed acyclic graph (DAG) for handling the dependencies between parallel tasks. Each task works on a dense block of the factor L, which is factorized using a traditional dense Cholesky decomposition; a task may only start once all blocks it depends on have already been calculated (Hogg et al. 2010). Additionally, the package MC68 is used for approximate minimum-degree reordering (Science and Technology Facilities Council 2013).
The chol solver uses a dynamic supernodal left-looking sparse Cholesky method (Davis and Hager 2009). The chol solver was selected as a high-performance sequential solver. The chol package also provides a GPU solver based on cuBLAS, but this feature was not used in this study (Rennich et al. 2014). CHOLMOD has an internal heuristic, based on the size and density of the matrix, which chooses the solving strategy (simplicial or supernodal). On the tested matrices, the supernodal factorization was never triggered because the matrices are too small (Davies 2015).
The en2 solver is a very simple up-looking sparse Cholesky solver from the CXSparse package as described in Davis (2006). The original functionality of reordering and sparse storage allocation leading up to the call to linsolve is not extractable from EPANET 2. The original Fortran methods (GSFCT and GSSLV) from the SPARSPAK package (Chu et al. 1984), described in the book by George et al. (1994), are obsolete and have been removed from the netlib repository (Browne et al. 1995). Nevertheless, this factorization can be considered the simplest one and can, therefore, be seen as the most similar to the code found in EPANET 2. Furthermore, the reordering code from EPANET 2 shows a very high asymptotic complexity, which makes the routine unusable for large system sizes [see "Other Observations" and Zecchin et al. (2012)]; the authors, therefore, omitted further benchmark tests of the original EPANET code. This nonlinear growth in runtime was not observed with the code from Davis (2006).
As previously stated, iterative methods cannot solve the very ill-conditioned problems arising from water distribution simulations. Therefore, all iterative methods described in this paper use a preconditioner to improve the condition of the system; these solvers are thus classified as preconditioned conjugate gradient (PCG) methods. The first iterative solver is samg, identical to the one described in Zecchin et al. (2012). It uses an AMG preconditioner with the CG method as accelerator. This solver also uses OpenMP for parallelization; the special first-touch policy suggested in the manual was implemented.
The second and third iterative solvers are standard PCG with an incomplete LU zero (ILU0) factorization as preconditioner (Saad 2003). The vcl solver (from the ViennaCL project) was used as an example of a parallel ILU0-PCG on the CPU (Rupp et al. 2010). In this research, only the CPU version was used because (1) the solver failed to find solutions for several systems, and (2) the GPU version of vcl is similar to the CUDA cuBLAS and cuSPARSE libraries. The last solver is an ILU0-PCG on the GPU, using the cuSPARSE and cuBLAS libraries. The implementation follows Naumov (2011); the details behind the GPU version of an ILU0 preconditioner can be found in Li and Saad (2013).
None of the authors of this paper is associated with the development of, or research on, any of the tested solvers.

Testing Procedure

For both types of networks, the same procedure was carried out to evaluate the runtime of the solvers. The first step was to dump a matrix representation of the given input file, using the previously described modified EPANET 2. Then, the dumped matrix was used to run the benchmark application with all seven solvers. To emulate the outer Newton iteration of EPANET 2, the system is solved eight times, which is usually enough to reach a satisfactory result in the iterative procedure. Everything that can be done outside the Newton iteration (e.g., reordering, symbolic factorization) was included only once in the total runtime of the solver. To compensate for fluctuations in computing time, small systems were solved multiple times and the results were averaged over the number of runs.
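In outline, the timing harness looks as follows (a sketch: solver_setup and solver_solve stand in for each library's actual setup and solve calls):

    #include <time.h>

    /* Placeholders for the per-library API used in the benchmark. */
    void solver_setup(void *matrix);            /* reordering, symbolic phase */
    void solver_solve(void *matrix, void *rhs); /* factorize + substitute     */

    /* Time one dumped system: setup once, then eight solves to emulate the
       outer Newton iteration; average over several runs to smooth noise. */
    double benchmark(void *matrix, void *rhs, int runs)
    {
        struct timespec t0, t1;
        double total = 0.0;
        for (int r = 0; r < runs; r++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            solver_setup(matrix);
            for (int it = 0; it < 8; it++)
                solver_solve(matrix, rhs);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            total += (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        }
        return total / runs;
    }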
All tests were performed on a dual-socket system with Intel® Xeon® X5650 CPUs at 2.67 GHz. Each CPU has six physical cores, which results in 12 physical and 24 hyper-threaded cores; the tests were run with 12 threads. The system has 24 GB of RAM and two NVIDIA Tesla C2070 cards, each featuring 6 GB of GDDR5 RAM and 448 stream processors. The GPU solver was executed on only one GPU. The system ran Linux with kernel version 3.10.21.

Results

Solver Replacement

If the PARDISO solver, used in the first test to replace the linsolve code, had delivered extreme performance advantages, the paper would have ended here, stating the obligatory speedup numbers. This was not the case. With the conversion in place, the first implementation did not gain any speed advantage over the standard EPANET 2 solver.
Converting from one sparse format to another is an overhead that could prevent the new implementation from being faster. To eliminate the conversion overhead, EPANET 2 was modified to use the standard CSR format directly, as it is used by most popular solvers, including PARDISO. This move diverged from the initial idea of modifying only the call to the linsolve function, but it was more or less a find-and-replace procedure that in itself makes no difference to the performance of EPANET 2. But again, benchmarks with the CSR format in place did not reveal any speed advantage over the standard EPANET 2.
This indicates that the conversion overhead between different sparse formats is not the limiting factor; the problem is somewhat more complex. It was believed that linear systems from typically sized water distribution networks are simply too small and too sparse for the PARDISO solver. Therefore, a second test was conducted.

Feasibility of a Parallel Solver

This test shows at what number of nodes and number of connections per node the PARDISO solver is faster than the sequential one, in other words, at what matrix dimension and sparsity a parallel solver is feasible. In Fig. 4, k is the number of connections per node (increasing k decreases the sparsity of the resulting matrix) and n is the number of nodes in the system (increasing n increases the dimension of the matrix). At n = 2,500 and k = 6, the parallel solver is faster than the sequential solver. At approximately n = 7,000, the intersection curve stagnates near k = 4. This indicates that only with a connectivity degree of at least four is the parallel solver faster than the sequential one. Networks with a connectivity degree of at least four (i.e., four pipes connected at each node) are hardly seen in real-world systems. Even on a two-dimensional lattice, the boundary nodes have only three neighbors and the corner nodes, even worse, only two. Therefore, the authors must conclude that there is nothing to gain from using PARDISO, one of the fastest parallel direct sparse solvers, in EPANET 2 for real systems.
Fig. 4. (a) Intersection plot of parallel versus sequential solver showing when the parallel solver is faster than the sequential solver; (b) the two runtime plots overlaid: light gray is the parallel solver and dark gray is the sequential solver

Rigorous Solver Benchmark

This test shows a head-to-head performance evaluation of seven sparse solvers. The authors have chosen three iterative solvers and four direct solvers. A detailed description of each solver can be found in “Methods: Rigorous Solver Benchmark” followed by a summary of the solvers in Table 2.

Artificial Networks

Fig. 5 shows the results for the artificial networks in the four-neighbor configuration. Up to 3.9×10^3 nodes, the en2 solver is the fastest. After that, pard takes the lead up to system sizes of 1.6×10^5. Beyond that, samg is faster than all others. The authors can conclude that, for this test, en2 is the fastest for small systems, pard should be used for medium-sized systems, and samg for large systems.
Fig. 5. Runtimes from the artificial network with a four-neighbor configuration and 12 threads (4NB-12TH)
For small systems, pard has a strongly fluctuating runtime. This behavior can only be seen at high thread counts and vanishes as the thread count approaches one. Starting too many threads in pard for small systems is, therefore, counterproductive for the runtime. The hsl solver shows performance similar to pard but uses a directed acyclic graph for selecting parallel tasks. This task-based selection seems to favor the parallelization of small systems.
For the smallest systems, samg is the slowest solver, but it is the fastest for the largest ones. It seems that the setup costs of an AMG solver are very high but pay off especially for large systems. For en2, the opposite is true, as it is the fastest solver for small systems but the slowest one for large systems.
The cuda solver has a very flat and stable runtime curve and seems to be the one with the smallest gradient. This suggests that the cuda solver could be the fastest solver for systems even larger than the ones included in this research. In contrast, the overhead of the ILU0 calculation and of the memory transfer between the GPU and the main memory (which is not included in the performance tests) makes this solver a pointless choice for all problems of practical relevance. The second ILU0-PCG solver (vcl) performs just in between the other solvers, never fast and never slow.
Fig. 6 shows the runtime with an eight-neighbor configuration. This configuration decreases the sparsity of the matrices. The decreased sparsity and, therefore, decreased ratio between parallel overhead and factorization work was thought to favor the parallel solvers.
Fig. 6. Runtimes from the artificial network with an eight-neighbor configuration and 12 threads (8NB-12TH)
The results are the same as before: en2 is the fastest for small systems, pard for medium-range systems, and samg for large systems. However, the limits between small, medium, and large systems have shifted: en2 is the fastest for systems below 1.6×10^3 nodes, pard for systems up to 6.3×10^4, and samg for the larger systems. Thus, the eight-neighbor configuration does indeed favor the parallel solvers.
Fig. 7 shows the speedup of the parallel solvers. The purpose of this figure is to show how much can be gained by running a solver in a parallel context compared with the same solver in a sequential context. In the sequential context, the parallel solvers were forced to use only one CPU core. Speedup, in this context, is defined as the runtime in the sequential context (the solver operates with only one thread) divided by the runtime in the parallel context (the solver uses 12 threads). The horizontal black line marks a speedup factor of one, i.e., the solver in a parallel context is exactly as fast as in the sequential context. Values below one indicate that the solver in the parallel context is actually slower than in the sequential context, and vice versa.
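Formally, with T_1 the runtime on one thread and T_p the runtime on p threads, the reported quantities are

\[
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p},
\]

where S_p is the speedup plotted in Fig. 7 and E_p is the parallel efficiency referred to below (e.g., S_12 = 6.81 on 12 threads gives E_12 = 6.81/12 ≈ 0.57).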
Fig. 7. Parallel speedup (one versus 12 threads) for the artificial networks with an eight-neighbor configuration
Except for some rare conditions, parallelization always introduces an overhead. This overhead is the result of managing and synchronizing the work that needs to be done in parallel. With an increasing problem size, the parallel overhead typically decreases because more work can be done in parallel tasks before they must be managed and synchronized. Fig. 7 shows exactly at what problem sizes it is feasible to use the parallel versions of the solvers.
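A textbook way to see this cap is Amdahl's law (not invoked explicitly by the authors, but standard context for Fig. 7): if only a fraction f of the total work can be parallelized over p threads, the speedup is bounded by

\[
S_p \le \frac{1}{(1-f) + f/p},
\]

so the overhead and any sequential fraction limit the achievable speedup regardless of p; larger systems shift work into the parallelizable factorization, pushing f toward 1 and raising the bound.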
All solvers except hsl start to become faster at approximate system sizes of 10^3; hsl overcomes the parallel overhead at system sizes larger than 1.6×10^2. Although pard has the worst speedup for small systems, it has the best speedup (6.81) for the largest systems. However, considering that a 12-core system was used, a speedup of 6.81 reflects an efficiency of only 0.57.

Real-World Networks

The real-world networks tested in this research have very high condition numbers compared with the generated squared networks. The squared networks, depending on their size, have condition numbers between 1.5×10^3 and 2.8×10^4. The condition numbers of the real-world networks, listed in Table 3, range between 8×10^6 and approximately 9×10^21.
Table 3. Condition Number Estimations of the Real-World Networks
Name | Condition
01-towng2 | 1.36×10^11
02-bwsn1 | 1.23×10^10
03-dtown | 1.24×10^11
04-towng1 | –
05-townh | –
06-richmond | –
07-wolf | 8.33×10^10
08-exnet | 2.88×10^11
09-run1911 | 7.96×10^6
10-town | 9.14×10^21
11-vibk | 8.81×10^9
12-bwsn2 | –
13-vtirol | 9.83×10^9
All condition numbers were estimated using the condest command in MATLAB. High condition numbers can cause an increasing number of iterations or even stagnation (i.e., failure) of the iteration process. Accordingly, the ILU0-PCG iterative methods had substantial problems solving the systems with high condition numbers. Table 4 summarizes the systems for which the ILU0-PCG solvers failed to find a result. Combining Tables 3 and 4, it becomes clear that a water distribution network with size n ≥ 800 and condition number κ(A) ≥ 10^9 cannot be solved using the ILU0-PCG method. The cuda solver failed even earlier because it uses single-precision floating-point numbers, a limitation of current GPU computing. Although GPUs that perform double-precision floating-point operations are available, such devices are still rare, as double precision is not used in typical gaming applications. The third iterative solver (samg) was able to solve all systems, although the solver issued a warning.
Table 4. Real-World Networks for Which the Vcl and Cuda Solver Failed to Find Results
Name | Vcl | Cuda
03-dtown |  | x
04-towng1 |  | x
05-townh |  | x
06-richmond | x | x
10-town | x | x
11-vibk | x | x
12-bwsn2 | x | x
13-vtirol | x | x

Note: x is a marker indicating which solver/network combinations failed to be solved.

Fig. 8 shows the results of the real-world networks computed with 12 threads. Under these conditions, en2 and chol are the fastest solvers. Compared with the direct solvers, the iterative solvers show poor performance on these small systems. The pard solver again shows strongly fluctuating results for small systems but stabilizes at system sizes of approximately 10^3. Although samg is the only solver capable of solving all real-world systems, the overhead of the AMG method is too large for such small systems. For practical applications (real-world systems with high condition numbers and small node and neighbor counts), the en2 solver is the fastest for all investigated system sizes.
Fig. 8. Real-world networks with 12 threads
Fig. 9 shows the speedup of the parallel solvers for the real-world networks. With these small systems, hsl has the highest speedup, at approximately 2.8. All other solvers have a speedup below 2.0, which is the theoretical best speedup for the simplest parallel system configuration: a system with two cores. The authors can, therefore, conclude that there is not much to gain from a parallel sparse solver for real-world networks.
Fig. 9. Parallel speedup (one versus 12 threads) of real-world networks

Other Observations

While benchmarking the different solvers, some defects of EPANET 2 were found. The first and most severe one is that the runtime of the classical EPANET 2 grows in the order of approximately O(n^3) for the tested systems. This was estimated by timing several EPANET 2 runs and fitting the resulting curve to the formula t = a + b·n^c, where t is the observed runtime; n is the number of nodes; and a, b, and c are free parameters. The fit resulted in a = 4.581, b = 2.088, and c = 2.23. EPANET 2 was run with the generated squared input systems (of the "Results: Rigorous Solver Benchmark" section) with side lengths from 25 to 158, which results in node counts from 625 to 24,964. Fig. 10 displays the observed runtimes and the curve approximated by the fit. This graph shows how severely the runtime degenerates for systems larger than 10^4 nodes. It also shows that for smaller systems, i.e., in the range of real-world systems, the cubic complexity of EPANET 2 does not influence the overall runtime significantly.
Fig. 10. Approximated algorithmic complexity of the original EPANET 2 code
Fig. 11 shows the profile of a single execution run with a squared input of side length 158. With a system of this size, reordernodes takes around 93% of the whole EPANET 2 runtime, whereas linsolve was called eleven times during the execution but takes only 6.5% of the total runtime. This example proves that the node reordering algorithm in EPANET 2 is highly inefficient for large systems. Even though EPANET 2 implements the minimum-degree ordering after George et al. (1994), which has a complexity of O(n^2·m), with m being the number of edges per node, the observed runtime suggests that this implementation of the algorithm is highly unsuitable for large networks.
Fig. 11. Profile for the original EPANET 2 code for the artificial square input side length of 158
The second observation while working with large input systems was that, at a certain system size, EPANET 2 takes a very long time to parse the input file. The cause of this suboptimal performance was found to be a weak hashtable implementation. In the first phase of input parsing, EPANET 2 uses the names of the links and nodes to connect them into a graph. For fast lookup of names to their corresponding link or node data, a hashtable is used. The complexity of a hashtable is the complexity of the hashing method, typically O(1), plus the complexity of collision resolution, which is, in this case, chaining with a linked list and, therefore, O(n). In the case of no collisions, hashtables have an optimal complexity of O(1). When a hashtable gets full, collisions become frequent and the worst-case complexity of O(n) is approached (Leiserson et al. 2001). In this case, the hashtable must be resized and rehashed so that fewer or no collisions happen. This is not done by the hashtable in EPANET 2, which stays at a fixed size of 1,999 buckets. When EPANET 2 was first implemented, a bucket count of 1,999 was a good trade-off between memory size and algorithmic complexity. With node counts exceeding 10^4, a bucket count of 1,999 results in very slow lookup times during the parsing of the input files.
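For comparison, a hashtable that rehashes when its load factor exceeds 1 keeps lookups near O(1); the following C sketch (names are illustrative, not EPANET 2's actual hash code) shows the missing growth step:

    #include <stdlib.h>

    /* Chained hashtable that doubles its bucket array when the load factor
       exceeds 1.0; EPANET 2's table instead stays fixed at 1,999 buckets,
       so chains grow to O(n/1999) on average. */
    typedef struct Entry { char *key; void *data; struct Entry *next; } Entry;
    typedef struct { Entry **bucket; size_t nbuckets, count; } Table;

    static size_t hash(const char *s, size_t nbuckets)
    {
        size_t h = 5381;                       /* djb2 string hash */
        while (*s) h = h * 33 + (unsigned char)*s++;
        return h % nbuckets;
    }

    static void rehash(Table *t)
    {
        size_t nb = t->nbuckets * 2 + 1, i;
        Entry **nw = calloc(nb, sizeof *nw);
        for (i = 0; i < t->nbuckets; i++)
            for (Entry *e = t->bucket[i], *nx; e != NULL; e = nx) {
                nx = e->next;                  /* save before relinking */
                size_t h = hash(e->key, nb);
                e->next = nw[h];
                nw[h] = e;
            }
        free(t->bucket);
        t->bucket = nw;
        t->nbuckets = nb;
    }

    void ht_insert(Table *t, char *key, void *data)
    {
        Entry *e;
        size_t h;
        if (++t->count > t->nbuckets)          /* load factor > 1: grow */
            rehash(t);
        h = hash(key, t->nbuckets);
        e = malloc(sizeof *e);
        e->key = key;
        e->data = data;
        e->next = t->bucket[h];
        t->bucket[h] = e;
    }

With a growth step like this, parse-time lookups stay near amortized O(1) even for systems the size of 13-vtirol in Table 1.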

Discussion and Conclusion

The aim of this paper was to describe the findings of a long journey searching for a faster solver for EPANET 2. The problem seemed to be easy, as a substantial improvement was expected from replacing 30-year-old code with the current state of mathematical knowledge and algorithm design while incorporating the latest advances in hardware architecture. The bad news is that for typical real-world network sizes, the authors could not identify a solver that produces results significantly faster than the current one. The good news is that nobody has wasted any time by still using the old solver (apart from researchers trying to speed up the software).
In this work, rigorous testing of different modern solvers with numerous test cases of different sizes was performed. To this end, real-world case studies and artificially generated systems were used. This comprehensive test environment enabled the authors to show that the performance of the solvers is significantly influenced by the system size and the system characteristics. Thus, reported benefits of particular solutions proved to be strongly related to the test environment and not achievable in the tested real-world applications.
For very large generated systems, the same results were found as reported by Zecchin et al. (2012). The solver proposed there (samg) is 13 times faster than the en2 solver for the largest tested system. However, the same solver is 508 times slower for the smallest networks, which does not make it a perfect overall choice. Again, for medium-sized generated systems, yet another solver (pard) was found to perform best.
The picture is even worse for real-world systems, which are sparser than the artificial systems. For the largest real-world system, the original solver (en2) is 10 times faster than the best-performing alternative (samg) and 533 times faster for the smallest system. Likewise, the solver that was fastest for medium-sized generated networks (pard) is consistently slower than the original solver for real-world networks. Therefore, for practical applications (real-world systems with high condition numbers, small node counts, and small neighbor counts), it can be concluded that the en2 solver is the fastest for all investigated system sizes. None of the other tested solvers can be recommended for implementation in EPANET 2, as they perform suboptimally on the tested real-world problem sets.
The reason is that matrices originating from real-world water distribution models based on the gradient method [e.g., the global gradient algorithm (GGA) in EPANET 2 (Todini and Pilati 1988)] are too small and too sparse for any of the tested direct solvers to outperform the original solver. At the same time, these matrices are too ill-conditioned to be solved by standard iterative solvers; only one out of the three iterative solvers was able to solve all tested real-world systems.
This raises the question of whether any solver, direct or iterative, will ever be faster than the original solver in EPANET 2. These results correspond with findings by Wu and Lee (2011). If further progress toward a faster solution of the set of nonlinear equations is dubious, research should rather focus on other strategies to speed up such simulations. What can be recommended is to fix the two defects of the current code found in the course of this research: better hashtable and node reordering implementations do not harm the performance on any system but will improve the performance on medium to large systems dramatically.

Supplemental Data

Tables S1–S3 are available online in the ASCE Library (www.ascelibrary.org).

Supplemental Materials

File (supplemental_data_wr.1943-5452.0000596_burger.pdf)

Acknowledgments

This work is part of the DynaVIBe project, funded by the Austrian Science Fund (FWF); P23250-N24.

References

Agullo, E., et al. (2009). “Numerical linear algebra on emerging architectures: The plasma and magma projects.” Journal of Physics: Conf. Series, 180, IOP Publishing, Bristol, England, 012037.
Alonso, J. M., et al. (2000). “Parallel computing in water network analysis and leakage minimization.” J. Water Resour. Plann. Manage., 251–260.
Alvarruiz, F., Martínez-Alzamora, F., and Vidal, A. (2015). “Improving the efficiency of the loop method for the simulation of water distribution systems.” J. Water Resour. Plann. Manage., 04015019.
Ang, W. K., and Jowitt, P. W. (2006). “Solution for water distribution systems under pressure-deficient conditions.” J. Water Resour. Plann. Manage., 175–182.
Basupi, I., and Kapelan, Z. (2015). “Flexible water distribution system design under future demand uncertainty.” J. Water Resour. Plann. Manage., 04014067.
Belova, O., Skibin, A., and Volkov, V. (2014). “Control-volume method for extra large network hydraulic analysis.” Procedia Eng., 70(0), 123–131.
Bolz, J., Farmer, I., Grinspun, E., and Schröder, P. (2003). "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid." ACM Trans. Graphics, 22(3), 917–924.
Browne, S., Dongarra, J., Grosse, E., and Rowan, T. (1995). The netlib mathematical software repository, D-Lib Magazine, Reston, VA.
Burger, G., Sitzenfrei, R., Kleidorfer, M., and Rauch, W. (2014). “Parallel flow routing in SWMM 5.” Environ. Modell. Software, 53, 27–34.
Chu, E., George, A., Liu, J., and Ng, E. (1984). “SPARSPAK: Waterloo sparse matrix package.” User’s guide for SPARSPAK-A: A collection of modules for solving sparse systems of linear equations, ON, Canada.
Crous, P., van Zyl, J., and Roodt, Y. (2012). “The potential of graphical processing units to solve hydraulic network equations.” J. Hydroinf., 14(3), 603–612.
Crous, P., Van Zyl, J., and Nel, A. (2008). “Using stream processing to improve the speed of hydraulic network solvers.” Proc., 10th Water Distribution Systems Analysis Conf., ASCE, Reston, VA, 1–8.
Dagum, L., and Menon, R. (1998). “OpenMP: An industry standard API for shared-memory programming.” Comput. Sci. Eng., 5(1), 46–55.
Davies, C. (2015) “User Guide for CHOLMOD: A sparse Cholesky factorization and modification package.” NVIDIA, Santa Clara, CA, 〈https://developer.nvidia.com/cholmod〉 (Mar. 27, 2015).
Davis, T. A. (2006). Direct methods for sparse linear systems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia.
Davis, T. A., and Hager, W. W. (2009). “Dynamic supernodes in sparse Cholesky update/downdate and triangular solves.” ACM Trans. Math. Software (TOMS), 35(4), 1–23.
Diao, K., and Rauch, W. (2013). “Controllability analysis as a pre-selection method for sensor placement in water distribution systems.” Water Res., 47(16), 6097–6108.
Diao, K., Wang, Z., Burger, G., Chen, C.-H., Rauch, W., and Zhou, Y. (2013). “Speedup of water distribution simulation by domain decomposition.” Environ. Modell. Software, 52, 253–263.
Farmani, R., Savic, D., and Walters, G. (2004). “EXNET benchmark problem for multi-objective optimization of large water systems.” Modelling and control for participatory planning and managing water systems, International Federation of automatic control (IFAC), Venice, Italy.
Gaidamour, J., Hénon, P., et al. (2008). “HIPS: A parallel hybrid direct/iterative solver based on a Schur complement approach.” Proc., PMAA, Neuchâtel, Switzerland.
George, A., Liu, J., and Ng, E. (1994). Computer solutions of sparse linear systems, Academic Press, Orlando, FL.
Giustolisi, O., Savic, D., and Kapelan, Z. (2008). “Pressure-driven demand and leakage simulation for water distribution networks.” J. Hydraul. Eng., 626–635.
Gould, N. I., Scott, J. A., and Hu, Y. (2007). “A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations.” ACM Trans. Math. Software (TOMS), 33(2), 10.
Guidolin, M., Burovskiy, P., Kapelan, Z., and Savić, D. (2010). “CWSNet: An object-oriented toolkit for water distribution system simulations.” Proc., 12th Water Distribution System Analysis Symp., ASCE, Reston, VA.
Guidolin, M., Kapelan, Z., and Savić, D. (2013). “Using high performance techniques to accelerate demand-driven hydraulic solvers.” J. Hydroinf., 15(1), 38.
Guidolin, M., Savic, D. A., and Kapelan, Z. (2011). “Computational performance analysis and improvement of the demand-driven hydraulic solver for the CWSNet library.” Computing and Control for the Water Industry 2011, Vol. 1, Univ. of Exeter, Devon, U.K., 45–50.
Hogg, J. D., Reid, J. K., and Scott, J. A. (2010). “Design of a multicore sparse Cholesky factorization using DAGS.” SIAM J. Sci. Comput., 32(6), 3627–3649.
Humphrey, J. R., Price, D. K., Spagnoli, K. E., Paolini, A. L., and Kelmelis, E. J. (2010). “Cula: Hybrid GPU accelerated linear algebra routines.” SPIE defense, security, and sensing, International Society for Optics and Photonics, Bellingham, WA.
Intel Corporation. (2013). “Intel Math Kernel library reference manual.” 〈http://software.intel.com/en-us/intel-mkl〉 (Mar. 27, 2015).
Leiserson, C. E., Rivest, R. L., Stein, C., and Cormen, T. H. (2001). Introduction to algorithms, MIT Press, Cambridge, MA.
Li, R., and Saad, Y. (2013). “GPU-accelerated preconditioned iterative linear solvers.” J. Supercomput., 63(2), 443–466.
Lippai, I. (2005). Colorado Springs utilities case study: Water system calibration/optimization, ASCE, Reston, VA.
Mair, M., Rauch, W., and Sitzenfrei, R. (2014). “Improving incomplete water distribution system data.” Procedia Eng., 70, 1055–1062.
Marchi, A., et al. (2014). "The battle of the water networks II (BWN-II)." J. Water Resour. Plann. Manage., 04014009.
Möderl, M., Sitzenfrei, R., Fetz, T., Fleischhacker, E., and Rauch, W. (2011). “Systematic generation of virtual networks for water supply.” Water Resour. Res., 47(2), W02502.
Naumov, M. (2011). "Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU." NVIDIA, Santa Clara, CA.
Ostfeld, A., et al. (2008). "The battle of the water sensor networks (BWSN): A design challenge for engineers and algorithms." J. Water Resour. Plann. Manage., 556–568.
Rennich, S., Stosic, D., and Davis, T. A. (2014). "Accelerating sparse Cholesky factorization on GPUs." IA3 Workshop on Irregular Applications: Architectures and Algorithms (held in conjunction with SC14), Center for Adaptive Supercomputing Software, Richland, WA.
Roshani, E., and Filion, Y. (2015). “Water distribution system rehabilitation under climate change mitigation scenarios in Canada.” J. Water Resour. Plann. Manage., 04014066.
Rossman, L. (2010). “Storm water management model user’s manual version 5.0.” Water Supply and Water Resources Division National Risk Management Research Laboratory.
Rossman, L. A. (2007). "Discussion of 'Solution for water distribution systems under pressure-deficient conditions' by Wah Khim Ang and Paul W. Jowitt." J. Water Resour. Plann. Manage., 566–567.
Rupp, K., Rudolf, F., and Weinbub, J. (2010). “ViennaCL—A high level linear algebra library for GPUs and multi-core CPUs.” Int. Workshop on GPUs and Scientific Applications, Dept. of Scientific Computing, Univ. of Vienna, Austria, 51–56.
Saad, Y. (2003). Iterative methods for sparse linear systems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia.
Schenk, O. (2013). “PARDISO solver project.” 〈http://www.pardiso-project.org/〉 (Mar. 27, 2015).
Schenk, O., and Gärtner, K. (2002). “Two-level dynamic scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems.” Parallel Comput., 28(2), 187–197.
Schenk, O., Gärtner, K., and Fichtner, W. (2000). "Efficient sparse LU factorization with left-right looking strategy on shared memory multiprocessors." BIT Numer. Math., 40(1), 158–176.
Science and Technology Facilities Council. (2013). “The HSL mathematical software library.” 〈http://www.hsl.rl.ac.uk/〉 (Mar. 27, 2015).
Siew, C., and Tanyimboh, T. (2011). “The computational efficiency of Epanet-pdx.” Proc., 13th Annual Water Distribution Systems Analysis Conf., ASCE, Reston, VA, 22–26.
Siew, C., and Tanyimboh, T. (2012). “Pressure-dependent epanet extension.” Water Resour. Manage., 26(6), 1477–1498.
Sitzenfrei, R., and Rauch, W. (2015). "Optimizing small hydropower systems in water distribution systems based on long-time-series simulation and future scenarios." J. Water Resour. Plann. Manage., 04015021.
Sitzenfrei, R., Mair, M., Möderl, M., and Rauch, W. (2011). “Cascade vulnerability for risk analysis of water infrastructure.” Water Sci. Technol., 64(9), 1885–1891.
Sitzenfrei, R., Möderl, M., and Rauch, W. (2013a). “Assessing the impact of transitions from centralised to decentralised water solutions on existing infrastructures-Integrated city-scale analysis with VIBe.” Water Res., 47(20), 7251–7263.
Sitzenfrei, R., Möderl, M., and Rauch, W. (2013b). “Automatic generation of water distribution systems based on GIS data.” Environ. Modell. Software, 47, 138–147.
Snir, M., Otto, S. W., Walker, D. W., Dongarra, J., and Huss-Lederman, S. (1995). MPI: The complete reference, MIT Press, Cambridge, MA.
Tanyimboh, T. T., and Templeman, A. B. (2010). “Seamless pressure-deficient water distribution system model.” Proc. ICE-Water Manage., 163(8), 389–396.
Todini, E., and Pilati, S. (1988). A gradient algorithm for the analysis of pipe networks, Research Studies Press, Taunton, U.K., 1–20.
Van de Geijn, R. A. (1997). Using PLAPACK: Parallel linear algebra package, MIT Press, Cambridge, MA.
Van Zyl, J. E., Savic, D. A., and Walters, G. A. (2004). “Operational optimization of water distribution systems using a hybrid genetic algorithm.” J. Water Resour. Plann. Manage., 160–170.
Volkov, V., and Demmel, J. (2008). "LU, QR and Cholesky factorizations using vector capabilities of GPUs." EECS Dept., Univ. of California, Berkeley, CA.
von zur Gathen, J. (1993). “Parallel linear algebra.” Synthesis of parallel algorithms, Morgan Kaufmann Publishers, San Francisco, 574–617.
Wu, Z. Y., and Lee, I. (2011). “Lessons for parallelizing linear equation solvers and water distribution analysis.” Urban water management: Challenges and opportunities, Centre for Water Systems, Univ. of Exeter, Devon, U.K., 21–26.
Zecchin, A., Thum, P., Simpson, A., and Tischendorf, C. (2012). “Steady-state behavior of large water distribution systems: Algebraic multigrid method for the fast solution of the linear step.” J. Water Resour. Plann. Manage., 639–650.

Information & Authors


Published In

Journal of Water Resources Planning and Management
Volume 142, Issue 3, March 2016

History

Received: Nov 25, 2014
Accepted: Jul 29, 2015
Published online: Nov 20, 2015
Published in print: Mar 1, 2016
Discussion open until: Apr 20, 2016

Authors

Affiliations

Gregor Burger [email protected]
Postdoctoral Researcher, Unit of Environmental Engineering, Univ. of Innsbruck, Technikerstrasse 13, Innsbruck, 6020 Tirol, Austria. E-mail: [email protected]
Robert Sitzenfrei [email protected]
Assistant Professor, Unit of Environmental Engineering, Univ. of Innsbruck, Technikerstrasse 13, Innsbruck, 6020 Tirol, Austria (corresponding author). E-mail: [email protected]
Manfred Kleidorfer [email protected]
Assistant Professor, Unit of Environmental Engineering, Univ. of Innsbruck, Technikerstrasse 13, Innsbruck, 6020 Tirol, Austria. E-mail: [email protected]
Wolfgang Rauch [email protected]
Full Professor, Unit of Environmental Engineering, Univ. of Innsbruck, Technikerstrasse 13, Innsbruck, 6020 Tirol, Austria. E-mail: [email protected]
