Mining Marine Vessel AIS Data to Inform Coastal Structure Management

Scully, Brandan M.; Young, David L.; Ross, James E.

doi:10.1061/(ASCE)WW.1943-5460.0000550

Open access

Technical Papers

Dec 8, 2019


Authors: Brandan M. Scully, Ph.D., F.ASCE (https://orcid.org/0000-0001-8641-2142), David L. Young, Ph.D., and James E. Ross, Ph.D.

Publication: Journal of Waterway, Port, Coastal, and Ocean Engineering

Volume 146, Issue 2



Abstract

This study demonstrates the use of a multiyear country-scale automatic identification system (AIS) data set to partition a nationwide portfolio of navigation structures managed by the US Army Corps of Engineers (USACE) into affinity groups based on emergent vessel traffic characteristics. The marine vessel AIS was originally intended to prevent the collision of ships at sea. As a remote sensing technology, it provides continuous monitoring for marine vessel traffic and has enabled a variety of unforeseen applications. The methodology presented uses spatial distance criteria to identify vessel traffic local to each structure. Metrics characterizing traffic behavior including traffic composition, spatial position, trip frequency, and traffic seasonality are derived from vessel data. AIS-derived metrics are combined into feature vectors describing each structure. Pearson correlation of feature vectors with r-neighborhood pruning of the affinity matrix is used to identify similar structure pairs. Semisynchronous label propagation is used to partition the structure portfolio graph into prototype groups with strong similarity in underlying traffic characteristics that may be further refined to align maintenance activity with organizational goals.

Introduction

The US Army Corps of Engineers (USACE) maintains a diverse national portfolio of approximately 1,200 coastal structures representing more than 30 structure types, most of which are nominally related to navigation. Structures range from a dozen meters up to 10 km in length; more than half exceed 50 years in age, with some exceeding 100 years (Domurat 2012; Mitchell 2010). Based on contract expenditure data (USOMB 2018), the average maintenance cost for jetties was estimated to have exceeded $32.9 M annually between 2008 and 2017; annual estimated expenditures are shown in Fig. 1. Jetty maintenance spending was distributed across 20 coastal states, with Texas (TX) and Louisiana (LA) accounting for more than half of that expenditure, as shown in Fig. 2.

Fig. 1. USACE jetty expenditures, 2008–2017. Jetties support navigation by definition. The cost to maintain all structures exceeds this estimate based on government contracting data. (Data from USOMB 2018.)

Fig. 2. USACE jetty expenditures by state, 2008–2017. Two states account for more than half of maintenance expenditures from 2008 to 2017. Most projects were below $5 M.

US law requires the USACE to analyze the risks to navigation safety associated with this portfolio (Public Law 114-332), and maintenance funding constraints compel the rational allocation of resources to maintain the larger navigation system (Mitchell 2010). Structure functionality is a mission-related outcome that can be used for portfolio prioritization, independent of structure condition (NRC 2012). Coastal structures can be designed with vessel performance in mind using numerical methods or physical models, but directly measuring the relationship between structures and vessel performance in the field was impractical before the marine automatic identification system (AIS) was widely adopted.

Prioritization efforts to date have instead focused primarily on structure condition and proxy metrics (e.g., commercial cargo throughput at associated project terminals) because relationships between structure function and vessel response are not well understood (Mitchell 2010; Domurat 2012). This paper demonstrates a method to quantitatively describe assets in the USACE coastal structure portfolio in terms of vessel traffic characteristics as documented in the historical AIS data record. A clustering technique is subsequently applied to identify affinity groups within the portfolio for the purpose of asset management.

The problem of managing a nationwide portfolio of structures with respect to vessel navigation is principally one of scale. The existing structure portfolio encompasses more than 1,200 structures. The proxy approach described by Mitchell (2010) requires detailed curation of relevant information from multiple sources on a per-structure basis. This type of analysis is expensive, prone to inconsistency related to data availability and timeliness, and ultimately provides no direct measure of structures in the context of vessels navigating nearby. An obvious way to address this problem is direct measurement of local traffic from a uniform data set that is repeatable across all structures.

Milne and Watling (2019) suggest research leveraging big data can positively influence transportation systems by fundamentally reassessing what data can say about the function of transportation systems. Marine vessel AIS provides ship-borne continuous monitoring data intended to improve navigational safety in real time (ITU 2014). The data arguably meet the big data definition proposed by Milne and Watling (2019) in that AIS data result from continuous monitoring, are not necessarily owned by the data analyst, and are of sufficient scale to apply statistical inference techniques. The availability of AIS data has spawned a variety of research over the last decade in conservation science (Robards et al. 2016; Le Guyader et al. 2018), port resilience (Touzinsky et al. 2018), monitoring navigable waterways (Mitchell and Scully 2014), monitoring and mapping vessel traffic flow networks (Etienne et al. 2015; Alessandrini et al. 2018; Guerrero et al. 2018), and relating vessel performance to coastal structures (Young and Scully 2018), all of which demonstrate relational syntheses with traditional information sources beyond the original purpose of AIS. This work is novel in its approach to describing physical infrastructure based on nearby traffic.

Leveraging the Marine Cadastre AIS data archive (BOEM and NOAA 2018) to inform maintenance is appealing because it exploits an existing investment by other federal agencies in real-time data collection, takes advantage of a single uniform nationwide data set (NRC 2012), and allows direct relationships between vessel motion and navigation structures to be examined. AIS data are available globally, allowing methods similar to those presented to be flexibly applied worldwide to any large-scale concern that involves the intersection of vessel behavior and geospatial point collections.

The Marine Cadastre AIS data include hundreds of millions of records, making a holistic approach to interrogating the data infeasible without the advantages of parallel high-performance computing. However, robust computation platforms are now commercially available and increasingly affordable, enabling the distillation of nation-scale investigations to the confines of a single study, and revealing larger-scale behaviors that are not apparent with a more limited approach. This avenue of analysis becomes ever more relevant in an era of constrained maintenance funding, in which obvious maintenance efficiencies and improvements have already been implemented.

Materials and Methods

AIS data are generated in real time to improve maritime domain awareness and to prevent vessel collisions. The data contain dynamically generated, time-stamped vessel operating information, such as position, course and speed over ground, and heading. Data also include static vessel identifying information, vessel dimensions, and vessel types (ITU 2014). Robards et al. (2016) outline vessel populations that carry AIS by mandate or by choice, noting that an AIS coverage gap exists primarily in small domestic craft. USACE project authorizations are generally based on vessel traffic populations that overlap significantly with vessels mandated to carry AIS transceivers. However, the results of any analysis using AIS data should not be considered a comprehensive representation of all waterway traffic (Harati-Mokhtari et al. 2007). Other imperfections may be present in AIS data because of either errors or gaps in AIS transmissions or deliberate misreporting of vessel identity or location (Etienne et al. 2015).

AIS data are available from a variety of private, government, and commercial sources. The Marine Cadastre program provides AIS data covering the United States, with vessel positions sampled at 1-min intervals (BOEM and NOAA 2018). The data set has several features that make it ideal for analyzing the USACE portfolio of coastal structures: the coverage area overlaps the footprint of the USACE structure portfolio, several years of data are available, data are in a standard format, data are sampled at a regular frequency, and the high spatiotemporal resolution of the data enables direct measurement of vessel–structure interaction.

This study used Marine Cadastre data from Universal Transverse Mercator (UTM) zones 10, 11, 15, 16, 17, 18, and 19 for the 6-year period 2009 through 2014, which occupy approximately 600 GB of disk space. Processing 600 GB of AIS data is computationally demanding, and requirements are expected to increase as more data are analyzed. The current data set was processed using 36 Intel Xeon cores running at 2.3 GHz with 117 GB of accessible memory within a single node of high-performance computing (HPC) resources. The code for processing these data was written in Python, using the multiprocessing package for process management. Plans are in place to move to a multinode process configuration that would allow thousands of cores to be used for a more complete analysis of the Marine Cadastre data set.
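The single-node parallel layout described above can be illustrated with the standard-library `multiprocessing` package. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes each input file can be reduced independently to per-structure tallies that are merged afterward. The worker `count_file` and the toy `(structure_id, mmsi)` records are hypothetical placeholders for the file parsing and spatial filtering steps.

```python
from collections import Counter
from multiprocessing import Pool


def count_file(records):
    """Hypothetical per-file worker: tally position reports per structure.

    `records` stands in for one parsed AIS file, already reduced to
    (structure_id, mmsi) pairs after spatial filtering.
    """
    return Counter(structure_id for structure_id, _mmsi in records)


def count_all(files, processes=36):
    """Farm files out to worker processes and merge the partial tallies."""
    total = Counter()
    with Pool(processes=processes) as pool:
        # imap_unordered yields each file's result as soon as a worker finishes
        for partial in pool.imap_unordered(count_file, files):
            total.update(partial)  # Counter.update adds counts
    return total


if __name__ == "__main__":
    toy_files = [
        [("jetty_A", 1), ("jetty_A", 2), ("jetty_B", 1)],
        [("jetty_B", 3)],
    ]
    print(count_all(toy_files, processes=2))
```

Merging `Counter` objects is order-independent, so the result does not depend on which worker finishes first. `multiprocessing.Pool` runs its workers on a single machine, which matches the single-node configuration described above; a multinode run would need a different process manager.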

A decryption key was provided on request by the National Oceanic and Atmospheric Administration (NOAA) to decrypt vessel-specific Maritime Mobile Service Identity (MMSI) data from 2010 to 2014 (2009 data were not encrypted). Vessel MMSIs from the Marine Cadastre data set were cross-referenced with a list of vessel identities obtained from the US Coast Guard that were validated using the Authoritative Vessel Identification Service (AVIS) (Winkler 2012); vessels without a match were excluded from the analysis. The relevant AIS data attributes used in this study are described in Table 1 (ITU 2014). The data analyzed in this study are a subset of all vessel types available in the Marine Cadastre database. Table 2 lists the ship/cargo type codes and descriptions of the vessel types retained for analysis, as per the US AIS encoding guide (USCG 2012) contemporary with the AIS data analyzed.

Table 1. AIS data parameters used in analysis

Dimension	Type	Description
MMSI	Static	9-digit unique vessel identifier issued by the International Telecommunication Union.
Vessel and cargo type	Static	2-digit code broadly describing a ship’s type and cargo.
Transmission time	Dynamic	Greenwich Mean Time (GMT) timestamp of vessel dynamic report broadcast.
Latitude, Longitude	Dynamic	GPS-derived location coordinate of vessel antenna location at the time of broadcast in NGIA (1984).

Note: The analysis components used are fully described in ITU (2014).

Table 2. Ship and cargo type codes for analyzed vessels

Ship and cargo type code	Description
30	Fishing
31, 32	Towing (ahead or alongside, astern)
52	Tugs or workboats
6X^a, ^b	Passenger ships $\geq 100$ gross tons
7X^a	Cargo (freight) ships or integrated tug barge (ITB) vessels
8X^a	Tankers or integrated tug tank barge vessels

Source: Data from USCG (2012).

^a X indicates digits 0–9, representing all vessels in this class.

^b Passenger vessels $< 100$ gross tons and high-speed craft coded as 4X were excluded.

Analysis of vessel–structure interaction is likely sensitive to the choice of search radius. If the radius is too small, the population of vessels interacting with the structure may be underrepresented. Conversely, too large a search radius may result in the inclusion of extraneous vessels. For the purpose of the present analysis, it was presupposed that vessels within a distance of 3.2 km interact with a coastal structure, without defining the nature of that interaction. Thus, a spatial filter was applied around the location coordinate (usually the midpoint) of each of the 1,227 structures available from the USACE (CIRP 2011). The radius was chosen in consideration of typical sizes of coastal structures and navigation channels. Vessels operating in the vicinity of coastal structures typically transit at speeds below 0.01 km/s. Thus, at the 1-min temporal resolution of Marine Cadastre data, a vessel moving at 0.01 km/s would be expected to cover a distance of 0.6 km between subsequent reports. For a coastal structure search space with a 3.2-km radius, a vessel passing through the structure's center at 0.01 km/s with no pauses or course deviations is expected to report approximately 10 times (a 6.4-km diameter divided by 0.6 km per reporting interval). This number decreases as the vessel transits a chord farther from the structure's center and increases as the vessel speed decreases.
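The spatial filter and the report-count arithmetic above can be sketched as follows. This is a minimal sketch under stated assumptions: great-circle (haversine) distance on a spherical Earth stands in for whatever distance computation was actually used, and `expected_reports_on_chord` is a hypothetical helper that assumes a straight, constant-speed pass with a uniformly random reporting phase. Under that assumption the mean number of reports inside the circle equals the chord length divided by the distance traveled per reporting interval.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(report_lat, report_lon, struct_lat, struct_lon, radius_km=3.2):
    """Spatial filter: True if an AIS report lies inside the search circle."""
    return haversine_km(report_lat, report_lon, struct_lat, struct_lon) <= radius_km


def expected_reports_on_chord(offset_km, radius_km=3.2, speed_km_s=0.01, interval_s=60):
    """Mean number of reports for a straight, constant-speed pass.

    offset_km: perpendicular distance between the track and the structure.
    Assumes a uniformly random reporting phase, so the mean count is
    chord length / distance traveled per reporting interval.
    """
    if offset_km >= radius_km:
        return 0.0
    chord_km = 2 * math.sqrt(radius_km**2 - offset_km**2)
    step_km = speed_km_s * interval_s  # 0.6 km at 0.01 km/s and 60 s
    return chord_km / step_km


print(round(expected_reports_on_chord(0.0), 2))  # pass through the center: 10.67
print(round(expected_reports_on_chord(2.5), 2))  # pass 2.5 km off-center: 6.66
```

A pass through the center averages 6.4/0.6 ≈ 10.7 reports; the value falls as `offset_km` grows and rises as `speed_km_s` falls, matching the behavior described above.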

A transit counting function was applied to the retained AIS data at each structure. For the purpose of this study, a transit is defined as a sequence of position reports by the same vessel for which the time between consecutive reports is less than 360 s (i.e., twice the maximum expected AIS reporting interval of 180 s for ships moored or at anchor; ITU 2014). When a series of position reports by a unique MMSI contains a gap that meets or exceeds this threshold, the transit count for that vessel is incremented by 1, and the transit counts of all vessels are summed to obtain the total count for each structure. In addition to the MMSI number and a transit identifier, the latitude, longitude, and transmission time of the transit start, end, and closest point of approach (CPA) to each structure are recorded from the AIS data. The workflows of the data filtering, transit function application, and aggregation are shown in Fig. 3.

Fig. 3. Vessel-structure counting workflow.
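The transit counting step can be sketched as below, assuming the reports have already been spatially filtered to one structure. Reports are split into transits per MMSI wherever consecutive reports are 360 s or more apart, and the CPA is approximated as the nearest *reported* position; interpolating between 1-min reports is not attempted, which is an assumption of this sketch. The `Report` record and the helper names are hypothetical. The mean of the resulting CPA distances is the average distance $\bar{x}$.

```python
import math
from dataclasses import dataclass

GAP_S = 360  # 2 x the 180-s maximum AIS interval for moored/anchored ships


@dataclass
class Report:
    mmsi: int
    t: float    # transmission time, seconds (UTC)
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2, r_km=6371.0088):
    """Great-circle distance on a spherical Earth, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r_km * math.asin(math.sqrt(a))


def split_transits(reports, gap_s=GAP_S):
    """Split one structure's reports into transits, per vessel (MMSI).

    Reports stay in the same transit while consecutive reports are less
    than gap_s apart; a gap of gap_s or more starts a new transit.
    """
    by_vessel = {}
    for r in sorted(reports, key=lambda r: (r.mmsi, r.t)):
        by_vessel.setdefault(r.mmsi, []).append(r)
    transits = []
    for rs in by_vessel.values():
        current = [rs[0]]
        for prev, cur in zip(rs, rs[1:]):
            if cur.t - prev.t >= gap_s:
                transits.append(current)
                current = []
            current.append(cur)
        transits.append(current)
    return transits


def summarize(transit, s_lat, s_lon):
    """Record start, end, and CPA (closest reported position) of a transit."""
    cpa = min(transit, key=lambda r: haversine_km(r.lat, r.lon, s_lat, s_lon))
    return {
        "mmsi": transit[0].mmsi,
        "start": (transit[0].t, transit[0].lat, transit[0].lon),
        "end": (transit[-1].t, transit[-1].lat, transit[-1].lon),
        "cpa": (cpa.t, cpa.lat, cpa.lon),
        "cpa_km": haversine_km(cpa.lat, cpa.lon, s_lat, s_lon),
    }


def mean_cpa_km(summaries):
    """Average CPA distance over all transits at a structure (x-bar)."""
    return sum(s["cpa_km"] for s in summaries) / len(summaries)


reports = [Report(1, 0, 0.010, 0.0), Report(1, 60, 0.001, 0.0),
           Report(1, 120, 0.020, 0.0), Report(1, 1000, 0.02, 0.0),
           Report(2, 0, 0.005, 0.0)]
transits = split_transits(reports)
print(len(transits))  # 3: vessel 1 twice (880-s gap), vessel 2 once
```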

The vessel transit counting operation resulted in a time series of observed vessel transits for 1,049 of the 1,227 structures in the USACE database (85%). These time series were used to compute informative metrics including the average CPA distance, a metric to assess the diversity of the vessel user base (the entropy of vessel transits), the number of transits per unique vessel, and a metric to assess the seasonality of the vessel traffic (the coefficient of determination for the seasonal component of a seasonal decomposition model). Computing these metrics provides: (1) descriptive insight into vessel populations near each structure, (2) a method to find structures of interest—usually by the identification of outliers, and (3) the ability to identify unique groupings of structures based on specific vessel type or behavior. The method of computation for each metric and the metric relevance are discussed in the following sections.

Average Distance

For each vessel transit at a given structure, the CPA distance ($x$) is computed between the vessel position at the CPA and the structure coordinates. The average of the CPA distances over all transits at a structure is denoted $\bar{x}$.

Entropy

Information entropy quantifies the amount of information yielded from each new observation of a stochastic data source and, analogous to the statistical thermodynamics definition of entropy, describes the degree of uncertainty or disorder in the data (Shannon 1948; Jaynes 1965). For discrete random variables in which the outcome is always the same, each new observation of the variable carries little-to-no information about the system (low information entropy—i.e., the prior observations already perfectly describe the system). However, for a discrete random variable with a close-to-uniform distribution each additional observation carries substantial new information about the behavior of the system (high information entropy).

Entropy is calculated as shown in Eq. (1)

E = - \sum_{i = 1}^{n} p_{i} \log (p_{i})

(1)

where $n$ = number of possible outcomes for the discrete random variable; and $p_{i}$ = probability of outcome $i$ occurring (Pathria and Beale 2011); by convention, terms with $p_{i} = 0$ contribute zero. The maximum possible value of entropy is $E_{\max} = \log(n)$, which occurs if the random variable has a perfectly uniform distribution (i.e., each discrete outcome is equally likely). The minimum possible value is $E_{\min} = 0$, which occurs if the observations of the random variable always fall within one of the $n$ categories. This metric is useful for identifying how clustered a discrete random variable is around one or a few possible outcomes, such as determining how clustered the vessel transits ($E_{tt}$) or unique vessels ($E_{tu}$) are around specific vessel types for a given structure.
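The two entropy metrics can be sketched as follows. The base of the logarithm is not stated above; this sketch assumes the natural logarithm, for which seven equally likely categories give $E_{\max} = \ln 7 \approx 1.95$. `transit_and_user_entropy` is a hypothetical helper that takes one `(mmsi, vessel_type)` pair per transit and assumes each MMSI reports a single vessel type.

```python
import math
from collections import Counter


def entropy_from_counts(counts):
    """Eq. (1) with natural log; zero counts are skipped (0 * log 0 := 0)."""
    total = sum(counts)
    # p * log(1/p) is written as (c/total) * log(total/c) so every term is >= 0
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def transit_and_user_entropy(transits):
    """Return (E_tt, E_tu) from one (mmsi, vessel_type) pair per transit."""
    transits = list(transits)
    # E_tt: how transits are distributed across vessel types
    e_tt = entropy_from_counts(Counter(vt for _, vt in transits).values())
    # E_tu: how unique vessels are distributed across vessel types
    type_of_vessel = dict(transits)  # last-seen type per MMSI
    e_tu = entropy_from_counts(Counter(type_of_vessel.values()).values())
    return e_tt, e_tu


e_tt, e_tu = transit_and_user_entropy([(1, "cargo"), (1, "cargo"), (2, "tanker")])
print(round(e_tt, 3), round(e_tu, 3))  # prints 0.637 0.693
```

The example shows why the two metrics differ: the repeat cargo transit skews the transit distribution ($E_{tt}$), while the user base of one cargo vessel and one tanker is evenly split ($E_{tu} = \ln 2$).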

Seasonal Decomposition

To perform an additive naïve seasonal decomposition on time-series data (in this case, the numbers of vessel transits and unique vessels transiting a given structure in each month), the time-series signal ($y$) is considered equal to the sum of three parts: (1) the general trend ($T$), (2) the seasonal component ($S$), and (3) the noise, or residual ($N$); that is, $y = T + S + N$ (Shmueli and Lichtendahl 2016). The trend component, $T$, is defined as the rolling average of $k$ consecutive values of the time series, with $k$ depending on the time-series sampling frequency (Hyndman and Athanasopoulos 2018). For monthly observations conducted over many years, it is common to perform a rolling average over the past year ($k = 12$). The seasonal component is estimated from the detrended time series $(y - T)$ as the average value of the detrended signal for each season; for monthly data, the seasonal component for each month is the average of all the detrended values occurring in that month. These values are collectively adjusted up or down so that they sum to zero, giving the final estimate of the seasonal component for each month, $S$. The noise ($N$) is what remains of the original signal once the trend and seasonal components are subtracted, $N = y - T - S$. The strengths of the trend ($F_{T}$) and seasonal ($F_{S}$) components are defined as shown in Eqs. (2) and (3) (Wang et al. 2006), subject to the restriction that $F_{T}$ and $F_{S}$ may not fall below zero

F_{T} = 1 - \frac{\mathrm{Var}(N)}{\mathrm{Var}(N + T)}

(2)

F_{S} = 1 - \frac{\mathrm{Var}(N)}{\mathrm{Var}(N + S)}

(3)

The strength of the seasonal component allows the seasonality of vessel traffic at a particular structure to be quantified, both by the number of vessel transits per month ($F_{ST}$) and by the number of unique vessels passing the structure in each month ($F_{SU}$). It also allows the seasonality of different structures to be compared using a single value. Note that other, more sophisticated seasonal decompositions are available [e.g., seasonal autoregressive integrated moving average (S-ARIMA) models], but the naïve decomposition is sufficient for the present case.
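The naïve additive decomposition and Eqs. (2) and (3) can be sketched with NumPy. This is a minimal sketch under stated assumptions: the trend is the trailing mean of the current and previous 11 months, so the first 11 months have no trend estimate and are excluded from the variance ratios; sample index 0 is treated as the first month of the seasonal cycle; and a constant series would make the denominators zero. The function names are hypothetical.

```python
import numpy as np


def naive_decompose(y, period=12):
    """Additive decomposition y = T + S + N of a monthly series.

    T: trailing mean of the current and previous period-1 values
       (NaN until a full window is available).
    S: mean detrended value for each position in the cycle, shifted so the
       `period` seasonal values sum to zero.
    N: remainder, y - T - S.
    Needs at least 2 * period - 1 samples so every month has a value.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    trend = np.full(n, np.nan)
    for i in range(period - 1, n):
        trend[i] = y[i - period + 1 : i + 1].mean()
    detrended = y - trend
    phase = np.arange(n) % period  # month-of-cycle index for each sample
    seasonal_values = np.array(
        [np.nanmean(detrended[phase == m]) for m in range(period)]
    )
    seasonal_values -= seasonal_values.mean()  # make the cycle sum to zero
    seasonal = seasonal_values[phase]
    noise = y - trend - seasonal
    return trend, seasonal, noise


def strength(noise, component):
    """F = max(0, 1 - Var(N) / Var(N + C)) over samples with a defined trend."""
    ok = ~np.isnan(noise)
    ratio = np.var(noise[ok]) / np.var(noise[ok] + component[ok])
    return max(0.0, 1.0 - float(ratio))


months = np.arange(72)  # six years of monthly values, as in the study period
y = 200 + 150 * np.sin(2 * np.pi * months / 12)  # synthetic, strongly seasonal
T, S, N = naive_decompose(y)
print(round(strength(N, S), 3))  # F_S for this synthetic series
```

For a purely sinusoidal series like this one, the residual is essentially zero and $F_{S}$ is close to 1; white noise with no seasonal pattern would yield a much smaller value.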

The subsequent Results and Discussion section addresses this study's findings with respect to the aforementioned metrics. Specifically, the section presents and examines (1) the distribution of $\bar{x}$ and unique cases, (2) cyclical events and structure seasonality, (3) the diversity of the structure user base, and (4) the identification of distinct structure communities via clustering. These are described within the context of their application to structure portfolio management.

Results and Discussion

In total, 21.8 million transits were identified for the 6-year analysis period. After validating vessel identities and filtering to the vessel types listed in Table 2, the list of structures was filtered to include structures with plausible navigation functions. Structures were retained if their type was listed as jetty, breakwater, dike, embankment, seawall, barrier, beach hammock, revetted mole, or wave absorber, or if “breakwater” or “jetty” appeared in the structure name when the type was not listed. The data retained for subsequent analysis include 8.9 million vessel transits representing 13,507 unique vessels at 865 navigation structures. The distribution of vessel transit and unique vessel observations is shown in Fig. 4. The maximum number of vessel transits (274,454) and maximum number of unique vessels (5,912) were observed at Port Bolivar, Texas, whereas 27 structures had only 1 observed vessel transit over the 6-year period.

Fig. 4. (a) Distribution of observed transits; and (b) distribution of unique vessels observed at 865 structures.

The distribution of $\bar{x}$, shaded by the number of transits occurring within each average CPA distance bin, is shown in Fig. 5. The mean and median values of $\bar{x}$ are 1,459 and 1,330 m, respectively. Vessel transits are most frequent where $\bar{x}$ values are below 2.5 km. Based on the concentration of users at approximately 0.75 and 1.5 km, and the relatively low number of observations between 2.5 and 3.2 km, the initial selection of 3.2 km as a generalized search radius appears sufficient to capture the vessel activity of most structures. However, 3.2 km may not suit all structures, and it may be desirable to develop more sophisticated structure-specific search filters.

Fig. 5. Distribution of average CPA for analyzed structures.

Fig. 6 shows individual time series of monthly $\bar{x}$ values for Port Bolivar, TX and the west jetty at Bayou Lafourche, LA. The Port Bolivar time series is relatively stable, at approximately 2.0 km. The monthly values of $\bar{x}$ at the Bayou Lafourche west jetty, near Port Fourchon, LA, are comparatively noisy. Monthly values of $\bar{x}$ at Bayou Lafourche decrease from approximately August 2010 to March 2011 (to a distance of 0.2–0.4 km) compared with the rest of the time series, where $\bar{x}$ values of 0.6–1.0 km are typical. Port Fourchon services 90% of oil production in the Gulf of Mexico (Port Fourchon 2018). Thus, the authors speculate that this change in vessel behavior may be associated with recovery activities following the Deepwater Horizon oil spill, which persisted for 87 days beginning April 20, 2010 (USCG 2011). This example illustrates vessel-derived information that could be made available to decision makers in real time and for which present methods provide no analog. The availability of high-resolution data quantifying vessel behavior provides an opportunity to shift maintenance decisions toward consideration of events with the potential to influence vessel usage patterns at a specific structure.

Fig. 6. Individual time series of monthly $\bar{x}$ values of vessel transits at Port Bolivar, TX and Bayou Lafourche, LA.
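A monthly $\bar{x}$ series like those in Fig. 6 can be derived from per-transit CPA records with a few lines of standard-library Python. This minimal sketch assumes each transit is assigned to the UTC month of its CPA time. The `flag_low_months` helper and its 0.5 fraction-of-median cutoff are hypothetical illustrations of how a sustained shift, such as the Bayou Lafourche decrease, might be surfaced automatically; they are not a method used in the study.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, median


def monthly_mean_cpa(cpa_records):
    """Monthly x-bar from (cpa_time_epoch_s, cpa_km) pairs, keyed by (year, month)."""
    buckets = defaultdict(list)
    for t, dist_km in cpa_records:
        stamp = datetime.fromtimestamp(t, tz=timezone.utc)  # treat times as UTC
        buckets[(stamp.year, stamp.month)].append(dist_km)
    return {ym: mean(ds) for ym, ds in sorted(buckets.items())}


def flag_low_months(monthly, fraction=0.5):
    """Hypothetical rule: flag months whose x-bar is below fraction * median."""
    cutoff = fraction * median(monthly.values())
    return [ym for ym, xbar in monthly.items() if xbar < cutoff]
```

A relative cutoff keeps the rule comparable across structures whose typical $\bar{x}$ differs, for example about 2.0 km at Port Bolivar versus 0.6–1.0 km at Bayou Lafourche.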

Cyclical events, including periodic adverse sea states, the presence of marine ice (Stoddard et al. 2016), or transient user populations, may contribute to vessel traffic seasonality. Fig. 7 shows the distribution of structure seasonality among the structure portfolio based on the coefficient of determination of the seasonal component of the monthly traffic time series decomposition, $F_{ST}$. Figs. 8(a and b) show comparative time series for (a) Raccoon Creek Jetty, New Jersey (NJ), with the lowest seasonality, $F_{ST} = 0.081$, and (b) Cape Vincent Breakwater, New York (NY), which is among the most seasonal structures in the portfolio with $F_{ST} = 0.952$. Although most structures do not show strong seasonality, a nontrivial number of highly seasonal structures bias the distribution and represent a disproportionate number of vessel transits. Understanding the cause of traffic seasonality may be useful in timing structure repair, identifying the need for a different type of structure, or determining whether it is feasible to reduce, defer, or cease structure maintenance. For example, ice damages coastal structures by fracturing or dislodging structure armor (USACE 2006), and it is reasonable to assume that heavy ice presence may increase structure maintenance and repair cost. Cape Vincent Harbor on Lake Ontario is near the US border with Canada and the western end of the Saint Lawrence River. The harbor is subject to winter ice from January through March (NOAA 2018), which may explain why observed traffic declines sharply in winter months, as shown in Fig. 8. It is reasonable to consider that seasonal traffic combined with northern structure latitude may inform the potential need for additional maintenance of ice-prone structures.

Fig. 7. Distribution of vessel traffic seasonality based on vessel transit count.

Fig. 8. Monthly vessel traffic time series for (a) Raccoon Creek Jetty, NJ, the structure with the lowest seasonality; and (b) Cape Vincent Breakwater, NY, among the most seasonal structures in the portfolio.

The number of transits per unique vessel ($r_{tu}$) provides another simple but informative measure of activity near navigation structures. As shown in the distribution of $r_{tu}$ in Fig. 9, the median structure recorded 15 transits per unique vessel over the 6-year data set. However, the mean is skewed higher because of a small number of structures with very large $r_{tu}$ values. The (a) traffic and (b) user base composition for the structures with the five highest $r_{tu}$ values are shown in Figs. 10(a and b). These top five structures vary in the diversity of the unique vessel population observed during the 6-year analysis period, as shown in Table 3, but their vessel transits are dominated by passenger ferries making repeat trips. Interestingly, AIS contains sufficient information to determine that the ferries at four of these structures belong to private operators, whereas ferry traffic at the fifth structure is operated by a public steamship authority. That the top five structures exhibit similar traffic patterns suggests $r_{tu}$ may be useful for identifying structure clusters, which in turn enables management by traffic behavior type. For instance, the goal of improving passenger safety outcomes may be advanced by investing in infrastructure for which local traffic is dominated by passenger vessels.

Fig. 9. Distribution of structure transits per unique vessel, $r_{tu}$.

Fig. 10. (a) Percent of transits by vessel type; and (b) percent of unique vessel type for the top five structures based on transits per unique vessel.

Table 3. Traffic and user-base composition of structures with the top five numbers of transits per unique vessel, $r_{tu}$

Structure	Fish: unique vessels	Fish: transits	Tug: unique vessels	Tug: transits	Work: unique vessels	Work: transits	Passenger: unique vessels	Passenger: transits	Cargo: unique vessels	Cargo: transits	Tanker: unique vessels	Tanker: transits
Port Clinton Jetties, OH	0	0	0	0	0	0	1	6,886	0	0	0	0
Hyannis Harbor Breakwater, MA	9	589	2	5	2	5	9	18,971	1	1	0	0
Mispillion River Jetties, DE	1	94	0	0	3	4	2	4,736	0	0	0	0
Lagoon Pond, MA	11	44	8	2,844	2	4	22	23,335	1	5	0	0
Lewis Bay, MA	9	572	2	6	2	4	10	13,056	1	1	0	0
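As a worked check, $r_{tu}$ and the two entropy metrics can be recomputed from the counts in Table 3. This sketch assumes each unique vessel falls under exactly one of the six listed types (so per-type vessel counts can be summed) and uses the natural logarithm; because it sees only the types in Table 3, its values are illustrative and may not match the study's own metrics exactly.

```python
import math

# (unique vessels, transits) per vessel type, copied from Table 3, in the
# column order Fish, Tug, Work, Passenger, Cargo, Tanker
TABLE_3 = {
    "Port Clinton Jetties, OH": [(0, 0), (0, 0), (0, 0), (1, 6886), (0, 0), (0, 0)],
    "Hyannis Harbor Breakwater, MA": [(9, 589), (2, 5), (2, 5), (9, 18971), (1, 1), (0, 0)],
    "Mispillion River Jetties, DE": [(1, 94), (0, 0), (3, 4), (2, 4736), (0, 0), (0, 0)],
    "Lagoon Pond, MA": [(11, 44), (8, 2844), (2, 4), (22, 23335), (1, 5), (0, 0)],
    "Lewis Bay, MA": [(9, 572), (2, 6), (2, 4), (10, 13056), (1, 1), (0, 0)],
}


def entropy(counts):
    """Natural-log Shannon entropy of category counts (zero counts skipped)."""
    total = sum(counts)
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


def structure_metrics(rows):
    """Return (r_tu, E_tt, E_tu) for one structure's per-type rows."""
    vessels = [v for v, _ in rows]
    transits = [t for _, t in rows]
    return sum(transits) / sum(vessels), entropy(transits), entropy(vessels)


for name, rows in TABLE_3.items():
    r_tu, e_tt, e_tu = structure_metrics(rows)
    print(f"{name}: r_tu = {r_tu:.0f}, E_tt = {e_tt:.2f}, E_tu = {e_tu:.2f}")
```

The recomputed $r_{tu}$ values (about 6,886, 851, 806, 596, and 568 transits per unique vessel) decrease down the table, consistent with its ordering, and Port Clinton's single passenger vessel gives $E_{tt} = E_{tu} = 0$.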

Whereas $r_{tu}$ measures the extent to which individual vessels make repeat trips, the entropy of vessel transits by type ($E_{tt}$) and the entropy of unique vessel types ($E_{tu}$) measure the vessel type diversity of the traffic and of the user base, respectively. Figs. 11(a and b) show (a) the distribution of $E_{tt}$ and (b) the distribution of $E_{tu}$; the possible range of these metrics in this case is 0–1.95. Figs. 12(a and b) illustrate the representative composition of (a) the underlying traffic in the case of $E_{tt}$ and (b) the user base in the case of $E_{tu}$. As the value of either metric increases, the diversity of the underlying population (the transits made by vessel type or the unique vessels transiting the structure) increases from homogeneous when the value is 0 to uniformly distributed across the vessel type categories when the value is maximum. Thus, AIS-derived data make it possible to manage structures according to user base–centered goals. For instance, the management goal of supporting the most diverse traffic mix could be advanced by prioritizing structures with high $E_{tt}$ values.

Fig. 11. Distribution of (a) entropy of vessel transits by vessel type; and (b) the entropy of unique vessel types.

Fig. 12. Representative composition of (a) the underlying traffic in the case of $E_{tt}$; and (b) the user base in the case of $E_{tu}$. As $E_{tt}$ increases, the traffic type becomes more diverse; and as $E_{tu}$ increases, the diversity of the user base increases.

Finally, a feature vector for each structure was developed from 20 metrics (described in the Appendix) computed during this analysis, with each metric standardized across structures. An affinity matrix was developed by computing the Pearson correlation coefficient of every structure's feature vector with that of every other structure. An r-neighborhood was applied to retain structure pairs whose similarity exceeded the 90th percentile, corresponding to correlation values above 0.52 (anticorrelated pairs were excluded because they have no obvious interpretation for management). This resulted in an unweighted, undirected graph containing 865 structure nodes and 37,412 edges connecting structures that were strongly similar based on observed traffic patterns. A semisynchronous label propagation community detection algorithm (Cordasco and Gargano 2010), which iteratively assigns labels to graph nodes based on the labels of neighboring nodes, was applied to the graph. The resulting graph, colored by the five emergent communities identified by the algorithm, is shown with qualitative community labels in Fig. 13.

Fig. 13. AIS-derived structure clusters.
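The affinity graph and community detection steps can be sketched with NumPy and NetworkX. NetworkX's `label_propagation_communities` implements the semi-synchronous label propagation of Cordasco and Gargano (2010); the study does not state which implementation was used, so relying on it here is an assumption, as are the z-score standardization and the hypothetical function name `structure_communities`. The cutoff is the requested percentile of all pairwise correlations, floored at zero so that anticorrelated pairs are never linked.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import label_propagation_communities


def structure_communities(features, names, percentile=90.0):
    """Partition structures by traffic similarity.

    features: (n_structures, n_metrics) array of AIS-derived metrics.
    Returns the r-neighborhood graph, the correlation cutoff, and the
    communities found by semi-synchronous label propagation.
    """
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant metrics become zeros instead of dividing by 0
    X = (X - X.mean(axis=0)) / std  # standardize each metric across structures
    corr = np.corrcoef(X)  # Pearson correlation between structures (rows)
    iu = np.triu_indices(len(names), k=1)  # each unordered pair once
    cutoff = max(float(np.percentile(corr[iu], percentile)), 0.0)
    G = nx.Graph()
    G.add_nodes_from(names)  # structures with no strong match stay as isolates
    for i, j in zip(*iu):
        if corr[i, j] > cutoff:
            G.add_edge(names[i], names[j])  # unweighted, undirected edge
    communities = [set(c) for c in label_propagation_communities(G)]
    return G, cutoff, communities
```

With the study's settings (865 structures, 20 metrics, `percentile=90.0`), the reported cutoff was 0.52 and the graph had 37,412 edges, roughly 10% of the 373,680 possible structure pairs, as expected for a 90th-percentile threshold.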

Previously examined projects appear in three clusters. Raccoon Creek Jetty, NJ; Port Bolivar, TX; and Galveston Bay Entrance, TX all appear in the community characterized by high-volume cargo and tanker traffic. Cape Vincent Breakwater appears in the community of structures demonstrating moderate cargo traffic volume and high seasonality. Port Clinton, Lewis Bay, Lagoon Pond, Hyannis, Mispillion, Sandy Hook, and Bayou Lafourche are all found in the community characterized by relatively high passenger traffic and a notable absence of tanker traffic. The two remaining communities are (1) a community characterized by low overall traffic volume, high concentrations of fishing vessel traffic, and a general lack of cargo or tanker traffic, which contains many projects identified as subsistence harbors or harbors of refuge (Public Law 113-121); and (2) a community characterized by high concentrations of tow and work vessels that appear to follow inland routes such as the Atlantic Intracoastal Waterway and the Gulf Intracoastal Waterway.

The communities presented are not definitive because clustering is sensitive to the feature vector metric selection and the size of the r-neighborhood. However, the identified communities reflect management groups (high use, moderate use, and emerging harbors) used by the USACE (Public Law 113-121; Mitchell 2010). Grouping of structures is demonstrated from direct observation of vessels instead of reliance on proxy measures (cargo tonnage, commercial fish landings, ferry passenger statistics, etc.) of the parent project that have previously informed coastal structure management. By further modifying feature vectors to include additional metrics (e.g., frequency of wave loads exceeding structure design, coincidence of large waves and traffic, historical repair cost, local dredging costs, etc.) it may be possible to target funding for improvements in structure functionality where they will have the greatest impact on vessel traffic.

Conclusion

The present effort demonstrates a method for describing marine infrastructure from direct observation of nearby vessel traffic as recorded by AIS. By encoding a nationwide portfolio of 865 navigation structures spanning seven UTM zones in terms of metrics derived from nearby traffic, computed in a high-performance parallel computing environment, the authors partitioned the portfolio into five affinity groups with similar traffic patterns. Beyond simple vessel counts, the metrics captured seasonality, vessel–structure spatial proximity, and traffic composition. Because no analogous measure is presently available to asset managers, this research offers an opportunity to fundamentally reassess how structure performance is defined and measured, shifting away from the proxy metrics currently employed in management.

In partitioning the portfolio, 20 metrics were derived from more than 8 million vessel transits observed in AIS data from 2009 to 2014. These include the total number of unique users, the number of individual transits, and the ratio of transits per unique user within 3.2 km of each structure. Traffic seasonality was quantified with the coefficient of determination of the seasonal component of the monthly transit time-series decomposition, which enabled the authors to identify projects with high seasonality and a large number of vessel transits. Entropy was calculated to quantify the diversity of unique vessel populations and of total transits by vessel type, enabling the authors to identify a small group of structures where a small number of users make many repeat transits.
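The count, ratio, and entropy metrics for a single structure can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the input is a hypothetical list of (vessel_id, vessel_type) pairs, one per transit within the search radius; each vessel is assumed to report a single type; and Shannon entropy is taken with the natural logarithm, since the base used in the study is not stated here.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (natural log) of the category distribution in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def traffic_summary(transits):
    """Count, unique-vessel, ratio, and entropy metrics for one structure.

    transits: list of (vessel_id, vessel_type) tuples, one per observed transit.
    """
    type_by_vessel = dict(transits)  # assumes one type per vessel
    n_transits = len(transits)
    n_unique = len(type_by_vessel)
    return {
        "count": n_transits,
        "unq": n_unique,
        "r_tu": n_transits / n_unique,
        "E_tt": shannon_entropy(t for _, t in transits),
        "E_tu": shannon_entropy(type_by_vessel.values()),
    }


# Hypothetical ferry landing: two ferries with 50 transits each, two fishing boats once each
transits = [("ferry1", "passenger")] * 50 + [("ferry2", "passenger")] * 50
transits += [("boat1", "fishing"), ("boat2", "fishing")]
print(traffic_summary(transits))
```

Here the unique vessels split evenly between two types ($E_{tu} = \ln 2$), while the transit entropy $E_{tt}$ is much lower because two vessels account for nearly all transits, which is the repeat-user signature described above.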

The closest point of approach (CPA) distance of each transit to each structure was calculated. For at least one case, the mean distance was shown to vary through time at the monthly scale, possibly in response to a significant regional event. Although this study used a curated AIS data set released annually by BOEM and NOAA (2018), large-scale anomaly detection could be applied in real time to identify traffic anomalies. Granular spatiotemporal vessel movement information is not presently used by the USACE in managing coastal structures; this represents another opportunity.

Combined, these techniques were useful for describing the range of assets within the portfolio, providing managers with detailed insight into user groups and use patterns near structures. This is a step toward filling the knowledge gap identified by Domurat (2012). The authors highlight interpretations of the data that require further exploration to align asset management with overarching agency goals in a resource-constrained environment. The authors intend to expand this clustering technique to include additional environmental forcing and cost-related metrics to further partition structures. Although this particular application is germane to the USACE, other problems involving AIS-carrying marine traffic, a large set of points of interest (e.g., cities, counties, habitats), and a standard computation (e.g., dwell time, air emissions, noise generation) will also benefit from the technique.

Supplemental Data

The structure description and vessel traffic metric scores used in this analysis are available online in the ASCE library (www.ascelibrary.org).

Notation

The following symbols are used in this paper:

$E_{tt}$: entropy of the transits by vessel type;
$E_{tu}$: entropy of the unique vessels by vessel type;
$F_{ST}$: coefficient of determination of the seasonal component of the monthly transit time-series decomposition;
$F_{SU}$: coefficient of determination of the seasonal component of the monthly unique user time-series decomposition;
$r_{tu}$: number of transits per unique vessel;
$x$: closest point of approach (CPA) distance between structure $i$ and vessel $j$; and
$\bar{x}$: average closest point of approach (CPA) distance for structure $i$.

Supplemental Materials

File (supplemental_data_ww.1943-5460.0000550_scully.zip)

Appendix. Description of Traffic Metrics

The metrics used to create feature vectors for assessing the similarity of coastal structures based on observed vessel traffic are available in the data repository. To account for differences in scale, each metric was standardized across structures by subtracting its mean and dividing by its standard deviation. The following descriptions clarify the nature and generation of the metrics.
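The standardization step can be sketched as a z-score transform applied to each metric across structures. This minimal Python sketch assumes a hypothetical nested-dictionary layout, complete data (every structure has every metric), and the population standard deviation; the text does not state whether the population or sample form was used.

```python
import statistics


def standardize(table):
    """Z-score each metric across structures.

    table: {structure: {metric: value}}; every structure must have every metric.
    Returns a table of the same shape with (value - mean) / std per metric.
    """
    structures = list(table)
    metrics = list(table[structures[0]])
    out = {s: {} for s in structures}
    for m in metrics:
        values = [table[s][m] for s in structures]
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)  # population SD (an assumption)
        for s in structures:
            # A metric that is constant across structures carries no information
            out[s][m] = (table[s][m] - mu) / sigma if sigma else 0.0
    return out


# Hypothetical raw metrics for three structures
raw = {
    "jetty_1": {"unq": 10, "avg_dist": 400.0},
    "jetty_2": {"unq": 20, "avg_dist": 900.0},
    "jetty_3": {"unq": 30, "avg_dist": 1400.0},
}
print(standardize(raw))
```

After this transform, a metric with a large raw scale, such as avg_dist in meters, no longer dominates a Pearson correlation taken across a structure's mixed-unit feature vector.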

r2Cnts: The coefficient of determination for overall fit of the time series decomposition model of vessel traffic at each structure based on the total number of observed vessel transits.

FsCnts: The coefficient of determination for the seasonal component of the time series decomposition model of vessel traffic at each structure based on the total number of observed vessel transits, $F_{ST}$.

r2Unq: The coefficient of determination for overall fit of the time series decomposition model of vessel traffic at each structure based on the total number of unique vessels observed.

FsUnq: The coefficient of determination for the seasonal component of the time series decomposition model of vessel traffic at each structure based on the total number of unique vessels observed, $F_{SU}$.
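The two decomposition metrics can be sketched in Python. Two assumptions apply, and neither is confirmed by the text: the decomposition is taken to be a classical additive one (centered 2 × 12 moving-average trend, monthly seasonal indices, remainder), and the seasonal coefficient is computed as the strength-of-seasonality measure described by Hyndman and Athanasopoulos (2018), $F_S = \max\left(0, 1 - \operatorname{Var}(R_t)/\operatorname{Var}(S_t + R_t)\right)$, with overall fit taken as $1 - \operatorname{Var}(R_t)/\operatorname{Var}(y_t)$. The function names are hypothetical.

```python
import math
import statistics


def classical_decompose(y, m=12):
    """Classical additive decomposition y = T + S + R for a seasonal period m (even).

    Returns (trend, seasonal, remainder) over the span where the centered
    2 x m moving average is defined (the first and last m/2 points are dropped).
    Requires len(y) >= 2 * m so every season has at least one detrended value.
    """
    half = m // 2
    span = range(half, len(y) - half)
    trend = []
    for t in span:
        w = y[t - half:t + half + 1]  # m + 1 values centered on t
        trend.append((0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / m)
    # Seasonal index = mean detrended value for each position in the cycle
    buckets = [[] for _ in range(m)]
    for t, tr in zip(span, trend):
        buckets[t % m].append(y[t] - tr)
    index = [statistics.mean(b) for b in buckets]
    offset = statistics.mean(index)
    index = [s - offset for s in index]  # indices sum to zero over one cycle
    seasonal = [index[t % m] for t in span]
    remainder = [y[t] - tr - s for t, tr, s in zip(span, trend, seasonal)]
    return trend, seasonal, remainder


def decomposition_metrics(y, m=12):
    """Return (overall_r2, F_S) for one monthly series (assumed definitions)."""
    _, seasonal, remainder = classical_decompose(y, m)
    observed = y[m // 2:len(y) - m // 2]
    var_r = statistics.pvariance(remainder)
    var_y = statistics.pvariance(observed)
    var_sr = statistics.pvariance([s + r for s, r in zip(seasonal, remainder)])
    r2 = 1 - var_r / var_y if var_y else 0.0
    f_s = max(0.0, 1 - var_r / var_sr) if var_sr else 0.0
    return r2, f_s


# Hypothetical 6-year (72-month) series: linear growth plus an annual cycle
monthly = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(72)]
print(decomposition_metrics(monthly))  # both values close to 1
```

The same function applies unchanged to a monthly series of transit counts ($F_{ST}$) or of unique vessels ($F_{SU}$); a series with no repeating annual pattern yields $F_S$ near zero.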

unq: The total number of unique vessels observed at each structure.

count: The total number of individual transits observed at each structure.

trips_per_unq: The average number of individual transits observed per unique vessel at each structure, $r_{tu}$.

avg_dist: For each structure coordinate pair, $x$ is the distance between the coordinates and the AIS broadcast location nearest the structure for each observed transit within the search radius. avg_dist, $\bar{x}$, is the average of these CPA distances for each structure.
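The CPA calculation can be sketched as follows. This is a minimal illustration, not the authors' implementation: each structure is reduced to a single latitude/longitude pair, distances use the haversine great-circle formula on a sphere of mean radius 6,371 km (whereas the study's processing spanned UTM zones and may have used projected coordinates), and the 3.2-km search radius is applied to each transit's closest broadcast. The input layout and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def avg_cpa(structure, transits, radius_m=3200.0):
    """Mean closest-point-of-approach distance (x-bar) for one structure.

    structure: (lat, lon) of the structure coordinate pair.
    transits: list of transits, each a list of (lat, lon) AIS broadcast positions.
    Transits whose nearest broadcast lies beyond radius_m are ignored.
    Returns None when no transit falls within the search radius.
    """
    lat0, lon0 = structure
    cpa = []
    for pings in transits:
        x = min(haversine_m(lat0, lon0, lat, lon) for lat, lon in pings)
        if x <= radius_m:
            cpa.append(x)
    return sum(cpa) / len(cpa) if cpa else None


# Hypothetical structure at the origin and three transits
transits = [
    [(0.0, 0.02), (0.0, 0.01)],  # passes about 1.1 km east
    [(0.01, 0.0), (0.03, 0.0)],  # passes about 1.1 km north
    [(1.0, 0.0), (1.0, 0.5)],    # about 111 km away: outside the radius
]
print(avg_cpa((0.0, 0.0), transits))
```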

fish_%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 30.

fishUnq_%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 30.

tow_%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 31 or 32.

towUnq%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 31 or 32.

work_%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 52.

workUnq_%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 52.

passenger_%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 60 through 69.

passengerUnq_%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 60 through 69.

cargo_%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 70 through 79.

cargoUnq_%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 70 through 79.

tanker%: The fraction of the total number of observed vessel transits at each structure with ship and cargo type code 80 through 89.

tankerUnq_%: The fraction of the total number of unique vessels observed at each structure with ship and cargo type code 80 through 89.
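The vessel-type fractions can be sketched as a lookup from AIS ship and cargo type codes to the six categories listed above. The input layout (one (vessel_id, type_code) pair per transit) and the names are hypothetical, each vessel is assumed to report a single code, and transits with codes outside these ranges count toward no category, so the fractions need not sum to one.

```python
# AIS ship and cargo type codes for each category, as listed in this appendix
CATEGORY_CODES = {
    "fish": {30},
    "tow": {31, 32},
    "work": {52},
    "passenger": set(range(60, 70)),
    "cargo": set(range(70, 80)),
    "tanker": set(range(80, 90)),
}


def type_fractions(transits):
    """Return {category: (fraction of transits, fraction of unique vessels)}.

    transits: list of (vessel_id, type_code) tuples for one structure.
    """
    code_by_vessel = dict(transits)  # assumes each vessel reports one code
    n_transits, n_unique = len(transits), len(code_by_vessel)
    result = {}
    for category, codes in CATEGORY_CODES.items():
        t = sum(1 for _, code in transits if code in codes)
        u = sum(1 for code in code_by_vessel.values() if code in codes)
        result[category] = (t / n_transits, u / n_unique)
    return result


# Hypothetical: one ferry (code 60) making 8 transits, one fishing vessel, one cargo ship
transits = [("ferry", 60)] * 8 + [("trawler", 30), ("freighter", 71)]
for category, (frac_t, frac_u) in type_fractions(transits).items():
    print(f"{category}: {frac_t:.2f} of transits, {frac_u:.2f} of unique vessels")
```

Computing each category on both transits and unique vessels is what separates a structure used by one busy ferry from one used by many occasional passenger vessels.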

Acknowledgments

This research was funded by the Coastal Inlets Research Program of the US Army Corps of Engineers.

References

Alessandrini, A., V. F. Arguedas, and M. Vespe. 2018. “Vessel tracking data usage to map Mediterranean flows.” In Advances in shipping data analysis and modeling: Tracking and mapping maritime flows in the age of big data, 173–187. New York: Routledge.

BOEM (Bureau of Ocean Energy Management) and NOAA (National Oceanic and Atmospheric Administration). 2018. “Automatic identification system (AIS) data.” Accessed January 23, 2019. https://marinecadastre.gov/ais.

CIRP (Coastal Inlets Research Program). 2011. “Inlets database. (KMZ file).” Accessed January 23, 2019. https://cirpwiki.info/wiki/File:Inlets_071811_kmz.zip.

Cordasco, G., and U. Gargano. 2010. “Community detection via semi-synchronous label propagation algorithms.” In Proc., 2010 IEEE Int. Workshop on Business Applications of Social Network Analysis (BASNA), 1–8. New York: IEEE.

Domurat, G. 2012. “Navigation/coastal structure asset management: Overview and status.” Accessed October 17, 2018. http://onlinepubs.trb.org/onlinepubs/conferences/2012/Metrics/presentations/42-Domurat.pdf.

Etienne, L., E. Alincourt, and T. Devogele. 2015. “Maritime network monitoring.” In Marine Networks: Spatial structures and time dynamics, 190–209. New York: Routledge.

Guerrero, D., F. I. Gonzalez-Laxe, M. J. Freire-Seoane, and C. P. Montes. 2018. “Foreland mix and inland accessibility of European NUTS-3 regions.” In Advances in shipping data analysis and modeling: Tracking and mapping maritime flows in the age of big data, 207–230. New York: Routledge.

Harati-Mokhtari, A., A. Wall, P. Brooks, and J. Wang. 2007. “Automatic identification system (AIS): Data reliability and human error implications.” J. Navig. 60 (3): 373–389. https://doi.org/10.1017/S0373463307004298.

Hyndman, R. J., and G. Athanasopoulos. 2018. Forecasting: Principles and practice. Heathmont, VIC, Australia: OTexts.

ITU (International Telecommunication Union—Radiocommunication Sector). 2014. “Recommendation ITU-R M.1371-5. Technical characteristics for an automatic identification system using time division multiple access in the VHF maritime mobile frequency band.” Accessed January 23, 2019. http://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1371-5-201402-I!!PDF-E.pdf.

Jaynes, E. T. 1965. “Gibbs vs Boltzmann entropies.” Am. J. Phys. 33 (5): 391. https://doi.org/10.1119/1.1971557.

Le Guyader, D., C. Ray, and D. Brosset. 2018. “Identifying small-scale fishing zones in France using AIS data.” In Advances in shipping data analysis and modeling: Tracking and mapping maritime flows in the age of big data, 251–262. New York: Routledge.

Milne, D., and D. Watling. 2019. “Big data and understanding change in the context of planning transport systems.” J. Transp. Geog. 76 (Apr): 235–244. https://doi.org/10.1016/j.jtrangeo.2017.11.004.

Mitchell, K. N. 2010. “A review of coastal navigation asset management initiatives within the Coastal Inlets Research Program (CIRP) Part 1: Coastal structures.” Accessed January 23, 2019. https://chl.erdc.dren.mil/tools/chloldwebsite/CHL%20OLD%20WEBSITE/chl.erdc.usace.army.mil/library/publications/chetn/pdf/chetn-iii-80.pdf.

Mitchell, K. N., and B. M. Scully. 2014. “Waterway performance monitoring with Automatic Identification System data.” Transp. Res. Rec. 2426 (1): 20–26. https://doi.org/10.3141/2426-03.

NGIA (National Geospatial-Intelligence Agency). 2019. “World Geodetic System website of the NGA.” Accessed November 6, 2019. https://www.nga.mil/ProductsServices/GeodesyandGeophysics/Pages/WorldGeodeticSystem.aspx.

NOAA (National Oceanic and Atmospheric Administration). 2018. “United States coast pilot.” Accessed January 23, 2019. https://nauticalcharts.noaa.gov/publications/coast-pilot/index.html.

NRC (National Research Council). 2012. Predicting outcomes of investments in maintenance and repair of federal facilities. Washington, DC: National Academies Press.

Pathria, R. K., and P. Beale. 2011. Statistical mechanics. 3rd ed. Amsterdam, Netherlands: Academic Press.

Port Fourchon. 2018. “Port facts.” Accessed November 14, 2019. http://portfourchon.com/seaport/port-facts/.

Robards, M. D., G. K. Silber, J. D. Adams, J. Arroyo, D. Lorenzini, K. Schwehr, and J. Amos. 2016. “Conservation science and policy applications of the marine vessel Automatic Identification System (AIS)—A review.” Bull. Mar. Sci. 92 (1): 75–103. https://doi.org/10.5343/bms.2015.1034.

Shannon, C. E. 1948. “A mathematical theory of communication.” Bell Sys. Tech. J. 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.

Shmueli, G., and K. C. Lichtendahl. 2016. Practical time series forecasting with R: A hands-on guide. Green Cove Springs, FL: Axelrod Schnall Publishers.

Stoddard, M. A., L. Etienne, M. Fournier, R. Pelot, and L. Beveridge. 2016. “Making sense of Arctic maritime traffic using polar operational limits assessment risk indexing system (POLARIS).” In Vol. 34 of Proc., IOP Conf. Series: Earth and Environmental Science, 012034. Bristol, UK: IOP Publishing.

Touzinsky, K. F., B. M. Scully, K. N. Mitchell, and M. M. Kress. 2018. “Using empirical data to quantify port resilience: Hurricane Matthew and the southeastern seaboard.” J. Waterway, Port Coastal Ocean Eng. 144 (4): 05018003. https://doi.org/10.1061/(ASCE)WW.1943-5460.0000446.

USACE. 2006. Coastal engineering manual. Washington, DC: USACE.

USCG (United States Coast Guard). 2011. “On scene coordinator report, Deepwater Horizon oil spill.” Accessed November 14, 2018. https://homeport.uscg.mil/Lists/Content/Attachments/119/DeepwaterHorizonReport%20-31Aug2011%20-CD_2.pdf.

USCG (United States Coast Guard). 2012. “USCG AIS encoding guide v.2012-01-12.” Accessed January 23, 2018. https://www.navcen.uscg.gov/pdf/AIS/AISGuide.pdf.

USOMB (United States Office of Management and Budget). 2018. “USASPENDING website of the OMB.” Accessed August 8, 2018. https://www.usaspending.gov/#/.

Wang, X., K. A. Smith, and R. J. Hyndman. 2006. “Characteristic-based clustering for time series data.” Data Min. Knowl. Disc. 13 (3): 335–364. https://doi.org/10.1007/s10618-005-0039-x.

Winkler, D. 2012. AIS data quality and the Authoritative Vessel Identification Service (AVIS). Arlington, VA: National GMDSS Implementation Task Force.

Young, D. L., and B. M. Scully. 2018. “Assessing structure sheltering via statistical analysis of AIS data.” J. Waterway, Port Coastal Ocean Eng. 144 (3): 04018002. https://doi.org/10.1061/(ASCE)WW.1943-5460.0000445.

Information & Authors

Published In

Journal of Waterway, Port, Coastal, and Ocean Engineering

Volume 146 • Issue 2 • March 2020

Copyright

This work is made available under the terms of the Creative Commons Attribution 4.0 International license, http://creativecommons.org/licenses/by/4.0/.

History

Received: Dec 18, 2018

Accepted: Jul 3, 2019

Published online: Dec 8, 2019

Published in print: Mar 1, 2020

Discussion open until: May 8, 2020

Authors

Affiliations

Brandan M. Scully, Ph.D., P.E., F.ASCE

Research Civil Engineer, Coastal and Hydraulics Laboratory, US Army Engineer Research and Development Center, 69A Hagood Ave., Charleston, SC 29403 (corresponding author). ORCID: https://orcid.org/0000-0001-8641-2142. Email: [email protected]

David L. Young, Ph.D.

Research Civil Engineer, Coastal and Hydraulics Laboratory, US Army Engineer Research and Development Center, 1261 Duck Rd., Duck, NC 27949. Email: [email protected]

James E. Ross, Ph.D.

Research Computer Scientist, Information Technology Laboratory, US Army Engineer Research and Development Center, 3909 Halls Ferry Rd., Vicksburg, MS 39180-6199. Email: [email protected]
