Open access
Technical Papers
Sep 28, 2020

Bridge Damage Recognition from Inspection Reports Using NER Based on Recurrent Neural Network with Active Learning

Publication: Journal of Performance of Constructed Facilities
Volume 34, Issue 6

Abstract

A deep understanding of the cause-effect relationship of bridge damages provides an opportunity to design, construct, and maintain bridge structures more effectively. The damage factors (i.e., bridge element, damage, and cause) and their complex relationships can be extracted from bridge inspection reports; however, it is not practical to manually read a considerable number of inspection documents and extract such valuable information. Although existing studies have attempted to automatically analyze inspection reports, they require a large amount of human effort for data labeling and model development. To overcome these limitations, the authors propose an efficient information acquisition approach that extracts damage factors and causal relationships from bridge inspection reports. The named entity recognition (NER) model was developed based on a recurrent neural network (RNN) and was trained with the active learning method. In experiments performed with 1,650 sentences (i.e., 1,300 for training and 350 for testing), the developed model successfully classified categories of text words (i.e., damage factors) and captured their causal relationships with 0.927 accuracy and a 0.860 F1 score. In addition, the active learning method significantly reduced the human effort required for data labeling and model development: the developed model achieved a 0.778 F1 score using only 140 sentences, which required less than an hour of manual labeling. These results indicate that the model successfully extracted major damage factors and their cause-effect relationships from a set of text sentences with little effort. Consequently, the findings of this study can help field engineers design, construct, and maintain bridge structures.

Introduction

Bridges play a key role in maintaining traffic flow and contribute to the economic development of a society; thus, it is essential to secure bridge infrastructure safety for public safety and the national economy (Deng et al. 2016; LeBeau and Wadia-Fascetti 2007). However, bridge maintenance has suffered from a large number of aging bridges and limited resources in time, cost, and expertise (Frangopol et al. 2001; Robelin and Madanat 2007). Because bridges are known to share common causes and damages, a deep understanding of the cause-effect relationship of bridge damages provides an opportunity to design, construct, and maintain bridge structures more effectively (Kanda et al. 2000).
Bridge inspection reports contain the cause-effect relationship of bridge damages because the reports describe both the historical defects identified during inspections and the causes of those defects (FHWA 2012; KISTEC 2017) (Fig. 1). Therefore, many studies have sought to analyze the cause-effect relationship of bridge damages from the information buried in inspection reports (Jeon et al. 2017; Lokuge et al. 2016; Peris-Sayol et al. 2017). However, the considerable number of bridge inspection reports makes it impractical to collect, read, and utilize such valuable information manually (Liu and El-Gohary 2017; Ryu and Shin 2014). As illustrated in Fig. 1, the useful information that bridge engineers need is spread throughout the text of the reports. Because the text is written in unstructured natural language, bridge engineers must manually read and understand every sentence to extract the historical cause-effect information from the reports. For example, given the sentence “water leaking occurred due to the junction boxes filled with rainwater,” the engineer must read and understand the entire text to obtain the informative keywords, such as water leaking, junction boxes, and rainwater. Considering that one inspection report contains numerous text sentences, an automated information extraction method is needed to retrieve valuable information from a significant number of bridge inspection reports.
Fig. 1. Sample bridge inspection report from Bridge Inspector’s Reference Manual; report describes bridge elements examined during inspection, damage of elements, and possible causes of damage. (Reprinted from FHWA 2012.)
Text mining enables automatic information retrieval from large volumes of text data, resolving the impracticality of the manual process. In particular, named entity recognition (NER) is one of the most intuitive approaches to extracting useful information from text. NER aims to recognize named entities, i.e., the informative keywords in text data (Spasic et al. 2005; Tanabe et al. 2005; Zhu et al. 2013). In construction and facility maintenance, previous studies have proposed NER models that extract information about damages, causes, and maintenance suggestions from reports. These studies applied NER to text data for automatic, and therefore efficient, information extraction. However, most studies have not addressed the limitation that NER models require labeled text as training data, which demands tremendous human effort to assign labels manually (Liu and El-Gohary 2017). Because NER is a classification task that assigns each word to a proper category, the model must first be trained on a labeled corpus (i.e., a set of text data). In other words, it is essential to manually build an extensive, high-quality training data set, which is extremely time-consuming, expensive, and labor-intensive. In cases in which raw text data are abundant but labeling costs are high, the active learning method can reduce the amount of manual labeling significantly, because the method selects the most meaningful-to-learn instances from unlabeled data and asks human annotators to label the selected data first (Settles 2009).
The authors propose an efficient information acquisition approach that automatically extracts bridge damage factors from inspection reports. In particular, the concept of active learning is adopted to minimize the human effort required for data labeling and model development. The research aims to develop a NER model based on recurrent neural networks (RNN) and to demonstrate that the integration with active learning reduces the cost of data labeling considerably. The developed model can automatically retrieve information about the cause-effect relationship of bridge damages. In practice, such causal information allows practitioners to design, construct, and maintain bridge structures more effectively. Specifically, if potential damages are predicted in the early phases of a new project, the construction engineer may design a bridge to eliminate the damage causes or take proper mitigation actions (e.g., redesign) (Li and Burgueño 2010). In addition, the causal information would benefit bridge inspectors and managers, who can use it to determine inspection priority: which bridge (or element) has the most urgent and serious causes (or damages)? It should be noted that, owing to data availability, this paper focuses on bridge inspection reports written in the Korean language.
The rest of the paper consists of the following. First, the previous studies that utilized similar approaches as in this paper are reviewed. After providing an overview of the bridge damage recognition model, each step to develop the model is described in detail. Then, this paper demonstrates the performance metrics of the developed bridge damage recognition model, followed by a discussion on the results to validate its effectiveness and efficiency. Last, the paper concludes and suggests future research.

Literature Review

Named Entity Recognition

NER involves recognizing named entities in predefined categories in text data (Jurafsky and Martin 2014). The categories of named entities vary with the task; they can include people, places, or time expressions in general NER tasks, but can also target specific types of terminology. For example, NER models have been developed for extracting biological terms (Habibi et al. 2017; Settles 2004; Tanabe et al. 2005; Zhu et al. 2013) or terms related to freight transportation (Seedah and Leite 2015).
NER is a classification task, in which a model classifies every word in the text data into an appropriate category. It is also a sequence-labeling task: every word in a sequence of words (e.g., a sentence) is labeled, for instance as an object, cause, outcome, or reference, according to its role in the sentence, so the output of the NER task is a sequence of corresponding labels. Most NER models need prelabeled text data as training data, in which every word is manually labeled before training the models (Tanabe et al. 2005). In construction, Liu and El-Gohary (2017) proposed a NER model that extracted information from bridge inspection reports written in English, recognizing bridge elements, damage, causes, and maintenance actions. Their NER model was based on a predeveloped ontology, which is a set of relationships between terms in the text data, as well as manually engineered semantic features (e.g., the semantic closeness of a word with its preceding and succeeding words). In addition, a semisupervised scheme was applied to training the model to reduce the manual effort of labeling all the words in the reports.
Although the model showed promising results for extracting information from the reports, two challenges hinder applying the method in practice. First, building an ontology is extraordinarily time-consuming and labor-intensive. An ontology needs to contain all the relationships between every term used in the text data to be analyzed, such as synonyms, antonyms, and the category of each term. In addition, an ontology-based method is language-specific; in other words, an ontology cannot be applied to any language other than the one on which it was built. Second, although the semisupervised learning method showed potential for reducing the number of text data needed to develop a NER model, model developers and users have no means of interacting with the model. For example, a bridge inspection report may contain errata that are not included in the first training samples of semisupervised learning. Such exceptions should be corrected during model training or even during use in practice. However, a semisupervised method does not provide a way to correct the classification errors of the model.

Recurrent Neural Network

Recurrent neural network (RNN), a neural architecture with a recurrent feedback loop, has been widely applied to sequential labeling tasks, including NER (Chiu and Nichols 2016; Huang et al. 2015; Lample et al. 2016; Rumelhart et al. 1986). In a RNN-based NER model, a word is fed into the model as a form of a multidimensional numeric vector. Such word vectors can contain syntactic information, such as the word’s part of speech, the word length or capitalization, or semantic information related to its meaning and relationship with other words. In contrast to a simple two-layer artificial neural network (ANN) model that classifies every single data object in a data set, a RNN-based NER model aims to classify all the words in a sequence as a whole (Fig. 2).
Fig. 2. ANN and RNN models: (a) simple two-layer ANN model that classifies each word by word; and (b) basic two-layer RNN model that classifies every word in a sentence simultaneously (bold arrow represents that two layers are fully connected).
RNN has shown its fitness to NER and has been applied to a variety of related tasks because of two main characteristics of RNN architecture. First, a RNN model can learn sequential patterns from sequential input data. For example, labeling the word leakage as a causal factor from the text sequence “…corruption in the pier which resulted from the leakage of the expansion joint…” can be more accurate and reliable when utilizing the information from its context than utilizing only the information of the word itself. The phrases corruption in the pier, resulted from, and (leakage) of the expansion joint indicate that the word leakage should be labeled as the cause of the damage (corruption in the pier). Second, a RNN model can receive sequential data regardless of the length of the sequence, which is especially important for text data analytics because the number of words in a sentence varies in natural language text data.
Among many variations of RNN, bidirectional long short-term memory (LSTM) showed superior performance in NER tasks to other methods (Huang et al. 2015). In a bidirectional RNN architecture, the recurrent loop in a hidden layer flows not only forward but also backward (Schuster and Paliwal 1997). This feature enables the model to use the information from the words in the latter part of a sentence, in determining a class of a word appearing at the beginning of the sentence. LSTM architecture was proposed to solve an issue of basic RNN architecture that long-term dependency does not last long enough (Hochreiter and Schmidhuber 1997). Rather than using a single function in its hidden unit (e.g., sigmoid and hyperbolic tangent), LSTM uses several subunits to retain information from preceding inputs, either adjacent to or apart from the current input. Details of bidirectional LSTM architecture will be subsequently explained in the manuscript.

Active Learning

Whereas existing studies have mainly focused on designing neural network architectures, the volume of training data has been recognized as one of the most crucial issues in machine learning approaches (Jallan et al. 2019). Likewise, developing a NER model requires a large amount of training data labeled by human annotators, which is extremely expensive, time-consuming, and labor-intensive. In this context, active learning aims to minimize the number of training data and the cost of human labeling under an underlying assumption: a part of the training data suffices to obtain a significant level of performance, provided the most informative-to-learn instances are selected and learned first.
The active learning method asks humans to annotate the most uncertain-to-predict samples first. This process is analogous to a student who, for the best efficiency, actively selects the questions considered most difficult to solve. Settles and Craven (2008) analyzed general strategies for active learning in sequence-labeling tasks, compared general methods for query selection and for measuring the confidence of model predictions, and tested several of those methods. The comparison revealed that the performance of the methods depends on the corpus to which they are applied, and no single best strategy can be determined. However, the research also showed that the active learning method nevertheless performs better than random selection in terms of reducing the amount of labeling needed to achieve the same model accuracy (Settles and Craven 2008).
A key feature of the active learning method is the way of selecting the most informative samples from unlabeled data to be labeled by a human annotator. Settles (2009) surveyed and analyzed strategies for query selection, including uncertainty sampling, which is the simplest and the most commonly used in active learning applications. Prediction by a model can be considered less confident if the uncertainty of the prediction is high; the uncertainty of prediction is generally measured by the entropy of prediction (Shannon 1948). Tomanek and Hahn (2009) proposed a semisupervised active learning process for sequence labeling tasks, which focused specifically on the highly uncertain tokens in a sequence and assumed that the other remaining tokens with low uncertainty were correctly labeled by the model. Their results showed that their active learning method could reduce the human effort needed to manually label words in text data by 60% in terms of the number of labeled words out of the total words.

Bridge Damage Factor Recognition Model

This study proposed a model for automatically recognizing damage factors from the text taken from bridge inspection reports. The proposed model received a sentence as input and outputted a sequence of labels for all words in the input sentence. A label in the output sequence can be one of four predefined categories: bridge element, damage, cause, and others. Bridge element included the components of bridge superstructure (e.g., pavement, deck, expansion joint, girder) and substructure (e.g., pier, abutment, foundation). Damage explained the types of damage identified during the inspection (e.g., crack, breakage, efflorescence of concrete). Cause represented causes of damage, such as heavy traffic or weather-related factors (e.g., raining, snowing). Others covered the remaining words that are irrelevant to the cause-effect relationship, such as prepositions and postpositions. For example, a sentence “substructure elements were contaminated due to the breakage of the expansion joint” can be assigned as “[substructure elements]_bridge element [were contaminated]_damage [due to]_others [the breakage]_cause [of]_others [the expansion joint]_bridge element” (Fig. 3).
Fig. 3. Bridge damage factors in sample sentence.
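For illustration, such a labeled output can be represented as word-label pairs. The sketch below is a Python rendering of the sample sentence above; the data structure is an assumption for illustration, not the paper's implementation.

```python
# Illustrative only: the labeled sample sentence as (phrase, label) pairs,
# mirroring the example in Fig. 3. The four labels are the paper's categories.
LABELS = {"bridge_element", "damage", "cause", "others"}

labeled_sentence = [
    ("substructure elements", "bridge_element"),
    ("were contaminated", "damage"),
    ("due to", "others"),
    ("the breakage", "cause"),
    ("of", "others"),
    ("the expansion joint", "bridge_element"),
]

# Every assigned label must be one of the four predefined categories.
assert all(label in LABELS for _, label in labeled_sentence)
```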
It should be noted that classifying a word as damage or cause is context-dependent. For example, in the sentence “substructure elements were contaminated due to the breakage of the expansion joint,” the word breakage should be classified as cause because it is described as the cause of the substructure damage. To do that, NER was applied to the model’s recognition process so that the model eventually recognized damage described in the sentence, its causal factors, and the elements in which the damage occurred. Since NER is a sequence-labeling task, the bridge damage factor recognition model is also a sequence-labeling model, in which the inputs are sequences of words, and the outputs are predicted labels for corresponding words.
The process of developing the model was as follows (Fig. 4). First, text data in the collected bridge inspection reports, which are often stored in human-readable formats such as portable document format (PDF), were converted into plain text so that analytic tools could process them in the next steps. Second, the extracted raw text was segmented into individual sentences, and the sentences were then separated, or tokenized, into individual words. The words in the tokenized sentences were then converted, or embedded, into word vectors by a word embedding method. Third, the RNN architecture for the bridge damage recognition model was implemented. Finally, the RNN model was trained in an active learning scheme using the tokenized and embedded sentences as training data. Details of each step are described in the following sections.
Fig. 4. Development process of a bridge damage factor recognition model.

Preprocessing and Tokenizing

Bridge inspection reports in Korea are usually stored as PDF files or in files of word-processing software for the Korean language, such as Hangeul word processor (HWP) files. Because standard data analytic tools cannot use data in such formats, raw text needed to be extracted from the report files in a plain text format. The language used in bridge inspection reports was found to be formal, and spelling and syntax, such as spacing and punctuation, were almost always correct. Based on this observation, this study applied rule-based sentence segmentation, relying on the fact that all sentences end with a period rather than a question mark, an exclamation mark, or any other punctuation mark.
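A rule-based segmenter of this kind can be sketched in a few lines; the regular expression and the sample sentences below are illustrative assumptions, not the authors' implementation.

```python
import re

def segment_sentences(text):
    """Split raw report text into sentences, assuming (as observed in the
    reports) that every sentence ends with a period and no other
    sentence-final punctuation occurs."""
    # Split after a period followed by whitespace; the period is kept
    # with its sentence via a lookbehind assertion.
    parts = re.split(r"(?<=\.)\s+", text.strip())
    return [p for p in parts if p]

sentences = segment_sentences(
    "Water leaking occurred due to the junction boxes filled with rainwater. "
    "Cracks were observed on the pier."
)
# Two sentences are recovered from the sample text.
```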
Tokenization is the process of segmenting text data into individual words. The proposed bridge damage factor recognition model aimed to classify each tokenized word as a bridge element, damage, cause, or other. Since tokenization is language-specific, this paper briefly introduces characteristics of the Korean language and tokenization method for Korean, which were suggested by Kim et al. (2014).
The blank-separated language unit in Korean often consists of a word, which contains meaning in itself, and a functional unit such as a postposition, which is similar to a preposition but combines at the end of another word and is a distinctive feature of the Korean language. Therefore, a tokenizer for Korean text has to identify and separate a word from the blank-separated unit. Identifying a complete word in Korean text is generally based on a list of all the vocabulary used in the text. This plays a crucial role in tokenization because any word not included in the vocabulary list is not identified as a word. This study constructed a vocabulary list from the words used in bridge inspection reports and tokenized the text data in the reports based on the vocabulary list by applying a corpus-based tokenization process. As suggested by Kim et al. (2014), all word candidates were first identified from the text to be tokenized, and likelihoods that these word candidates are complete words were measured, which will be referred to as word score. Among all possible consecutive strings of characters in each blank-separated unit, the string whose word score was the largest was extracted as a word, and the other string of characters was separated from the extracted word.
Measuring a word score was based on cohesion probability (Kim 2013) and branching entropy (Jin and Tanaka-Ishii 2006). The cohesion probability of a text string represents the probability that the last character of the string actually appears when the other characters in the string are observed. The branching entropy, given a text string, represents the uncertainty about the next character after the string. The cohesion probability for a text string is calculated on the assumption that a string of characters frequently used together is more likely to be a complete word. For example, the word concrete has a high cohesion probability because any subcombination of the word concrete (e.g., concr or concret) is rarely used alone in real text data. On the other hand, the string concrete deck has a much lower cohesion probability than concrete because many options exist for the next character after concrete, considering usages such as concrete girder or concrete pier, and thus it is not considered a single complete word. Finally, the cohesion probability for a string of characters (i.e., a word candidate) is calculated as
$$\mathrm{cohesion}(c_1, c_2, \ldots, c_n) = \left( \prod_{i=1}^{n-1} P(c_1, \ldots, c_{i+1} \mid c_1, \ldots, c_i) \right)^{\frac{1}{n-1}} \tag{1}$$

where $c_1, c_2, \ldots, c_n$ represent the first, second, …, $n$th characters of the candidate word [Eq. (1)].
The branching entropy measures the uncertainty of the next character after given successive characters. For example, the string concret has a low branching entropy because it is evident that the character e must be used after concret. On the other hand, concrete has much higher branching entropy because it is much more uncertain about guessing which character will be used after the string. Branching entropy is calculated as
$$H(X \mid X_n = x_n) = -\sum_{x \in \mathcal{X}} P(x \mid x_n) \log P(x \mid x_n) \tag{2}$$

where $H(X \mid X_n = x_n)$ is the branching entropy of a character string $X$ that ends with a character $x_n$, and $P(x \mid x_n)$ is the probability that a character $x$ will appear after $x_n$, the last character of the given string [Eq. (2)]. High entropy implies high uncertainty in determining the next character most likely to appear after the given string, and therefore the given string would be a complete word by itself.
Combining the cohesion probability and the branching entropy, the likelihood that a given string was a complete word is measured, as suggested in Kim et al. (2014). In this process, word scores of all possible word candidates from the text data were measured, and a list of all possible vocabularies was constructed along with their corresponding word scores [Eq. (3)]

$$\mathrm{Score}(word) = \mathrm{cohesion}(word) \times \exp(H(word)) \tag{3}$$
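Eqs. (1)-(3) can be sketched over character-prefix counts as follows; the toy corpus, the counting scheme, and the function name are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter

def word_score(candidate, corpus):
    """Score a word candidate per Eqs. (1)-(3): the geometric mean of
    prefix-extension probabilities (cohesion) times exp() of the
    branching entropy of the character following the candidate."""
    prefix = Counter()   # counts of every character prefix of every token
    nxt = Counter()      # counts of (prefix, following character) pairs
    for token in corpus:
        for i in range(1, len(token) + 1):
            prefix[token[:i]] += 1
            if i < len(token):
                nxt[(token[:i], token[i])] += 1
    n = len(candidate)
    if n < 2 or prefix[candidate] == 0:
        return 0.0
    # Eq. (1): cohesion = geometric mean of P(c_1..c_{i+1} | c_1..c_i)
    ratios = [prefix[candidate[:i + 1]] / prefix[candidate[:i]] for i in range(1, n)]
    cohesion = math.prod(ratios) ** (1.0 / (n - 1))
    # Eq. (2): branching entropy of the next character after the candidate
    total = sum(c for (p, _), c in nxt.items() if p == candidate)
    entropy = -sum((c / total) * math.log(c / total)
                   for (p, _), c in nxt.items() if p == candidate) if total else 0.0
    # Eq. (3): Score(word) = cohesion(word) x exp(H(word))
    return cohesion * math.exp(entropy)

# Toy corpus mirroring the paper's example: "concrete" is followed by
# several different continuations, so it scores as a complete word,
# whereas "concr" extends deterministically and scores lower.
corpus = ["concretedeck", "concretepier", "concretegirder"]
```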

Word Embedding and Input Features of the Model

Every word in the text sentences must be converted into numerical data so that the RNN model (described in further detail subsequently) can receive the sentences as inputs. The Word2vec method represents a word as a numerical vector based on the linguistic assumption that the meaning of a word can be inferred from its surrounding words. Because a sentence is a sequence of words, a sentence to which the Word2vec method is applied is consequently represented as a matrix, i.e., a sequence of vectors. The matrix has the shape (T, d), where T is the length of the sequence (the number of words in the sentence), and d is the number of dimensions of a word vector.
Mikolov et al. (2013b) proposed the continuous bag-of-words (CBOW) model and skip-gram model to efficiently represent a word as a word vector using artificial neural network architectures. From a set of text sentences, the CBOW model was trained to predict a specific word from its context using the context words as inputs with one hidden layer. The skip-gram model, in contrast, was trained to predict the context words from the specific word using the target word as the input. While the CBOW model showed a slight advantage in syntactic modeling tasks, the skip-gram model outperformed the CBOW model in semantic modeling tasks (Mikolov et al. 2013a). This study applied the skip-gram model to the word-embedding process because semantic relationships between bridge elements, damage, and its factors were more important than their syntactic relationships (Fig. 5).
Fig. 5. Example of applying skip-gram Word2vec model to the sentence “A little cat sits on”; the input word “cat” is represented in the hidden layer as a word vector.
In the skip-gram Word2vec method, a simple two-layer ANN was used to embed a word into a vector representation. The ANN architecture took a single word as its input, propagated it to the hidden layer with a certain number of hidden units, and predicted the other words that would appear around the target word. By repeating this process for all the words in the corpus, the weights of the ANN model were adjusted so that the neural network could predict the most likely surrounding words, given any word in the corpus. In the end, the model provided the values of the hidden units for an input word; these values were the components of the word vector of that word. In other words, the hidden layer of the skip-gram Word2vec model was the representation of the word vector, whose dimension was the same as the number of hidden units in the ANN model. In this paper, the dimension of the word vector was chosen to be 50.
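To make the skip-gram setup concrete, the sketch below generates the (target, context) training pairs such a model learns from, using the example sentence of Fig. 5. The window size of 2 is an assumption for illustration; the paper does not report the window size used.

```python
def skipgram_pairs(sentence, window=2):
    """Generate (target, context) training pairs for a skip-gram model:
    each word is used to predict the words within `window` positions."""
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sentence[j]))
    return pairs

tokens = ["a", "little", "cat", "sits", "on"]
pairs = skipgram_pairs(tokens, window=2)
# For the target "cat", the context words are: a, little, sits, on.
```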

Recurrent Neural Network Architecture of the Model

In this study, a bidirectional long short-term memory (bi-LSTM) architecture was tested, because a bi-LSTM model has advantages in capturing long-term sequential patterns as well as bidirectional sequential patterns (Hochreiter and Schmidhuber 1997; Huang et al. 2015). A long-term sequential pattern is essential in the case of recognizing damage and its factors in which they are not close in a sentence. For example, in the sentence “efflorescence on the bridge pier … caused by leakage of expansion joint,” leakage should be recognized as a damage factor according to the preceding context “efflorescence … caused by,” but a model needs to capture a long-term dependency between efflorescence and leakage since many unrelated words (e.g., on the bridge pier) could be used between these two related words.
In addition, the cause-effect relationship between the damage and its factors is often represented in various structures, for example, efflorescence due to the leakage or the leakage resulted in the efflorescence. The model was therefore required to learn such linguistic patterns, which are not only forward-directional but also backward-directional. In the case of efflorescence due to the leakage, for example, the word efflorescence can be recognized as damage based on the following context, due to; on the other hand, the word leakage can be recognized as a factor based on the preceding context, due to. Consequently, the bidirectional property of the RNN model in this study is an appropriate option for recognizing damage, its factors, and bridge elements from the text sentences. In this paper, the bi-LSTM model was designed to have two hidden layers (one forward-directional and the other backward-directional), each with 50 hidden units.
To deal with the long-term dependency of sequential data, a LSTM unit has several gates, as shown in Fig. 6. At each time step t (i.e., when dealing with the tth input word), each gate in the LSTM unit is updated as follows in Eqs. (4)–(9), where $*$ denotes the pointwise product operation:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{4}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{5}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{6}$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{7}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$

$$h_t = o_t * \tanh(C_t) \tag{9}$$
Fig. 6. Conceptual diagram of LSTM operation.
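Eqs. (4)-(9) can be transcribed directly into code. The NumPy sketch below performs one LSTM time step; the random weight initialization and vector sizes are illustrative (50-dimensional word vectors and 50 hidden units, matching the paper's configuration).

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (4)-(9).
    W maps each gate name to a (hidden, hidden + input) weight matrix,
    b to a (hidden,) bias vector; '*' below is the pointwise product."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # Eq. (4): forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # Eq. (5): input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])     # Eq. (6): candidate state
    c = f * c_prev + i * c_tilde               # Eq. (7): cell state
    o = sigmoid(W["o"] @ z + b["o"])           # Eq. (8): output gate
    h = o * np.tanh(c)                         # Eq. (9): hidden state
    return h, c

# Illustrative sizes: 50-dim word vectors and 50 hidden units, as in the paper.
d, n_h = 50, 50
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(n_h, n_h + d)) for g in "fiCo"}
b = {g: np.zeros(n_h) for g in "fiCo"}
h, c = lstm_step(rng.normal(size=d), np.zeros(n_h), np.zeros(n_h), W, b)
```

A bidirectional model runs one such recurrence forward and a second one backward over the sentence, concatenating the two hidden states at each position.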
Besides a forward-directional LSTM layer, the bi-LSTM architecture in this paper has another LSTM layer that propagates the information backward as in Fig. 7.
Fig. 7. Conceptual diagram of bi-LSTM model and its relationship with word embedding.
The bi-LSTM architecture outputs a sequence of labels for the input sentence. Specifically, the model evaluated, for each word in the sentence, the probabilities that the word belongs to each of the four designated classes. The prediction result for one word was therefore represented as a four-dimensional (4D) vector, in which each element corresponded to one class and represented the probability that the word was classified as that class. The output sequence for the input sentence was thus a sequence of probability vectors, a matrix of shape (T, n_y), where T is the length of the sequence (the number of words in the sentence) and n_y is the number of classes to be classified, which in this study is four (n_y = 4).

Model Training in an Active-Learning Setting

To address the issue that labeled text from the inspection reports was not available to train the preceding model and that labeling the text is time-consuming and labor-intensive, this study adopted the active learning method to train the model. The model was trained using the pool-based sampling strategy, because stream-based sampling asks the human annotator to label all the words in a single sentence, wait until the model is trained, and then label the words in a new sentence, imposing too much cognitive load through repetitive tasks.
This study adopted an uncertainty sampling method for query selection because it fit well with the outputs of the RNN model, which are sequences of probabilities, and was easy to implement using the probability outputs. Because the sampling unit is a sentence and the labeling unit is a word, the authors quantified uncertainty at two levels: word-level and sentence-level. The word-level uncertainty was measured as entropy, and the sentence-level uncertainty was determined as the arithmetic average of the word-level entropies in a sentence. In detail, the word-level entropy H(word) and the sentence-level entropy H(sentence) were calculated using Eqs. (10) and (11), respectively
$$H(word) = -\sum_{i=1}^{4} P(y_i \mid word) \log P(y_i \mid word) \tag{10}$$

$$H(sentence) = \frac{1}{N} \sum_{i=1}^{N} H(word_i) \tag{11}$$

where $y_i$ is the predicted label of the word (i.e., $y_1$ = bridge element, $y_2$ = damage, $y_3$ = cause, and $y_4$ = others), and $N$ is the total number of words in a given sentence.
Since the entropy quantifies the amount of information required to interpret input data, it measures the uncertainty for word categories (Shannon 1948). This uncertainty sampling method selects the meaningful-to-learn instances from unlabeled data based on the entropy, and asks the human to assign the correct labels.
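Eqs. (10) and (11) amount to only a few lines of code. The sketch below is illustrative; the two probability matrices are made-up examples, not model outputs.

```python
import numpy as np

def sentence_entropy(probs):
    """Sentence-level uncertainty per Eqs. (10)-(11): probs has shape
    (N, 4), one probability vector over the four classes per word."""
    eps = 1e-12                                            # guard against log(0)
    word_h = -np.sum(probs * np.log(probs + eps), axis=1)  # Eq. (10), per word
    return float(word_h.mean())                            # Eq. (11), average

confident = np.array([[0.97, 0.01, 0.01, 0.01]] * 5)   # model is sure
uncertain = np.array([[0.25, 0.25, 0.25, 0.25]] * 5)   # model is guessing
# The uncertain sentence has the higher entropy and would be queried first.
```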
The model training process steps are outlined next:
1.
Initial training data were selected from the collected, unlabeled text data in the reports; they were then manually labeled and added to the training set, which was used in subsequent procedures to train the model.
2.
The model was trained based on the training set.
3.
The model predicted labels for the remaining unlabeled data, calculated the sentence-level entropy of each sentence from the prediction results, and selected as queries the sentences with the highest entropies.
4.
A human annotator labeled the selected queries, and the labeled sentences were then added to the training set.
5.
Procedures from steps 2 to 4 were repeated until all the data were labeled and used to train the model.
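The five steps above can be sketched as a pool-based loop. The ToyTagger class and the annotate callback below are stand-ins for illustration, not the authors' implementation:

```python
import random

class ToyTagger:
    """Stand-in for the bi-LSTM model; entropy here is just a random score."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.n_trained = 0
    def fit(self, labeled):
        self.n_trained = len(labeled)          # record training-set size
    def sentence_entropy(self, sentence):
        return self.rng.random()               # placeholder uncertainty score

def active_learning_loop(model, pool, annotate, seed_size=10, query_size=10):
    labeled = [annotate(s) for s in pool[:seed_size]]   # step 1: seed set
    unlabeled = list(pool[seed_size:])
    while unlabeled:
        model.fit(labeled)                              # step 2: train
        # step 3: rank remaining sentences by sentence-level entropy
        unlabeled.sort(key=model.sentence_entropy, reverse=True)
        queries, unlabeled = unlabeled[:query_size], unlabeled[query_size:]
        labeled += [annotate(s) for s in queries]       # step 4: human labels
    model.fit(labeled)                                  # step 5 ends: all data used
    return model

pool = [f"sentence {i}" for i in range(50)]
model = active_learning_loop(ToyTagger(), pool, annotate=lambda s: (s, "tags"))
assert model.n_trained == 50  # every sentence was eventually labeled and used
```

In practice the loop would stop early once the model performance exceeds a target, which is where the labeling savings reported later come from.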
The research team implemented the repetitive active learning process by developing a web-based user interface (Fig. 8). In particular, the authors customized the tool developed by Mayhew and Roth (2018) to allow human annotators to label text words one by one. The interface shows the 10 most informative-to-learn sentences at once, as selected by the active learning algorithm. When an annotator hovers the mouse over a word, the interface displays the possible categories (i.e., bridge element, damage, cause, and others). The annotator clicks one category for each word, and the labeled sentences are then used to train and update the prior model. These processes are repeated until the model performance exceeds a predefined threshold, or until the annotator chooses to stop.
Fig. 8. GUI interface for active learning.

Experimental Results

Experimental Setup

In this study, the authors collected 1,188 inspection reports on bridges across general national highways in Korea. Because the reports are stored in PDF format, plain text was extracted from the PDF files using pdftotext, an open-source text extraction tool (Noonburg 2017).
Sentence segmentation was conducted based on a rule-based algorithm. Assuming that every sentence in the reports strictly follows the language rules and thus ends with a period, this study established rules for sentence segmentation as follows:
1.
Every text string ending with a period was considered as a sentence candidate.
2.
Every string in which the text just before the period was two or fewer characters long was excluded from the candidates in order to rule out table or figure captions. For example, strings such as “Fig. 3.” and “Table 12.” were excluded by this rule because the strings just before the periods (“3” and “12”) were one and two characters long, respectively.
3.
Every string whose length (the number of characters) was less than 10 was excluded in order to rule out errors from the raw text extraction tool and captions not caught by the second rule, such as “Fig. III(a)” (i.e., with no period at the end).
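One possible reading of the three rules as a filter is sketched below; the token-based interpretation of the second rule is an assumption made for illustration:

```python
import re

def is_sentence(candidate: str) -> bool:
    """Apply the three filtering rules to one text string."""
    s = candidate.strip()
    if not s.endswith("."):                 # rule 1: must end with a period
        return False
    # rule 2: drop caption-like strings whose token just before the period
    # is two or fewer characters long (e.g., "Fig. 3.", "Table 12.")
    stem = s[:-1].rstrip()
    last_token = re.split(r"[\s.]+", stem)[-1] if stem else ""
    if len(last_token) <= 2:
        return False
    if len(s) < 10:                         # rule 3: drop very short strings
        return False
    return True

assert not is_sentence("Fig. 3.")
assert not is_sentence("Table 12.")
assert not is_sentence("Fig. III(a)")       # rejected (no trailing period)
assert is_sentence("Cracks were observed on the pier.")
```

A side effect of this reading is that sentences ending in a one- or two-character word are also dropped, which matches the conservative spirit of the rules.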
A total of 724,288 sentences were identified from the raw text by the rule-based algorithm. The vocabulary list was then constructed automatically, and each word was embedded as a 50-dimensional word vector. However, because of the limited time available to label all the words in the reports, this study labeled 1,650 sentences from selected chapters of 350 reports; these sentences were chosen because they described the results of visual bridge inspections. For validation, the data were randomly divided into training data of 1,300 sentences (61,169 words) and testing data of 350 sentences (10,220 words). The bridge damage factor recognition model was trained using the training data only, and its recognition performance was then tested on the test data.
The recognition performance was evaluated using the average F1 score, the harmonic mean of precision and recall, because most words were labeled as the others class and a simple accuracy measure would therefore be highly biased. The F1 score for each class was evaluated, and the average of the class F1 scores was used to measure the model’s recognition performance [Eq. (12)]
F1 = 2 / (recall^(−1) + precision^(−1)) = (2 × precision × recall) / (precision + recall)
(12)
Two models were trained in different learning settings: a batch learning setting (i.e., all training data used at once) and an active learning setting (i.e., samples from the training data added incrementally). The results of the batch learning model served as the baseline for evaluating the active learning model. Taking the performance of the batch learning model as the maximum achievable performance of the bridge damage factor recognition model, the amount of training data the active learning model used was taken as a metric of efficiency. The effect of the active learning method was then evaluated by how much it reduced the amount of manual labeling needed to achieve a given level of recognition performance relative to that maximum.

Model Recognition Results: Batch Learning

Table 1 presents the model performance with batch learning. Most words were correctly classified, with an overall accuracy of 92.7%, implying that the developed bi-LSTM model is effective in extracting useful damage information from bridge inspection reports.
Table 1. Performance of bi-LSTM model in the batch learning setting

Actual class   Predicted class
               Element   Damage   Cause   Others    Sum
Element          979       10      13      197     1,199
Damage            16      376      16       86       494
Cause             10       18     697       75       800
Others           186       57      67    7,417     7,727
Sum            1,191      461     793    7,775    10,220
In detail, the 10 most frequent keywords assigned to each category are listed in Table 2. Half of the top 10 bridge elements were structural components (i.e., support, expansion joint, abutment, pier, and girder), and the others were operational components (i.e., concrete, drain pipe, floor slab, pavement, and catch drain). For both damages and causes, water-related factors such as water leak, storm water, temperature, crack, surface, and efflorescence were frequently reported, perhaps because bridges are often built over a river or the sea. It is also worth noting that several structural factors (e.g., drying shrinkage, broken, corrosion, and rebar) were retrieved as bridge damages and causes.
Table 2. Ten most frequent keywords from bridge inspection reports

Index   Bridge element     Count   Damage                  Count   Cause           Count
1       Support             248    Water leak               102    Crack            387
2       Concrete            217    Construction phase        95    Broken           231
3       Expansion joint     197    Drying shrinkage          92    Corrosion        190
4       Abutment            173    Vehicle                   87    Surface          110
5       Drain pipe          136    Storm water               75    Water leak       103
6       Floor slab          134    Common                    69    Deterioration     98
7       Pier                133    Inflow                    58    Rebar             98
8       Pavement            130    Inflow of storm water     50    Painting          97
9       Catch drain         128    Temperature               39    Efflorescence     88
10      Girder              109    Increase                  38    Debris            82
Table 3 gives the F1 scores of each class: element, damage, cause, and others. The F1 score of the others class was the highest (95.7%), while the remaining classes ranged from 78.7% to 87.6%.
Table 3. F1 scores of the model for each class

Class      Element   Damage   Cause   Others   Averaged
F1 score    0.819     0.787    0.876   0.957     0.860
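As a consistency check, the overall accuracy and the per-class F1 scores can be reproduced from the confusion matrix in Table 1:

```python
import numpy as np

# Confusion matrix from Table 1 (rows: actual class; columns: predicted class),
# in the order element, damage, cause, others.
cm = np.array([
    [979,  10,  13,  197],
    [ 16, 376,  16,   86],
    [ 10,  18, 697,   75],
    [186,  57,  67, 7417],
])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all words

precision = np.diag(cm) / cm.sum(axis=0)    # per predicted-class column
recall    = np.diag(cm) / cm.sum(axis=1)    # per actual-class row
f1 = 2 * precision * recall / (precision + recall)

print(round(float(accuracy), 3))            # 0.927, as reported
print([round(float(x), 3) for x in f1])     # [0.819, 0.787, 0.875, 0.957]
print(round(float(f1.mean()), 3))           # 0.86, the reported average F1
```

The recomputed cause-class F1 (0.875) differs from the 0.876 in Table 3 by 0.001, presumably a rounding difference; the other values match exactly.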
Although the others category accounted for 75.6% of the testing data (i.e., 7,727 of 10,220 words), the model successfully classified the word categories with an average F1 score of 0.860. These results indicate that the model could extract informative keywords from a set of text sentences while filtering out others words irrelevant to the cause-effect relationship of bridge damages. The F1 score for the damage class was the lowest (78.7%). The damage classification errors mostly occurred when multiple causes were given for a single damage; the model had difficulty understanding the context across connected sentences. For example, in the sentence “surface exfoliation was identified as a result of freezing, calcium chloride application, extended operation time, and the deterioration of concrete curb,” the model misclassified deterioration as damage when it should have been classified as a cause of the surface exfoliation. Similarly, in the sentence “the water leakage caused by expansion joint failure resulted in efflorescence on a concrete slab,” the model struggled to determine the class of the keyword “leakage,” even though it was better identified as the cause of the efflorescence (i.e., the damage). Nevertheless, the results of recognizing damage and its causes are still valid, with an F1 score of 78.7%.

Model Recognition Results: Active Learning

Fig. 9 shows the model performance (i.e., F1 score) as a function of the volume of training data used. Given 10 input sentences at the first iteration, the F1 score remained in the vicinity of 0.459. However, the performance rose rapidly to 0.765 at the tenth iteration (i.e., 100 input sentences) and increased to 0.791 at the twentieth iteration (i.e., 200 input sentences). The improvement continued as the number of sentences increased, and the F1 score eventually reached 0.816 at the end of the iterations (i.e., 1,300 input sentences).
Fig. 9. Performance of active learning method.
These results strongly suggest that the active learning method can effectively reduce the effort needed to label the text data required to train the model. For example, the model trained by active learning achieved a 0.778 F1 score after using only 140 sentences, which required less than an hour of data labeling. This implies that bridge engineers can build a promising NER model and extract causal information from abundant inspection reports with little effort. On the other hand, increasing the volume of training data did not necessarily increase the recognition performance; the performance sometimes worsened, especially when a sentence with incorrectly tokenized words or typographical errors was selected as a query. Because such errors seldom appeared in the inspection reports, few samples containing them were used in training, and the model had difficulty correctly classifying these new words. Although the performance drops were not critical, the impact of such errors on the input features and model performance should be investigated more thoroughly in future research.

Results and Discussion

This section discusses several examples of the results and their potential future applications. As reported in the experiments, the proposed method performed well in classifying word categories from given text sentences, assigning the categories bridge element, damage, cause, and others with an F1 score of 0.860. For example, the sentence “Reticular cracks occurred across the foreground, and it might be due to the repeated traffic loading after the new asphalt pavement of the existing crack damage” was correctly parsed: the model assigned bridge element to foreground and new asphalt pavement, damage to reticular cracks and existing crack damage, and cause to traffic and loading. For another example, given the sentence “During the inspection, 13 drainage holes were found to be clogged, which resulted from the extension of its service life,” the model correctly recognized the bridge element (i.e., drainage holes), damage (i.e., clogged), and cause (i.e., extension of its service life). Furthermore, the sentence “Due to water leakage, the efflorescence occurred on the bottom surface of the slab, and the bottom flanges of the steel girder were corroded” was also well recognized: bridge elements → bottom surface of the slab, bottom flanges of the steel girder; damages → efflorescence (of the slab), corrosion (of the flanges); causes → water leakage. Moreover, the model screened out the noninformative words (e.g., during inspection, were found, and due to) into the others category. Despite the encouraging proof of concept, the authors observed a few inevitable errors. For instance, the word deterioration in the sentence “In the case of curb concrete, surface exfoliation occurred in several sections due to deterioration caused by spraying of calcium chloride in winter” was assigned to the damage category (ground truth: cause).
As described in the “Bridge Damage Factor Recognition Model” section, distinguishing the two categories can often be confusing because the meanings of damage and cause words are highly context-dependent. Despite these inevitable errors, false positives and false negatives were not frequently observed, and the average F1 score was 0.860 in the authors’ experiments (Table 3).
The research team further conducted face-to-face, semistructured interviews to evaluate the practical usefulness of the developed method. A total of nine experts were interviewed, selected according to their organization types (i.e., general contractors, bridge maintenance firms, and research institutes), positions and duties, and work experience (Table 4). All interviewees acknowledged that the results of the NER model (i.e., the causal information extracted from bridge inspection reports) can help field engineers design and maintain bridge structures, and they suggested specific application examples. For example, given the result cracks due to heavy traffic loading, bridge engineers can estimate the expected traffic loading and decide to adjust the strength of road pavement during the design phase. Based on the derived relationship (bridge element: rebar surface and film on passive state metals; damage: break; and cause: sea breeze and concrete carbonation), an engineer on a new coastal bridge project could select proper construction methods and materials considering the effects of water-related issues; they could utilize nonalkali-reactive aggregates and lower the water-cement ratio to suppress concrete carbonation. The interviewees also noted the usefulness of NER results for bridge maintenance. When only particular types of damages (e.g., efflorescence and corrosion) are observed across multiple inspection reports for a certain bridge, field inspectors can pay more attention to those damages rather than to others (e.g., cracks). To proactively prevent such repeated damages, bridge managers may consider supplementing existing drainage systems and/or utilizing materials that are durable against efflorescence and corrosion. It can be concluded that the developed method extracts valuable causal information that can be used for bridge engineering and maintenance.
In addition, most of the interviewees considered that the method, requiring less than an hour for data labeling and model development, would be acceptable in practice.
Table 4. Personal information of the consultants

Index   Affiliation                                                     Organization type         Position and duty              Work experience (years)
1       Korea Infrastructure Safety Corporation                         Bridge maintenance firm   Deputy department head         20
2       Daewoo Engineering and Construction                             General contractor        Deputy department head         20
3       g3way                                                           Bridge maintenance firm   Director                       16
4       Institute of Construction and Environmental Engineering         Research institute        Research assistant professor   13
5       Korea Institute of Civil Engineering and Building Technology    Research institute        Chief researcher               13
6       Korea Infrastructure Safety Corporation                         Bridge maintenance firm   Section head                   10
7       Daewoo Engineering and Construction                             General contractor        Deputy section head            10
8       Institute of Construction and Environmental Engineering         Research institute        Senior researcher              6
9       Institute of Construction and Environmental Engineering         Research institute        Senior researcher              6

Conclusion

The authors proposed an efficient approach that extracts damage factors from bridge inspection reports. The NER model was built with the bi-LSTM architecture and trained using the active learning method. Although the model was trained on text data written in Korean, the approach is applicable to other languages given only a small amount of labeled training data. To the authors’ knowledge, this study is the first attempt to apply active learning to automatically extract the cause-effect relationships of bridge damages from bridge inspection reports. The experimental results showed that the model successfully extracted the damage factors (i.e., bridge element, damage, and cause) from bridge inspection reports with an F1 score of 0.860. In particular, the active learning approach significantly reduced the amount of training data and the cost of human labeling, enabling bridge engineers to build a promising NER model and extract useful causal information with little effort. The extracted causal information can help engineers understand the complex mechanisms of bridge damages, predict potential damages, and take proper mitigation actions in the design phase. The information about cause-effect relationships is also beneficial for bridge maintenance: field inspectors can determine which bridge structures are exposed to the most serious damages requiring treatment and thus plan maintenance before field inspection.
The research still presents improvement opportunities in both theoretical and practical aspects. Theoretically, the developed method still requires some human effort to clean the input texts, because the inspection reports were not free from typographical and linguistic errors. Future studies can focus on developing a construction-customized text preprocessing technique to filter out such human errors. Practically, it would be meaningful to evaluate the applicability of the developed method to different languages, because the bridge damage factors might be described differently as lexicons and grammars differ from one language to another. By applying the method to bridge inspection reports from different countries (i.e., bridges with different structural and environmental conditions), it would be possible to discover more diverse causal relationships. Integrating the information from various sources and extracting in-depth knowledge about bridge damage would further contribute to bridge engineering and maintenance.

Data Availability Statement

Some or all data, models, or code generated or used during the study are available from the corresponding author by request (the code for text preprocessing, word embedding, bi-LSTM, and active learning). Some data used during the study were provided by a third party (the bridge inspection reports). Direct requests for these materials may be made to the provider as indicated in the Acknowledgments.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (Grant No. 2017R1C1B2009237). The authors also acknowledge the Korea Institute of Civil Engineering and Building Technology for providing the bridge inspection reports analyzed in this study.

References

Chiu, J. P. C., and E. Nichols. 2016. “Named entity recognition with bidirectional LSTM-CNNs.” Preprint, submitted November 26, 2015. https://arxiv.org/abs/1511.08308.
Deng, L., W. Wang, and Y. Yu. 2016. “State-of-the-art review on the causes and mechanisms of bridge collapse.” J. Perform. Constr. Facil. 30 (2): 04015005. https://doi.org/10.1061/(ASCE)CF.1943-5509.0000731.
FHWA (Federal Highway Administration). 2012. Bridge inspector’s reference manual. Washington, DC: FHWA.
Frangopol, D. M., J. S. Kong, and E. S. Gharaibeh. 2001. “Reliability-based life-cycle management of highway bridges.” J. Comput. Civ. Eng. 15 (1): 27–34. https://doi.org/10.1061/(ASCE)0887-3801(2001)15:1(27).
Habibi, M., L. Weber, M. Neves, D. L. Wiegandt, and U. Leser. 2017. “Deep learning with word embeddings improves biomedical named entity recognition.” Bioinformatics 33 (14): i37–i48. https://doi.org/10.1093/bioinformatics/btx228.
Hochreiter, S., and J. Schmidhuber. 1997. “Long short-term memory.” Neural Comput. 9 (8): 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Huang, Z., W. Xu, and K. Yu. 2015. “Bidirectional LSTM-CRF models for sequence tagging.” Preprint, submitted August 9, 2015. https://arxiv.org/abs/1508.01991.
Jallan, Y., E. Brogan, B. Ashuri, and C. M. Clevenger. 2019. “Application of natural language processing and text mining to identify patterns in construction-defect litigation cases.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 11 (4): 04519024. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000308.
Jeon, J. C., I. K. Lee, C. H. Park, and L. H. Hyun. 2017. “A study on improvement of inspection activity based upon condition analysis of expressway bridges.” J. Korean Soc. Civ. Eng. 37 (1): 19–28. https://doi.org/10.12652/Ksce.2017.37.1.0019.
Jin, Z., and K. Tanaka-Ishii. 2006. “Unsupervised segmentation of Chinese text by use of branching entropy.” In Proc., COLING/ACL 2006 Main Conf. Poster Sessions, 428–435. Stroudsburg, PA: Association for Computational Linguistics.
Jurafsky, D., and J. H. Martin. 2014. Speech and language processing. London: Pearson.
Kanda, S., K. Senba, Y. Nanakaya, H. Ikeda, and T. Kawai. 2000. “Database management system for proactive maintenance (case study of steam turbine plant and switch-house equipment in Japan).” In Vol. 1 of Proc., IEEE Power Engineering Society Winter Meeting, 464–469. New York: IEEE.
Kim, H. 2013. Cleansing noisy text using corpus extraction and string match. Seoul: Seoul National Univ.
Kim, H., S. Cho, and P. Kang. 2014. “KR-WordRank: An unsupervised Korean word extraction method based on WordRank.” J. Korean Inst. Ind. Eng. 40 (1): 18–33. https://doi.org/10.7232/JKIIE.2014.40.1.018.
KISTEC (Korea Infrastructure Safety and Technology Corporation). 2017. Specific guidelines for safety inspection and precise safety diagnosis. Seoul: KISTEC.
Lample, G., M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer. 2016. “Neural architectures for named entity recognition.” Preprint, submitted March 4, 2016. https://arxiv.org/abs/1603.01360.
LeBeau, K. H., and S. J. Wadia-Fascetti. 2007. “Fault tree analysis of Schoharie creek bridge collapse.” J. Perform. Constr. Facil. 21 (4): 320–326. https://doi.org/10.1061/(ASCE)0887-3828(2007)21:4(320).
Li, Z., and R. Burgueño. 2010. “Using soft computing to analyze inspection results for bridge evaluation and management.” J. Bridge Eng. 15 (4): 430–438. https://doi.org/10.1061/(ASCE)BE.1943-5592.0000072.
Liu, K., and N. El-Gohary. 2017. “Ontology-based semi-supervised conditional random fields for automated information extraction from bridge inspection reports.” Autom. Constr. 81 (Sep): 313–327. https://doi.org/10.1016/j.autcon.2017.02.003.
Lokuge, W., N. Gamage, and S. Setunge. 2016. “Fault tree analysis method for deterioration of timber bridges using an Australian case study.” Built Environ. Project Asset Manage. 6 (3): 332–344. https://doi.org/10.1108/BEPAM-01-2016-0001.
Mayhew, S., and D. Roth. 2018. “TALEN: Tool for annotation of low-resource entities.” In Proc., ACL 2018, System Demonstrations, 80–86. Stroudsburg, PA: Association for Computational Linguistics.
Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013a. “Efficient estimation of word representations in vector space.” Preprint, submitted January 16, 2013. https://arxiv.org/abs/1301.3781.
Mikolov, T., I. Sutskever, K. Chen, G. Corrado, and J. Dean. 2013b. “Distributed representations of words and phrases and their compositionality.” Preprint, submitted October 16, 2013. https://arxiv.org/abs/1310.4546.
Noonburg, D. 2017. “pdftotext.” Accessed September 25, 2019. https://www.xpdfreader.com/download.html.
Peris-Sayol, G., I. Paya-Zaforteza, S. Balasch-Parisi, and J. Alós-Moya. 2017. “Detailed analysis of the causes of bridge fires and their associated damage levels.” J. Perform. Constr. Facil. 31 (3): 04016108. https://doi.org/10.1061/(ASCE)CF.1943-5509.0000977.
Robelin, C.-A., and S. M. Madanat. 2007. “History-dependent bridge deck maintenance and replacement optimization with Markov decision processes.” J. Infrastruct. Syst. 13 (3): 195–201. https://doi.org/10.1061/(ASCE)1076-0342(2007)13:3(195).
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. 1986. “Learning representations by back-propagating errors.” Nature 323 (6088): 533–536. https://doi.org/10.1038/323533a0.
Ryu, J. M., and E. C. Shin. 2014. “Database construction plan of infrastructure safety inspection and in-depth inspection results.” J. Korean Geosynthetics Soc. 13 (4): 133–141. https://doi.org/10.12814/jkgss.2014.13.4.133.
Schuster, M., and K. K. Paliwal. 1997. “Bidirectional recurrent neural networks.” IEEE Trans. Signal Process. 45 (11): 2673–2681. https://doi.org/10.1109/78.650093.
Seedah, D. P. K., and F. Leite. 2015. “Information extraction for freight-related natural language queries.” In Proc., Int. Workshop on Computing in Civil Engineering, 427–435. Reston, VA: ASCE.
Settles, B. 2004. “Biomedical named entity recognition using conditional random fields and rich feature sets.” In Proc., Int. Joint Workshop on Natural Language Processing in Biomedicine and its Applications, 104–107. Stroudsburg, PA: Association for Computational Linguistics.
Settles, B. 2009. Active learning literature survey. Computer Sciences Technical Report 1648. Madison, WI: Univ. of Wisconsin–Madison.
Settles, B., and M. Craven. 2008. “An analysis of active learning strategies for sequence labeling tasks.” In Proc., Conf. on Empirical Methods in Natural Language Processing, 1070–1079. Stroudsburg, PA: Association for Computational Linguistics.
Shannon, C. E. 1948. “A mathematical theory of communication.” Bell Syst. Tech. J. 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.
Spasic, I., S. Ananiadou, J. McNaught, and A. Kumar. 2005. “Text mining and ontologies in biomedicine: Making sense of raw text.” Briefings Bioinf. 6 (3): 239–251. https://doi.org/10.1093/bib/6.3.239.
Tanabe, L., N. Xie, L. H. Thom, W. Matten, and W. J. Wilbur. 2005. “GENETAG: A tagged corpus for gene/protein named entity recognition.” BMC Bioinf. 6 (1): 1–7. https://doi.org/10.1186/1471-2105-6-S1-S3.
Tomanek, K., and U. Hahn. 2009. “Semi-supervised active learning for sequence labeling.” In Proc., 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, 1039–1047. Stroudsburg, PA: Association for Computational Linguistics.
Zhu, F., P. Patumcharoenpol, C. Zhang, Y. Yang, J. Chan, A. Meechai, W. Vongsangnak, and B. Shen. 2013. “Biomedical text mining and its applications in cancer research.” J. Biomed. Inf. 46 (2): 200–211. https://doi.org/10.1016/j.jbi.2012.10.007.

Information & Authors

Published In

Journal of Performance of Constructed Facilities
Volume 34, Issue 6, December 2020

History

Received: Mar 12, 2020
Accepted: Jul 6, 2020
Published online: Sep 28, 2020
Published in print: Dec 1, 2020
Discussion open until: Feb 28, 2021

Authors

Affiliations

Postdoctoral Research Associate, Dept. of Civil and Environmental Engineering, Seoul National Univ., 1 Gwanak-Ro, Gwanak-Gu, Seoul 08826, Republic of Korea; Senior Researcher, Institute of Construction and Environmental Engineering, Seoul National Univ., 1 Gwanak-Ro, Gwanak-Gu, Seoul 08826, Republic of Korea. ORCID: https://orcid.org/0000-0002-4620-5592. Email: [email protected]
Sehwan Chung [email protected]
Ph.D. Student, Tishman Construction Management Program, Dept. of Civil and Environmental Engineering, Univ. of Michigan, 2350 Hayward St., 1318 G.G. Brown, Ann Arbor, MI 48109. Email: [email protected]
Seokho Chi, M.ASCE [email protected]
Professor, Dept. of Civil and Environmental Engineering, Seoul National Univ., 1 Gwanak-Ro, Gwanak-Gu, Seoul 08826, Republic of Korea; Adjunct Professor, Institute of Construction and Environmental Engineering, Seoul National Univ., 1 Gwanak-Ro, Gwanak-Gu, Seoul 08826, Republic of Korea (corresponding author). Email: [email protected]
