Understanding User Experience and Satisfaction with Urban Infrastructure through Text Mining of Civil Complaint Data

Chang, Taeyeon; Chi, Seokho; Im, Seok-Been

doi:10.1061/(ASCE)CO.1943-7862.0002308

Open access

Technical Papers

May 18, 2022


Authors: Taeyeon Chang (https://orcid.org/0000-0003-0077-4324), Seokho Chi, M.ASCE (https://orcid.org/0000-0002-0409-5268), and Seok-Been Im

Publication: Journal of Construction Engineering and Management

Volume 148, Issue 8

https://doi.org/10.1061/(ASCE)CO.1943-7862.0002308


Abstract

With the increase in public concern about the aging of urban infrastructure and the associated risk of safety accidents, it is important to maintain the safety and serviceability of urban infrastructure in accordance with user satisfaction. Although many studies have attempted to consider user experience and satisfaction based on user surveys and civil complaint data analysis, they have had difficulty in identifying user dissatisfaction factors where users feel unsafe or uncomfortable while using the infrastructure. The main purpose of the research presented here is to understand user experience and satisfaction with urban infrastructure by text mining self-written civil complaint data. To achieve this objective, the researchers adopted the following procedures: (1) development of a civil complaint thesaurus for the text mining of civil complaint data; (2) text preprocessing of civil complaint data by using the thesaurus; and (3) keyword extraction and recognition of the relationships between the keywords to explore user-experience factors related to urban infrastructure. The research team used 2,945 bridge complaint data records and 404 tunnel complaint data records in text format from the Korean Safety e-Report database. From the collected data, the researchers developed a civil complaint thesaurus with 47 semantic relationships between words, such as Korean compound words, synonyms, and hypernym–hyponyms. As a result of keyword extraction, “breakage,” “accident,” and “road” for bridge complaints, and “entrance,” “accident,” and “breakage” for tunnel complaints were the selected words representing user experiences, and were visualized in a tag cloud. Also, critical user-experience factors such as unsafe or uncomfortable situations on bridge roads (e.g., “breakage,” “construction,” and “pothole”), and dissatisfaction factors at tunnel entrances (e.g., “streetlight,” “view,” and “sign”) were explored using semantic network analysis. The outcome of this research will contribute to identifying user-experience factors from civil complaint data and improving the safety and serviceability of urban infrastructure by considering user experience and satisfaction in infrastructure maintenance practices.

Introduction

The safety and serviceability of urban infrastructures are essential conditions for the quality of public life and a safe urban system. As urban infrastructures age and fatal safety accidents related to aging infrastructures occur all over the world, public concerns about urban infrastructures are increasing (ASCE 2021; Ellingwood 2010). For example, in Korea, more than 75% of the public in 2018 perceived the safety and service level of the country’s infrastructure as below average (Statistics Korea 2019). To improve the public’s comfort and satisfaction with urban infrastructures, it is important to maintain the condition and service quality of these infrastructures in accordance with user experience and satisfaction. Here, the infrastructure user means the consumer and the beneficiary of urban infrastructure services.

To maintain the safety and serviceability of urban infrastructures, as part of their maintenance practices, managers periodically inspect structural and visual damage to infrastructure according to their guidelines and manuals for inspection and diagnosis, and conduct repairs, reinforcement, and rehabilitation based on the inspection results (Chang and Chi 2019; FHWA 2020; Kobayashi and Kaito 2016; Lim and Chi 2019; MOLIT 2021). The manager’s perspective, however, may not match user experience and satisfaction. In determining the rehabilitation demand for road pavements, for instance, the affordable results of physical pavement roughness measures may not satisfy a user’s perception of the pavement’s roughness; the user may have higher or different expectations (Shafizadeh and Mannering 2006). Similarly, users may, for example, have concerns about or experience discomfort with the lighting or shape of bridge railings, which may cause safety accidents on the bridge (MOIS 2018), but inspection manuals in Korea do not stipulate that managers must check these factors.

Many studies have attempted to incorporate user experiences derived from the results of user surveys into infrastructure maintenance practices (Abdul-Rahman et al. 2015; Gopikrishnan and Paul 2017; Kang and Lee 2012; Shafizadeh and Mannering 2006); however, these studies have considered user satisfaction only for the issues stipulated in a questionnaire that was designed from a manager’s perspective. Other researchers have tried to analyze civil complaint data written by users to explore user experience and dissatisfaction factors for improving service quality and user satisfaction for urban infrastructure, such as a water distribution system, a metro system, and a building indoor environment (Drake and Zechman 2012; Haider et al. 2016; Park et al. 2015; Teng et al. 2018; Villeneuve and O’Brien 2020). Drake and Zechman (2012), for example, attempted to analyze user complaints regarding a water distribution system by exploring the location and time of complaints. Despite these efforts, previous approaches have been limited in mining the contents of the civil complaint data to determine which factors cause users to feel unsafe or uncomfortable.

Thus, the primary objective of the research presented in this work is to understand user experience and satisfaction with urban infrastructure by text mining self-written civil complaint data. To accomplish this objective, this study (1) developed a civil complaint thesaurus for facilitating the text mining of civil complaint data; (2) preprocessed civil complaint data in text format by using the thesaurus; and (3) extracted keywords and identified the relationships between the keywords to explore user-experience factors in urban infrastructure. The user-experience factors refer to instances where users feel unsafe or uncomfortable while using the infrastructure. The outcome of this research will contribute to identifying user-experience factors from civil complaint data and improving the safety and serviceability of urban infrastructure by considering user experience and satisfaction in infrastructure maintenance practices. The research team used civil complaint data from the Korean Safety e-Report database, which is managed by the Ministry of the Interior and Safety (MOIS) in Korea. Because the public directly reports risk situations in the field, including urban infrastructure risks, to the Korean Safety e-Report through mobile devices, the complaint data collected from the database can provide invaluable data for this study.

This article is organized into five sections. After the introduction section, the Literature Review section discusses related works to consider user satisfaction with urban infrastructure and a review of text mining approaches in the construction domain. The Research Methodology section introduces the data collection, development of the civil complaint thesaurus, text preprocessing, and exploration of user-experience factors. Results of the text mining are discussed in the Results and Discussion section. Finally, we present conclusions, research contributions, and recommendations for future research.

Literature Review

Related Works Considering User Satisfaction with Urban Infrastructure

To consider user satisfaction with infrastructure maintenance, many studies have been conducted to analyze user opinions on each type of urban infrastructure and apply the results of analyses to improve the safety and serviceability of urban infrastructure. Abdul-Rahman et al. (2015) identified building performance requirements that could improve user satisfaction with building facilities maintenance through user surveys. Shafizadeh and Mannering (2006) collected pavement roughness data on a scale of 1 to 5 via a user survey and developed a user perception model for the roughness from the data in order to take into account users’ perspectives in determining the demand for road pavement rehabilitation in a highway system. A user survey was also used to capture user satisfaction for establishing a bicycle level-of-service model (Kang and Lee 2012). These studies, however, considered user satisfaction only with the main issues designated by the manager.

On the other hand, there have been efforts to analyze civil complaint data to identify user experience and dissatisfaction information to improve the service quality of urban infrastructure. Haider et al. (2016) used user complaint data for a water supply system to identify the distribution problems and seriousness of user complaints, and the risks leading to user dissatisfaction, to enhance the reliability of the system. Teng et al. (2018) also analyzed metro complaint data to summarize civil complaints according to their location, time, and category. In spite of these attempts to consider civil complaint data, it is still difficult to directly identify user dissatisfaction factors that indicate feeling unsafe or uncomfortable while using urban infrastructure. Therefore, this study will improve our understanding of user experience and satisfaction by discovering user-experience factors from the civil complaints data in a self-written text format.

Review of Text Mining Approaches in the Construction Domain

Text mining is defined as the process of extracting meaningful information and contexts from unstructured data in text format (Baker et al. 2020; Manning et al. 2008; Zhang et al. 2019a). It aims to solve problems such as information extraction, text categorization, text summarization, and information retrieval (Miner et al. 2012). For these purposes, many researchers have analyzed large amounts of text data by utilizing the automated techniques of text mining, including keyword extraction, word network analysis, topic modeling, opinion mining, and sentiment analysis, in various domains such as business, health science, and education (He 2013; He et al. 2013; Jung and Lee 2020).

In the construction domain, text mining has also been conducted to extract meaningful information, to classify text, and to discover interesting patterns or trends from construction management documents, accident reports, contractual documents, public opinions or complaints, and others. At the document level, construction project documents were classified based on the key project components of the documents (Caldas and Soibelman 2003) and the clustering results from the documents’ textual similarities (Al Qady and Kandil 2014). Moon et al. (2018) also tried to ascertain the issues of the global construction market using keyword extraction and visualization from text data related to the global construction market.

In particular, some studies have recently attempted to analyze public opinions or complaints related to infrastructure management and governance. Zhong et al. (2019) labeled building quality complaint (BQC) text data according to 12 complaint subjects (e.g., leakage, hollowing or cracking, and construction impact) and developed a convolutional neural network (CNN)–based approach to automatically classify the BQC documents according to these subjects. Villeneuve and O’Brien (2020) conducted text mining of Airbnb reviews to explore indoor environmental quality (IEQ). Seasonal trends and causes of the IEQ-related complaints were discovered using keyword extraction along with term frequencies (TFs), and sentiment analysis was then performed to understand the sentimental characteristics of the complaints. Zhou et al. (2021) analyzed public opinions of infrastructure megaprojects extracted from social media platforms by utilizing topic modeling and sentiment analysis to understand the major topics and perceptions that potential users consider for megaprojects. Similarly, public opinions concerning the metro system were used for topic modeling and sentiment analysis (Zhang et al. 2019b). These studies have succeeded in identifying major topics of construction-domain documents, particularly public issues or sentiments about urban infrastructure that are evident from public opinions or complaints. However, these results could not concretely explain the areas in which users feel unsafe or uncomfortable while using the infrastructure, which are factors that are crucial to urban infrastructure maintenance. Therefore, it is necessary to extract detailed information at the level of sentences and words to explore user-experience factors.

On the other hand, many studies have attempted to analyze construction-domain text for information extraction and text classification at the level of the sentence or word. Zhang and El-Gohary (2013) tried to extract specific requirements from construction regulatory documents by applying information extraction rules made up of syntactic and semantic text features. Tixier et al. (2016) analyzed construction injury reports to extract energy, injury, and body types based on rule-based approaches. In addition, some studies have verified that machine learning algorithms show powerful performance in extracting several predefined kinds of information from construction-domain text (Goh and Ubeynarayana 2017; Hassan and Le 2020; Kim and Chi 2019; Moon et al. 2020, 2021; Salama et al. 2013; Wu et al. 2021). For example, Kim and Chi (2019) identified hazard objects, hazard positions, work processes, and accident results from construction accident cases by using a conditional random field as well as semantic rules. Hassan and Le (2020) tried to automatically identify requirements from construction contract documents using naïve Bayes, support vector machines, logistic regression, and feedforward neural network. Moon et al. (2020) extracted bridge damage factors (i.e., element, damage, and cause) from bridge inspection reports based on a recurrent neural network. These approaches, however, were restricted to a few information categories determined by the researchers; thus, they do not align well with the objective of the present study, which is to explore unexpected user-experience factors (e.g., object, state, time, location, and cause) to understand user experience and satisfaction with urban infrastructure. In addition, the aforementioned studies using machine learning algorithms required a large amount of data for model training, but the target data in this study (i.e., the civil complaints data in a self-written text format) were not enough to fulfill the requirement.

To accomplish the objective of this study, we collected civil complaint data that described the areas where users felt unsafe or uncomfortable while using infrastructure, and we preprocessed the data to a level appropriate for analysis. Then, we used two simple and intuitive text-mining techniques: keyword extraction based on TF calculation to extract major dissatisfaction factors, and semantic network analysis (SNA), also known as word network analysis, to identify meaningful relationships among the keywords. Both techniques are explained in detail in the following section.

Research Methodology

Fig. 1 illustrates the proposed research methodology. As shown in the figure, the methodology is organized into four main stages. First, civil complaint data in text format were collected from the Korean Safety e-Report. Second, a civil complaint thesaurus was developed to facilitate text mining of the collected data. Third, using the thesaurus, the collected text data were preprocessed in four steps, which are discussed in detail in the following sections. Last, to explore the user-experience factors, keywords were extracted from the preprocessed complaint data by conducting a TF calculation, and the keywords were visualized in the form of a tag cloud. The research team then identified the relationships among the keywords by using SNA, and plotted the relationships into a network graph. The research methodology was developed and implemented using Python version 3.6.8.

Data Collection

To conduct this study, 2,945 bridge complaint data records and 404 tunnel complaint data records were collected from the Korean Safety e-Report database from 2017 to 2018. Bridges and tunnels are representative urban infrastructures used by many people in Korea, as in other countries, and users reported diverse types of civil complaints, including risk situations (e.g., risk of traffic accidents due to potholes in a bridge ramp and the risk of pedestrian accidents from tiles falling off the outer wall of a tunnel), structural factors (e.g., exposed steel of a bridge expansion-joint and a crack in a bridge deck), and uncomfortable factors (e.g., dim lighting inside the tunnel and the need to clean around drainage areas).

As shown in Table 1, the collected civil complaint data are listed in text format for both infrastructure types. A total of 38,860 and 5,544 space-separated words were identified in the 2,945 bridge complaint records and the 404 tunnel complaint records, respectively; each complaint record consists of an average of 13.3 space-separated words. More specifically, the civil complaint data include terms representing objects or states of dissatisfaction with urban infrastructures (e.g., “a pothole in the entrance road,” “the exposed steel of the bridge substructure,” “the tiles dropping off,” and “the cracks in the road surface”), among which there are some domain-specific terms related to construction or infrastructure maintenance (e.g., “pothole,” “exposed-steel,” “pier,” and “expansion-joint”). The data also include general terms that express user feelings such as discomfort and dissatisfaction or that request maintenance actions (e.g., “dangerous,” “risk,” “dark,” “necessary,” and “action”), although these terms have little relation to the dissatisfaction factors.

Table 1. Examples of the collected civil complaint data

Category	Raw data (English translation)
Bridge complaints	There is a pothole in the entrance road of the bridge, which is dangerous to traffic.
	It is necessary to repair the exposed steel of the bridge substructure.
	The bridge drainage is blocked by soil and dirt.
Tunnel complaints	There is a risk of a pedestrian safety accident as the tiles on the outer wall of the tunnel are dropping off.
	The cracks in the road surface in the tunnel are progressing seriously.
	It is dark and dangerous since the entrance lighting is off at night.

Development of a Civil Complaint Thesaurus

A thesaurus is a dictionary that defines the semantic relationships among terms, including synonyms, hypernyms, and hyponyms. It has been used as an information retrieval method to extend queries and resolve query inconsistencies, and can also be applied in text preprocessing for other text mining techniques, such as keyword analysis and text classification (Bang et al. 2006; Kim and Chi 2019; Xu and Yu 2010). In this research, when users reported their civil complaints via the Korean Safety e-Report, they could use different expressions to represent the same or almost the same meaning, because there is no standard or set format for the reports. For instance, terms could be represented using synonyms such as “expansion-joint or joint” and “pothole or sinkhole,” and some hypernyms–hyponyms such as “heavy-vehicle–dump-truck.” Thus, the thesaurus helps to replace semantically similar terms with a single representative word, reducing the number of words used for analysis.

The civil complaint thesaurus proposed in this study was constructed to define the semantic relationships between the terms in civil complaints by scrutinizing the collected data in detail based on well-structured documents and utilizing expert interviews. In general, synonym, hypernym–hyponym, and abbreviation relationships between common terms were classified by referring to the Standard Korean Language Dictionary, which is distributed by the National Institute of the Korean Language (NIKL) in Korea (NIKL 2020b), and the most-used terms in the collected data were selected as representative words. Some terms that have almost the same meaning in the context of the civil complaint data records (e.g., “pothole,” “hole,” and “dent” in the road) were included as synonyms even if they are not synonymous as defined by this dictionary. In the case of domain-specific terms in the construction sector or infrastructure maintenance field (e.g., “expansion-joint,” “exposed-steel,” and “bearing”), synonym and hypernym–hyponym relationships were distinguished based on the definitions of domain-specific terms in the Korea Construction Standard Glossary (MOLIT 2020) and the Guidelines for Maintenance and Performance Assessments (MOLIT 2019). Also, the original terms for foreign languages written in Korean (e.g., deck 데크/덱, slab 슬래브, and grating 그레이팅) were identified through the Korean loanword orthography (NIKL 2020a) and replaced with representative words with the same meaning. Finally, the constructed thesaurus was verified by experts who have a profound knowledge of infrastructure inspection and maintenance terminology.

Text Preprocessing

The aim of text preprocessing in this study was to ensure that the collected data in text format comprised only meaningful terms for analysis. The main steps included cleaning and normalization, tokenization, part-of-speech (POS) tagging and noun extraction, and stopwords removal based on general text mining procedures (Manning et al. 2008; Moon et al. 2021; Weiss et al. 2005). First, in the data-cleaning step, the researchers eliminated noisy data that did not affect the analysis results, such as punctuation marks (e.g., “!,” “?,” and “-”) and index numbers. After that, terms with the same or similar meaning were normalized to a single representative word in accordance with the previously defined civil complaint thesaurus. For instance, as shown in Fig. 2, in the first step, the sentence “The exposed steel of the bridge joint was caused by dump trucks!” would be replaced with the sentence “The exposed-steel of the bridge expansion-joint was caused by heavy-vehicle.”
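The cleaning and normalization step can be sketched in Python as follows. This is a minimal illustration on the English translation of the example sentence, not the study's implementation: the `THESAURUS` entries, the punctuation set, and the function names `clean` and `normalize` are assumptions made for this example, and the actual thesaurus operates on Korean terms.

```python
import re

# Hypothetical thesaurus entries (variant -> representative word), written in
# English for readability; the study's thesaurus is defined on Korean terms.
THESAURUS = {
    "exposed steel": "exposed-steel",
    "expansion joint": "expansion-joint",
    "joint": "expansion-joint",
    "dump trucks": "heavy-vehicle",
    "dump truck": "heavy-vehicle",
}

def clean(text):
    """Replace punctuation marks with spaces and collapse repeated whitespace."""
    text = re.sub(r"[!?,.\-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def normalize(text, thesaurus):
    """Replace every thesaurus variant with its representative word."""
    # Longer variants are tried first so that "dump trucks" wins over "dump truck".
    variants = sorted(thesaurus, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    # A single pass avoids re-matching text that was already replaced.
    return pattern.sub(lambda m: thesaurus[m.group(0).lower()], text)

raw = "The exposed steel of the bridge joint was caused by dump trucks!"
normalized = normalize(clean(raw), THESAURUS)
print(normalized)
```

Matching all variants in one regular-expression pass is a design choice for this sketch; it keeps a representative word such as “expansion-joint” from being rewritten again by the shorter variant “joint.”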

Second, the normalized data were tokenized into semantic words to explore user-experience factors. Tokenization involves parsing every sentence into individual terms, called tokens (Moon et al. 2020). Then, in the next step, the research team extracted nouns from the tokenized data, as nouns in Korean commonly represent the critical information concerning users’ experiences and satisfaction (e.g., deck, expansion-joint, breakage, and pothole). For this purpose, every token was tagged with its POS, and only the tokens with a noun tag were extracted. To conduct these two sequential steps, this study utilized the Komoran POS tagger (Shin 2014) provided in the KoNLPy Python package (Park 2014), which is widely used for Korean tokenization. The Komoran POS tagger implements the tokenization and POS tagging process based on a dictionary (i.e., Korean token and POS for each term) predefined by researchers. The research team updated the predefined dictionary in accordance with the developed civil complaint thesaurus. As a result, each term defined in the thesaurus was kept as a single token. For example, the earlier sentence, “The exposed-steel of the bridge expansion-joint was caused by heavy-vehicle” would be tokenized into 10 words: “the,” “exposed-steel,” “of,” “the,” “bridge,” “expansion-joint,” “was,” “caused,” “by,” and “heavy-vehicle.” Among them, only four nouns (i.e., “exposed-steel,” “bridge,” “expansion-joint,” and “heavy-vehicle”) would be extracted (Fig. 2).

Finally, stopwords removal involves dropping extremely common terms with little analysis value (Zou et al. 2017). A stopwords list is generally organized by sorting the most frequent terms and manually excluding informative terms in the specific text. The stopwords list of the collected civil complaint data included the names of infrastructure types (e.g., “bridge” and “tunnel”) and words less related to dissatisfaction objects or states (e.g., “risk,” “safety,” “action,” and “need”). Taking the previous example, the term “bridge” included in the stopwords list was deleted from the four nouns extracted; that is, after these text preprocessing steps, the original complaint sentence, “The exposed steel of the bridge joint was caused by dump trucks!” was eventually preprocessed into three words: “exposed-steel,” “expansion-joint,” and “heavy-vehicle,” as shown in Fig. 2.
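The noun extraction and stopwords removal steps for the same example can be sketched as below. The POS tags are assigned by hand here and the tag names (`NOUN`, `DET`, and so on) are illustrative assumptions; in the study, tagging is performed by the Komoran tagger on Korean text. The stopwords set contains only the examples named in the text, not the full list used in the study.

```python
# Hand-tagged tokens for the English example sentence. The tag names are
# assumptions for illustration; the study tags Korean text with Komoran.
tagged_tokens = [
    ("the", "DET"), ("exposed-steel", "NOUN"), ("of", "ADP"), ("the", "DET"),
    ("bridge", "NOUN"), ("expansion-joint", "NOUN"), ("was", "AUX"),
    ("caused", "VERB"), ("by", "ADP"), ("heavy-vehicle", "NOUN"),
]

# Only the stopword examples mentioned in the text; the real list is longer.
STOPWORDS = {"bridge", "tunnel", "risk", "safety", "action", "need"}

def extract_nouns(tagged):
    """Keep only the tokens whose POS tag marks them as nouns."""
    return [token for token, tag in tagged if tag == "NOUN"]

def remove_stopwords(tokens, stopwords):
    """Drop tokens that appear in the stopwords list."""
    return [token for token in tokens if token not in stopwords]

nouns = extract_nouns(tagged_tokens)
keywords = remove_stopwords(nouns, STOPWORDS)
print(nouns)
print(keywords)
```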

Exploration of User-Experience Factors

Keyword Extraction

Keyword extraction, also known as keyword analysis, is a text mining technique used to identify the most important words and features in text data. It is a basic process to understand text data and plays an essential role in text mining problems, such as information extraction, text categorization, text summarization, and information retrieval (Berry and Kogan 2010). In this research, keywords that represent user-experience factors were extracted from the preprocessed complaint data for each type of infrastructure based on TF calculation. TF, which is one of the traditional methods of keyword extraction, is an indicator of how many times the word appears in the text data. It has been popular for extracting key features because of its simple and intuitive calculation process (Baek et al. 2021; Moon et al. 2018; Villeneuve and O’Brien 2020; Weiss et al. 2005).

To effectively show the results of keyword extraction, the research team then visualized the keywords in a tag cloud. A tag cloud is a common display format that provides an intuitive overview of text data, depicting words arranged in space and varied in size, color, and position based on word frequency, categorization, and significance (Sun et al. 2020). In this research, the keywords with a higher TF were displayed in larger fonts.
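A TF calculation of this kind can be sketched with the Python standard library. The first three records below are the preprocessed bridge examples from Table 3; the fourth record and the function name `term_frequencies` are hypothetical, added only so that some words occur more than once.

```python
from collections import Counter

# Preprocessed complaint records, each a list of keywords.
docs = [
    ["pothole", "entrance", "road", "traffic"],
    ["exposed-steel", "substructure"],
    ["drainage-facility", "soil", "dirt"],
    ["pothole", "road", "breakage"],  # hypothetical record for illustration
]

def term_frequencies(docs):
    """Count how many times each word appears across all records."""
    tf = Counter()
    for doc in docs:
        tf.update(doc)
    return tf

tf = term_frequencies(docs)
# The highest-TF words are the ones a tag cloud would draw in the largest fonts.
top_keywords = tf.most_common(2)
print(top_keywords)
```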

Relationship Recognition between Keywords

In the next process, SNA was conducted to recognize the relationships between the extracted keywords. SNA is a technique used to automatically discover and visualize semantic networks based on unstructured data. The semantic network refers to domain-specific knowledge that represents semantic relations between concepts in a network. The concepts are described as the network’s “nodes,” and the relations between concepts are described as the network’s “edges” (Drieger 2013; Lehmann 1992; Richards and Barnett 1993). That is, SNA can identify the information and knowledge in a specific field by exploring the nodes and edges that make up a semantic network. In particular, in the case of text data, which is a type of unstructured data, the text is composed of words, and the knowledge obtained from the text is a network structure formed by the relationships between the words. Under this conception, SNA for text—which is referred to as word network analysis—is utilized to discover the semantic relations between words (i.e., concepts in SNA) and to build a structure of the text network (Popping 2003; Yoo et al. 2019). In a text network, a node corresponds to a word, and an edge indicates the relationship between words (Jung and Lee 2020).

The semantic network in text is commonly based on a co-occurrence relationship between words. Co-occurrence is defined as the simultaneous appearance of words in a sentence, paragraph, or text (Fariña García et al. 2021). In this research, co-occurrence between the $i$th word ($i = 1, \dots, n$) and the $j$th word ($j = 1, \dots, n$), $\mathit{Co}(w_{i}, w_{j})$ ($i \neq j$), is calculated as

$$\mathit{Co}(w_{i}, w_{j}) = \sum_{k = 1}^{c} \left[ \mathit{num}(w_{i} \mid d_{k}) \times \mathit{num}(w_{j} \mid d_{k}) \right] \tag{1}$$

where $\mathit{num}(w \mid d_{k})$ = how many times word $w$ appears in the $k$th civil complaint data, $d_{k}$; $n$ = number of types of words appearing in the entire civil complaint data (i.e., the number of nodes in a network); and $c$ = total number of civil complaint data records. We conducted SNA based on co-occurrences between keywords and visualized the structure of the word co-occurrence network to recognize the relationship between the extracted keywords.
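The co-occurrence in Eq. (1) can be computed directly from the preprocessed records. The sketch below is one possible implementation under stated assumptions: the function name `cooccurrence`, the example records, and the choice to store each unordered word pair once (in alphabetical order, because the measure is symmetric) are made for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Sum num(w_i | d_k) * num(w_j | d_k) over all records d_k, as in Eq. (1)."""
    co = Counter()
    for doc in docs:
        counts = Counter(doc)  # num(w | d_k) for every word in this record
        # Each unordered pair of distinct words is visited once per record.
        for w_i, w_j in combinations(sorted(counts), 2):
            co[(w_i, w_j)] += counts[w_i] * counts[w_j]
    return co

# Hypothetical preprocessed records for illustration.
docs = [
    ["pothole", "entrance", "road", "traffic"],
    ["pothole", "road", "road"],
    ["entrance", "lighting", "night"],
]

co = cooccurrence(docs)
print(co[("pothole", "road")])
```

In this example, “pothole” and “road” co-occur once in the first record and contribute 1 × 2 in the second, where “road” appears twice.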

To understand the characteristics of the semantic network, density and centrality can be calculated. The density of network $N$, $\mathit{density}(N)$, is defined as the ratio of the number of relations between the nodes (i.e., the number of edges) to the total number of possible relations, as follows:

$$\mathit{density}(N) = \frac{e(N)}{C(n, 2)} \tag{2}$$

where $e(N)$ = number of edges in the network $N$; and $n$ = number of nodes in a network $N$. Networks with high density allow for active sharing between the nodes and a rapid spread of information through the entire network. Centrality is an indicator that represents the extent to which a node is located at the center of the entire network. Among the types of centrality, degree centrality is the number of nodes to which one node is directly connected, and the node with a large number of other nodes with relations has a high degree of centrality (Jeon and Kim 2020). In this research, the degree centrality of the $i$th node (i.e., the $i$th word), $\mathit{DC}(\mathit{node}_{i})$, and the weighted-degree centrality of the $i$th node, $\mathit{WDC}(\mathit{node}_{i})$, are calculated as

$$\mathit{DC}(\mathit{node}_{i}) = \frac{e(\mathit{node}_{i})}{e(N)} \tag{3}$$

$$\mathit{WDC}(\mathit{node}_{i}) = \frac{\sum_{j = 1}^{n} \mathit{Co}(w_{i}, w_{j})}{\sum_{i = 1}^{n} \sum_{j = 1}^{n} \mathit{Co}(w_{i}, w_{j})} \tag{4}$$

where $e(\mathit{node}_{i})$ means the number of edges directly connected to the $i$th node. We identified the coherence of networks by calculating the network density for each infrastructure type (i.e., bridge and tunnel). Also, the influential keywords with a high co-occurrence between other words were identified by extracting central words with a high degree of centrality. The user-experience factors could be inferred from the word relationships connected to the central words.
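Eqs. (2)–(4) can be evaluated on a small co-occurrence network as sketched below. The edge weights, the function name `network_stats`, and the node names are hypothetical. One reading is assumed and should be checked against the original implementation: the double sum in the denominator of Eq. (4) is taken over ordered pairs with $i \neq j$, so each undirected edge is counted twice.

```python
# Hypothetical co-occurrence weights, keyed by unordered word pair.
co = {
    ("entrance", "pothole"): 1,
    ("entrance", "road"): 1,
    ("pothole", "road"): 3,
    ("entrance", "lighting"): 1,
}

def network_stats(co):
    """Return density (Eq. 2), degree centrality (Eq. 3), and weighted-degree centrality (Eq. 4)."""
    edges = [pair for pair, weight in co.items() if weight > 0]
    nodes = sorted({word for pair in edges for word in pair})
    n = len(nodes)
    e_n = len(edges)

    # Eq. (2): edges present divided by C(n, 2) possible edges.
    density = e_n / (n * (n - 1) / 2)

    # Eq. (3): edges touching the node divided by all edges in the network.
    dc = {w: sum(w in pair for pair in edges) / e_n for w in nodes}

    # Eq. (4): assumed reading, the denominator counts each edge in both directions.
    total = 2 * sum(co[pair] for pair in edges)
    wdc = {w: sum(co[pair] for pair in edges if w in pair) / total for w in nodes}
    return density, dc, wdc

density, dc, wdc = network_stats(co)
print(round(density, 3), dc["entrance"], round(wdc["pothole"], 3))
```

Under this reading the weighted-degree centralities sum to 1 over all nodes, which makes values comparable between the bridge and tunnel networks.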

Results and Discussion

Results of Civil Complaint Thesaurus and Text Preprocessing

As a result of developing the civil complaint thesaurus, 47 semantic relationships between words were defined. In the text preprocessing steps, among all the bridge and tunnel complaint records analyzed, the thesaurus was applied to 367 bridge and 85 tunnel complaint data records, resulting in 75 types of words being replaced with 37 types of words. Table 2 shows examples of the semantic relationships in the thesaurus, divided into four cases.

Table 2. Examples of semantic relationships in the civil complaint thesaurus

Category	Word (English translation)	Representative word (English translation)
Korean compound (19)	“bicycle road”	Bicycle-road
Korean compound (19)	“cross walk”	Cross-walk
Synonym (20)	“pothole,” “sinkhole,” “hole,” and “dent”	Pothole
Synonym (20)	“sign,” “signboard,” “electronic sign,” and “direction board”	Sign
Foreign language written in Korean (5)	“dekeu” and “seullaebeu” (which are the phonetic representations of “deck” and “slab” in Korean)	Deck
Hypernym–hyponym (3)	“drainage,” “drainage pipe,” and “drainage channel”	Drainage-facility
Hypernym–hyponym (3)	“dump truck” and “ready-mixed concrete truck”	Heavy-vehicle

First, in the case of a Korean compound word in which two or more words were combined, any spaces between words were removed. For instance, “bicycle road 자전거 도로” was converted to “bicycle-road 자전거도로,” and “cross walk 횡단 보도” was converted to “cross-walk 횡단보도.” Second, synonyms were replaced with one representative word. For example, users selected words such as “pothole 포트홀,” “sinkhole 싱크홀/씽크홀,” “hole 구멍,” and “dent 패임/파임” to represent grooves in the road. These synonyms were replaced with “pothole 포트홀” as a representative word. Third, foreign languages written in Korean were converted into Korean words with the same meaning. Last, in the case of hypernym–hyponym relationships, hyponyms were replaced with one hypernym with a general meaning that covers the meaning of the hyponyms. For example, “drainage 배수구,” “drainage pipe 배수관,” and “drainage channel 배수로” were replaced with “drainage-facility 배수시설,” and “dump truck 덤프트럭” and “ready-mixed concrete truck 레미콘차량” were replaced with “heavy-vehicle 중차량.”

Using the constructed civil complaint thesaurus, text preprocessing was performed on the 2,945 bridge complaint data records and the 404 tunnel complaint data records in text format. Table 3 presents the preprocessing results for the example data shown in Table 1. Because general terms (i.e., nouns) in the stopwords list (e.g., “bridge,” “risk,” “tunnel,” and “safety”) were excluded in the stopword removal stage, most of the preprocessed data consist of meaningful terms related to dissatisfaction factors.

Table 3. Examples of the text preprocessing results

Category	Raw data (English translation)	Preprocessed data (English translation)
Bridge complaints	There is a pothole in the entrance road of the bridge, which is dangerous to traffic.	“pothole,” “entrance,” “road,” and “traffic”
	It is necessary to repair the exposed steel of the bridge substructure.	“exposed-steel” and “substructure”
	The bridge drainage is blocked by soil and dirt.	“drainage-facility,” “soil,” and “dirt”
Tunnel complaints	There is a risk of a pedestrian safety accident as the tiles on the outer wall of the tunnel are dropping off.	“pedestrian,” “accident,” “tile,” and “outer-wall”
	The cracks in the road surface in the tunnel are progressing seriously.	“crack” and “road-surface”
	It is dark and dangerous since the entrance lighting is off at night.	“entrance,” “lighting,” and “night”
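The stopword removal stage can be sketched as a simple filter over the nouns extracted from a record. The example below assumes noun extraction has already been done (the list `nouns` is a hand-made English stand-in for the first tunnel complaint in Table 3), and `STOPWORDS` contains only the four examples named in the text, not the authors' full list.

```python
# Only the four stopword examples given in the text; the full list is not reproduced here.
STOPWORDS = {"bridge", "tunnel", "risk", "safety"}

def remove_stopwords(nouns):
    """Drop general terms that carry no information about dissatisfaction factors."""
    return [word for word in nouns if word not in STOPWORDS]

# Hand-made noun list for: "There is a risk of a pedestrian safety accident as
# the tiles on the outer wall of the tunnel are dropping off."
nouns = ["risk", "pedestrian", "safety", "accident", "tile", "outer-wall", "tunnel"]
print(remove_stopwords(nouns))
```

Under these assumptions, the filtered list matches the preprocessed terms reported for that record in Table 3.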

Results of User-Experience Factors

Findings on Keywords

Table 4 lists the top 30 keywords extracted from 2,945 preprocessed bridge complaint data records. The word “breakage” was the keyword with the highest TF, followed by “accident,” “road,” “deck,” and “railing.” From the results of keyword extraction, our research team also discovered that bridge users felt unsafe and uncomfortable with objects such as roads, decks, railings, streetlights, piers, signs, drainage-facilities, bicycle-roads, and expansion-joints, and the states of these objects, with words such as “breakage,” “pothole,” “neglect,” “breakdown,” and “damage.” Fig. 3 illustrates the tag clouds in which the top 30 keywords extracted from the bridge complaint data are visualized.

Table 4. Top 30 keywords in the bridge complaint data (English translation)

Word	TF value	Word	TF value	Word	TF value
Breakage	814	Bicycle	166	Entry	89
Accident	444	Walking	157	Breakdown	87
Road	430	Pedestrian	156	Stairs	86
Deck	421	Tree	151	Damage	82
Railing	321	Sign	135	Road-surface	78
Streetlight	246	Construction	135	River	73
Sidewalk	225	Drainage-facility	129	Park	73
Pass	203	Water	123	Lane	72

Fig. 3. Tag clouds with top 30 keywords extracted from the bridge complaint data.

Table 5 shows the top 30 keywords extracted from 404 preprocessed tunnel complaint data records. The keyword with the highest TF was “entrance,” followed by “accident,” “breakage,” “road,” and “lighting.” The keywords “breakage,” “accident,” and “road,” in particular, were extracted from the civil complaint data for both bridges and tunnels, so these keywords appear to strongly reflect user-experience factors in major road infrastructure. From the results of keyword extraction in the tunnel complaint data, our research team also discovered that tunnel users felt unsafe and uncomfortable with objects such as lighting, signs, streetlights, lanes, railings, and drainage-facilities, and with the state of these objects, such as breakages, potholes, and cracks. In addition, we found that tunnel users reported feeling unsafe and uncomfortable at specific times and places, such as tunnel entrances, tunnel exits, entry times, nighttime, and construction times. Fig. 4 shows the tag clouds, including the top 30 keywords extracted from the tunnel complaint data.

Table 5. Top 30 keywords in the tunnel complaint data (English translation)

Word	TF value	Word	TF value	Word	TF value
Entrance	75	Pedestrian	22	Crack	10
Accident	67	Lane	20	Traffic-accident	10
Breakage	62	Exit	17	Vine	9
Road	57	Construction	16	Block	9
Lighting	39	Night	15	Wind	9
Entry	34	Railing	14	Median-strip	9
Sign	32	Drainage-facility	13	Tree	9
Sidewalk	31	Water	13	Bar	9
Pothole	27	Sight	12	Driver	9
Streetlight	22	Bicycle	11	Bicycle-road	8

Fig. 4. Tag clouds with top 30 keywords extracted from the tunnel complaint data.

Results of Relationship Recognition between Keywords

To identify the relationships between the extracted keywords, word networks were constructed from the 50 word relationships with the highest co-occurrence by applying SNA to the civil complaint data. As a result, the word network for bridge complaints had 23 nodes, 50 edges, a density of 0.198, and a total co-occurrence of 5,666, and the word network for tunnel complaints had 32 nodes, 50 edges, a density of 0.101, and a total co-occurrence of 728 (Table 6). Table 7 shows examples of word relationships derived from the word networks for civil complaints. In the case of the bridge complaints, “deck + breakage” had the highest co-occurrence (211 co-occurrences), indicating that the two keywords “deck” and “breakage” appeared together in 211 bridge complaint data records. It was followed by “railing + breakage” (129 co-occurrences), “accident + breakage” (111 co-occurrences), and “road + accident” (97 co-occurrences). The tunnel complaint word relationships with high co-occurrence included “entrance + breakage” (18 co-occurrences), “road + breakage” (18 co-occurrences), “entrance + block” (16 co-occurrences), and “accident + entry” (15 co-occurrences).

Table 6. Node, edge, density and co-occurrence results for the bridge and tunnel complaints word networks

Category	No. of nodes	No. of edges	Density	Total number of co-occurrence
Bridge complaints	23	50	0.198	5,666
Tunnel complaints	32	50	0.101	728
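The density values in Table 6 can be checked against the reported node and edge counts. The sketch below assumes the standard density of an undirected network without self-loops, 2E / (N(N − 1)), which this section does not state explicitly; the function name `network_density` is illustrative.

```python
def network_density(num_nodes, num_edges):
    """Share of all possible undirected node pairs that are connected by an edge."""
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges

print(round(network_density(23, 50), 3))  # bridge network: 23 nodes, 50 edges
print(round(network_density(32, 50), 3))  # tunnel network: 32 nodes, 50 edges
```

With the same 50 edges spread over more nodes, the tunnel network is sparser than the bridge network, which is consistent with the lower density reported for it.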

Table 7. Examples of word relationships derived from word networks for the bridge and tunnel complaints

Category	Word relationship	Co-occurrence
Bridge complaints	Deck + breakage	211
	Railing + breakage	129
	Accident + breakage	111
	Road + accident	97
	Road + breakage	94
	Streetlight + breakdown	90
	Tree + deck	76
	Walking + accident	67
	Bicycle + pass	67
	Accident + sidewalk	64
Tunnel complaints	Entrance + breakage	18
	Road + breakage	18
	Entrance + block	16
	Accident + entry	15
	Accident + entrance	10
	Road + accident	10
	Road + pothole	10
	Reflector + accident	9
	Road + lane	9
	Entry + block	9
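Co-occurrence counts of the kind listed in Table 7 can be obtained by counting, for each pair of keywords, the number of records in which both appear. The following sketch assumes each pair is counted at most once per record, in line with the interpretation given in the text; the function names and toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count the number of records in which each unordered keyword pair appears."""
    pairs = Counter()
    for tokens in records:
        # set() counts a pair once per record; sorted() makes (a, b) order stable.
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs

def top_edges(records, k=50):
    """Keep the k pairs with the highest co-occurrence as network edges."""
    return cooccurrence(records).most_common(k)

# Toy preprocessed records, for illustration only.
records = [
    ["deck", "breakage"],
    ["deck", "breakage", "tree"],
    ["road", "accident"],
]
print(top_edges(records, k=1))
```

The nodes of the word network are then the distinct words that occur in the retained edges, which is why the two networks have the same number of edges but different numbers of nodes.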

Degree centrality was then calculated to identify the central words in the word networks. Table 8 lists the top five words with high degree centrality and weighted-degree centrality in the word networks for bridge and tunnel complaints. The degree centrality values of “breakage” and “accident” in the word network for bridge complaints were both 0.727, but the weighted-degree centrality value of “breakage” was 0.191, which was higher than that of “accident” at 0.160. Next, “road” appeared as a central word with a degree centrality value of 0.455 and a weighted-degree centrality value of 0.098. The influential central words in the word network for tunnel complaints were “accident” and “entrance,” with degree centrality values of 0.452 and 0.419 and weighted-degree centrality values of 0.150 and 0.137, respectively.

Table 8. Top five words with high centrality in the word networks for bridge and tunnel complaints

Category	Word	DC	Word	WDC
Bridge complaints	Accident	0.727	Breakage	0.191
	Breakage	0.727	Accident	0.160
	Road	0.455	Road	0.098
	Sidewalk	0.273	Deck	0.072
	Bicycle	0.273	Bicycle	0.057
Tunnel complaints	Accident	0.452	Accident	0.150
	Entrance	0.419	Entrance	0.137
	Entry	0.258	Road	0.092
	Road	0.226	Entry	0.084
	Breakage	0.194	Breakage	0.081

Note: DC = the value of degree centrality; and WDC = the value of weighted-degree centrality.
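The centrality measures in Table 8 can be illustrated on a small weighted edge list. The sketch below assumes the common normalization of degree centrality, degree / (N − 1); under that assumption, a value of 0.727 in the 23-node bridge network would correspond to 16 neighbors (16 / 22). The normalization used for weighted-degree centrality is not stated in this section, so the sketch returns only the raw sum of edge weights. The function name `centralities` is hypothetical, and the edges are the first five bridge relationships in Table 7, not the full network.

```python
from collections import defaultdict

def centralities(edges):
    """edges maps (word_a, word_b) to a co-occurrence weight.

    Returns (degree centrality, raw weighted degree) for each word.
    """
    neighbors = defaultdict(set)
    weighted_degree = defaultdict(int)
    for (a, b), weight in edges.items():
        neighbors[a].add(b)
        neighbors[b].add(a)
        weighted_degree[a] += weight
        weighted_degree[b] += weight
    n = len(neighbors)
    # Degree centrality: share of the other n - 1 nodes a word is linked to.
    degree_centrality = {w: len(nbrs) / (n - 1) for w, nbrs in neighbors.items()}
    return degree_centrality, dict(weighted_degree)

# First five bridge relationships from Table 7 (a subset, for illustration).
edges = {
    ("deck", "breakage"): 211,
    ("railing", "breakage"): 129,
    ("accident", "breakage"): 111,
    ("road", "accident"): 97,
    ("road", "breakage"): 94,
}
dc, wd = centralities(edges)
print(dc["breakage"], wd["breakage"])
print(round(16 / 22, 3))  # check against the reported 0.727
```

Two words can share the same degree centrality while differing in weighted degree, as “breakage” and “accident” do in Table 8, because the weighted measure also reflects how often each link occurs.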

The word networks for bridge and tunnel complaints were then visualized on 2D maps, as shown in Fig. 5. Fig. 5(a) illustrates the word network for bridge complaints, with keywords such as “breakage,” “accident,” and “road” in the center of the map. From the words connected to the central word, “breakage,” the research team found that bridge users experience the “breakage” of objects such as decks, railings, roads, piers, road-surface, expansion-joints, and streetlights. Another central word, “road,” had a relationship with “accident,” “breakage,” “construction,” and “pothole,” which referred to unsafe and uncomfortable situations experienced on bridge roads. These relationships between keywords were highlighted in the bridge complaint data, such as “railing breakage on roads with heavy traffic,” “uncontrolled roads despite pavement construction,” and “risk of accidents due to potholes on the roads.”

Fig. 5. Word network maps: (a) word network for bridge complaints; and (b) word network for tunnel complaints.

In the word network for tunnel complaints [Fig. 5(b)], the relationships between central keywords such as “accident” and “entrance” are expressed. The central word, “accident,” had a relationship with “entry,” “entrance,” “reflector,” “light,” “lane,” “sign,” and “nighttime,” and these words appear to be the causes of accidents that were of concern to tunnel users. For instance, “the dim light,” “signs obscured by trees at the entrance,” “unclear lane markings,” and “non-installation of reflectors” were reported as tunnel complaints about accident risk factors. Another central word, “entrance,” was found to be linked to words such as “streetlight,” “view,” “sign,” “vine,” and “mark,” so we confirmed unsafe and uncomfortable factors associated with the tunnel entrance. These factors were supported by tunnel complaint data such as “short field of view because of streetlights broken at the entrance,” “obstruction of signs due to vines,” and “obscurity of entrance mark.”

Verification of Practical Applicability

To verify the practical applicability of the user-experience factors derived from the civil complaint data, the factors were compared with the evaluation factors that managers are required to check, as specified in the inspection manuals for urban infrastructure. This study used the Guidelines for Maintenance and Performance Assessments, which are distributed by the Ministry of Land, Infrastructure and Transport (MOLIT) in Korea (MOLIT 2019). The purpose of the guidelines is to enable quick and accurate evaluation of infrastructure condition and performance deterioration by presenting the details of the evaluation method and procedure stipulated in the Special Act on the Safety Control and Maintenance of Establishments (MOLIT 2021). The guidelines stipulate that managers should periodically evaluate not only the visual and structural condition of the infrastructure, but also user safety and satisfaction, by calculating quantitative indicators or applying qualitative site inspection methods. Our research team conducted the verification process via in-depth discussions with industry experts. The experts consisted of three industry practitioners with an average of 17 years’ experience in the areas of bridge and tunnel inspection, infrastructure operation and maintenance (O&M), and text analytics.

Consequently, it was confirmed that the user-experience factors derived from the civil complaint data were similar in context to the factors that the guidelines require managers to check. For example, bridge users reported feeling unsafe and uncomfortable with breakages in the road surface and road potholes. These factors are related to pavement, one of the major types of bridge elements, and managers should inspect pavements by examining potholes, faulting, and rutting, and by measuring the road pavement condition index (PCI). Bridge users also felt unsafe and uncomfortable with structural factors including deck breakages, exposed steel, expansion-joint breakages, the breakage and leakage of drainage-facilities, and pier breakages. Such structural damage to critical bridge elements (i.e., deck, expansion-joint, drainage-facility, and pier) is periodically checked by managers. Additionally, tunnel users expressed dissatisfaction with streetlights at nighttime, the risk of accidents due to lighting, and streetlights at the tunnel entrance. These factors are related to lighting, one of the major types of tunnel elements, and managers should evaluate lighting performance by measuring the level of illumination and lighting inside the tunnel.

Some user-experience factors were not specified in the guidelines but were recognized by the experts as being worth noting. These are regarded as new findings. For instance, the guidelines stipulate that managers should visually inspect bridge railings for damage such as corrosion, breakage, and loss; however, bridge users not only expressed discomfort with such visible damage to railings, but also reported concerns about safety accidents on the bridge due to uninstalled railings, low railings, and railing shapes and alignments that do not suit the traffic flow. As another example, the guidelines require managers only to check the existence of tunnel signs, but tunnel users were also dissatisfied with the operational condition of sign lighting, signs obscured by trees, and aging signs, which may cause traffic accidents at tunnel entrances. Additionally, the guidelines do not stipulate the maintenance of bicycle-roads on bridges, but users expressed anxiety and discomfort about riding a bicycle on a bridge.

In summary, our research findings were confirmed to be in line with existing infrastructure maintenance practices, and this study also identified user-experience factors that are overlooked in practice. Specifically, the safety and serviceability of infrastructure need to be checked by taking account of the usage environment, such as traffic flow and accident risk, in addition to the visual and structural condition of the infrastructure. The analysis also revealed objects of dissatisfaction that are overlooked in practice and that have emerged as the ways of using infrastructure have diversified. The experts involved in this research agreed on the applicability of its results to infrastructure maintenance practice.

Conclusions

This study aimed to understand user experience and satisfaction with urban infrastructure by applying text mining techniques to civil complaint data. The researchers first collected bridge and tunnel complaint data in text format. By scrutinizing the data in detail, they developed a civil complaint thesaurus with 47 semantic relationships between words, such as Korean compound words, synonyms, and hypernym–hyponyms. After the text preprocessing stage using the thesaurus, to explore user-experience factors, keywords for bridge and tunnel complaints were extracted based on the TF calculation (e.g., “breakage,” “accident,” and “road” for bridge complaints and “entrance,” “accident,” and “breakage” for tunnel complaints). The relationships between the keywords were then identified by utilizing SNA. As a result of the word networks, the objects of “breakage” on a bridge (e.g., “deck,” “railing,” and “pier”), unsafe or uncomfortable situations on the bridge road (e.g., “breakage,” “construction,” and “pothole”), and dissatisfaction factors at the tunnel “entrance” (e.g., “streetlight,” “view,” and “sign”) were explored. The practical applicability of these factors was verified through comparison with managers’ inspection guidelines for urban infrastructures.

This research offers several contributions to urban infrastructure maintenance. First, the civil complaint thesaurus can be used to improve model performance in further text mining studies using civil complaint data, and the 47 semantic relationships between words developed in the research can form a basis for the construction of a thesaurus specialized for infrastructure maintenance. Second, the results of this research contribute to identifying user-experience factors from civil complaint data and improving the safety and serviceability of urban infrastructure by considering user experience and satisfaction in infrastructure maintenance practices. Specifically, as demonstrated in the discussion section, some of the derived user-experience factors were consistent with those in the existing inspection manuals used in practice, and others could be incorporated into the manuals so that managers can inspect users’ satisfaction periodically and thoroughly. Furthermore, if civil complaint data are analyzed in real time, causes of safety accidents or maintenance requirements that may be overlooked by managers or that appear between inspection cycles can be discovered and remedied before accidents occur. Finally, the research findings propose a new paradigm for the infrastructure maintenance domain. That is, existing practices have focused on infrastructure condition and deterioration of performance from the manager’s point of view; the new paradigm proposed by this research focuses on the experience and satisfaction of users, who are the most crucial stakeholders in the life cycle of urban infrastructure.

Further opportunities exist to enhance the analysis. The proposed research methodology could be verified by applying it to civil complaint data for types of infrastructure other than bridges and tunnels and to data extracted from various online platforms. In addition, it may be possible to gather more information about user experience and satisfaction by using other text mining techniques. For example, topic modeling could be used to automatically derive the major topics of civil complaints from a large volume of complaint data, and user-experience factors could be identified for each derived topic. Users’ perceptions of each type of infrastructure could also be compared by applying sentiment analysis to civil complaint data for various types of infrastructure.

Data Availability Statement

Some data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (No. 2021R1A2C2003696) and the BK21 PLUS research program of the NRF.

References

Abdul-Rahman, H., C. Wang, S. N. Kamaruzzaman, F. A. Mohd-Rahim, M. S. Mohd-Danuri, and K. Lee. 2015. “Case study of facility performance and user requirements in the University of Malaya research and development building.” J. Perform. Constr. Facil. 29 (5): 04014131. https://doi.org/10.1061/(ASCE)CF.1943-5509.0000629.
