Metaresearching Structural Engineering Using Text Mining: Trend Identifications and Knowledge Gap Discoveries

Ezzeldin, Mohamed; El-Dakhakhni, Wael

doi:10.1061/(ASCE)ST.1943-541X.0002523

Open access

Technical Papers

Feb 28, 2020

Authors: Mohamed Ezzeldin, A.M.ASCE, and Wael El-Dakhakhni, F.ASCE (https://orcid.org/0000-0001-8617-261X)

Publication: Journal of Structural Engineering

Volume 146, Issue 5

Abstract

The significant increase in the number of journal paper submissions/publications in the last decades has been paralleled by a shift to (mainly) on-line publication and digital archiving of past research articles. This situation has created an opportunity to metaresearch (conduct research on research) structural engineering through benefiting from emerging computational techniques such as data mining to track historical and current research focuses and trends and to better identify evolving research themes and discover possible cross-cutting knowledge gaps. Such metaresearch can benefit all structural engineering community stakeholders (e.g., researchers, designers, and funding agencies) in multiple ways including research resource realignments and optimizations to meet current and future research needs. The current study utilizes text mining—a class of data mining—to analyze published structural engineering research over 26 years. The considered dataset represents more than 11,000 articles, published in the two leading structural engineering journals (Journal of Structural Engineering and Engineering Structures) from 1991 to 2016. Following the collection and preparation of the training and testing datasets, the latent Dirichlet allocation (LDA) topic modeling technique is utilized to identify, classify, and categorize articles in terms of their topics, characterized by relevant technical terms. Subsequently, quantitative analyses are used to evaluate the temporal inclusion trends within the 11,000 article dataset. The LDA technique is also reapplied on only articles published between 2012 and 2016, to identify recent research topic developments and investigate the correlation between these topics and their counterparts covering the entire 26-year study period. 
Finally, a word co-occurrence network and a topic interlinkage matrix are developed, providing visual tools to rapidly evaluate structural engineering research subfield co-occurrences and linkage strengths. The overarching aim of this metaresearch is to identify understudied intersections of structural engineering subfields and highlight Blue Ocean opportunities at the interfaces of structural engineering and other established fields and emerging technologies.

Introduction

Advances in digital text archiving, coupled with the massive increase in cloud storage capacity and accessibility, have created useful research data resources in the form of a large number of digital textual documents (Feldman and Dagan 1995). Most research publications generated over the past two decades in different disciplines (e.g., biology, computer science, chemistry, engineering, medicine and health) are currently stored in digitally accessible formats that combine an extensive number of structured data fields (e.g., author names, publication dates, and titles) with unstructured text components (e.g., abstract, introduction, and methodology). Given the sheer volume of these documents, it has not been practically possible to conduct effective metaresearch (research on research) studies that analyze them and extract relevant aggregate information. However, several quantitative analysis techniques (e.g., bibliometrics, information systems, information science, science of science policy, and data mining) have emerged to organize and analyze massive numbers of textual documents (Salloum et al. 2018). One common example of these quantitative techniques is scientometrics (e.g., Maheswaran et al. 2009; Serenko et al. 2010; Pollack and Adler 2015; Hosseini et al. 2018). Yet, although scientometrics quantifies the impact of articles and authors based on their citation data, it does not provide any topic-related information, which is key to understanding the relevance and context of a publication within a discipline. As such, over the past few years, text mining (TM) has been utilized in different fields to discover key scientific topics (e.g., Krallinger et al. 2008; Nassirtoussi et al. 2014; Ordenes et al. 2014; Lazard et al. 2015).

TM, also known as intelligent text analysis or knowledge-discovery in text, is a class of data mining (DM) that focuses on discovering unknown patterns of interest or extracting knowledge from large textual datasets (Gupta and Lehal 2009). Although TM and DM utilize similar approaches to mine text, such as topic detection and tracking, keyword extraction, sentiment analysis, document clustering, and automatic document summarization (Sukanya and Biruntha 2012; Akilan 2015), each mining technique differs in terms of the structure of the data analyzed. More specifically, DM is used for structured data derived from databases (Gupta and Lehal 2009), while TM is used for unstructured or semi-structured data such as emails, full-text documents, HTML files, and others (Salloum et al. 2018). As such, TM involves an additional complexity in terms of preprocessing the underlying data prior to analysis (Zhang et al. 2015), as will be discussed later in the current study.

In terms of textual data, the full text of an article contains introductory statements that repeat established knowledge in the field and may contain more speculative statements in its discussion (Fleuren and Alkema 2015). For these reasons, extracting research themes from the full text of articles is challenging; therefore, several metaresearch studies in different fields have used article abstracts as a compact representation of the text within the respective articles (Griffiths and Steyvers 2004; Gatti et al. 2015). As a text field, an article’s abstract summarizes the topic information of the entire article through a prescribed sequence that typically includes the following: (1) the overall objective of the study and the research problem(s); (2) the methodology of the study (e.g., experimental or analytical); (3) major findings or trends found as a result of the study; and (4) a concise summary of interpretations and conclusions. As such, abstracts largely comprise short sentences (i.e., very succinct text to be analyzed), thus providing meaningful, nontrivial, and concise conclusions for relevant TM studies. For example, Griffiths and Steyvers (2004) investigated the article abstracts in the Proceedings of the National Academy of Sciences from 1991 to 2001 and utilized TM to compare the research topics comprising these abstracts with existing categories. Blei and Lafferty (2006) also applied TM to the historical literature from the journal Science from 1880 to 2000 to investigate how individual research topics change over time. More recently, Gatti et al. (2015) applied TM to the article metadata of abstracts from 20 journals in the field of operations research and management science and subsequently quantified the generality and specificity of these journals.

Objectives and Scope

Within the field of structural engineering, the vast developments in analytical modeling approaches and experimental testing techniques over the last few decades have led to significantly more complex and diverse research studies. These studies cover a wide spectrum of topics ranging from classical topics such as concrete material behavior (e.g., Schnobrich 1991; Al-Harthy and Frangopol 1994) to advanced technologies such as health monitoring (e.g., Sohn et al. 2000; Li et al. 2004) and, more recently, have covered societal-focused topics such as resilience and sustainability (e.g., Bonstrom and Corotis 2014; Cimellaro et al. 2016; Lounis and McAllister 2016; Salem et al. 2017). This resulted in an exploding number of relevant organized technical conferences and research articles. Consequently, more than ever before, there is a pressing need for new effective quantitative techniques to understand where the structural engineering research community has been heading and how far the community is from identifying and closing key knowledge gaps. To the best of the authors’ knowledge, no metaresearch studies have been reported on structural engineering to date using TM in contrast to the typical state-of-the-art articles that are based on qualitative techniques (e.g., Spencer and Nagarajaiah 2003; Zhao and Zhang 2007; El-Dakhakhni and Ashour 2017; Bruneau et al. 2017).

The main objective of the current study is to identify, trace, and quantify topic trends and discover knowledge gaps in structural engineering using a dataset of articles between 1991 and 2016. The study also aims at providing data-driven visual decision support tools to the structural engineering community stakeholders including researchers, journal editors, conference organizers, publishers, designers/regulators, and funding agencies. In the current study, the text data extracted from the abstracts of 11,027 published articles in the two leading structural engineering journals (Journal of Structural Engineering and Engineering Structures) from 1991 to 2016 are first collected and preprocessed for analysis. By following the latent Dirichlet allocation (LDA) topic modeling technique (Blei et al. 2003), these article abstracts are subsequently textually analyzed to identify key research topics and their corresponding intensities and to provide a visual representation of the structural engineering research topic landscape. Based on this analysis, the current study then evaluates the temporal variation (emergence, increase, decrease, and disappearance) of research topics over the full 26-year period considered compared to that during only the latest 5 years within the same period. A word co-occurrence network and a topic interlinkage matrix are developed, providing data visualization tools to evaluate the co-occurrence of and the linkage strength between different structural engineering topics. Finally, the current study identifies existing structural engineering knowledge/research gaps/intensities at the intersection between different topics in an effort to guide future strategic research directions and resource allocation and minimize research waste (Chalmers and Glasziou 2009).

The current study is quite timely because an excerpt from the introduction of the recent Journal of Structural Engineering 60th Anniversary State-of-the-Art Special Collection states the following: “The introduction of a plethora of new technologies, including data mining, sensing, construction robots, autonomous drones, artificial intelligence, virtual and mixed reality visualization, and three-dimensional (3D) printing, all have or will soon have applications in structural engineering. These technologies, as applied to our field, have already impacted people across the globe and will continue to disrupt the industries associated with structural engineering at an increasing pace” (El-Tawil 2018).

As such, the bird’s eye view presented in the current study highlights the structural engineering research discourse for the past 26 years and identifies areas that remain relatively unexplored at the intersections of different structural engineering subfields. More importantly, the study calls for treating the interfaces between disruptive technologies and structural engineering as Blue Ocean opportunities, that is, creating value by exploring uncharted waters (Kim and Mauborgne 2014), to drive structural engineering research in the 21st century.

Topic Modeling

TM analyzes a massive amount of textual data or documents to extract and identify their key underlying topics. When each document is represented as a row in the dataset matrix and each distinct word as a column, it is common to have tens of thousands of dimensions (Hofmann and Chisholm 2016). If left unprocessed, such large matrix dimensions would require significant computational effort and might yield trivial results. To address such issues, topic modeling techniques are typically used to (Blei et al. 2003; Steyvers and Griffiths 2007): (1) identify similarities across all documents; (2) understand their essential characteristics; and (3) discover the unknown key topics.
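To make the row-per-document representation concrete, the following is a minimal sketch (not the authors' implementation) of how a document-term count matrix could be built from already tokenized documents. The function name `document_term_matrix` and the toy tokens are hypothetical and used only for illustration.

```python
from collections import Counter


def document_term_matrix(tokenized_docs):
    """Return (vocabulary, matrix) where matrix[d][v] counts word v in document d."""
    # One column per distinct word across the whole collection
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    column = {token: i for i, token in enumerate(vocabulary)}
    matrix = []
    for doc in tokenized_docs:
        row = [0] * len(vocabulary)
        for token, count in Counter(doc).items():
            row[column[token]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical toy documents
vocabulary, matrix = document_term_matrix(
    [["steel", "beam", "steel"], ["concrete", "beam"]]
)
```

With thousands of abstracts, the number of columns grows with the vocabulary size, which is the dimensionality issue that topic modeling is meant to address.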

It is important to understand that topic modeling is a general statistical analysis technique that was originally developed for text-based applications. However, topic modeling has since been extended to other applications including images (Bart et al. 2011; Li et al. 2010), music (Blei 2012; Hu et al. 2014), video (Hospedales et al. 2009; Asuncion et al. 2010), and social networks (Tang et al. 2008; Hong and Davison 2010). There are several topic modeling techniques available in the literature, but the most common and effective technique is the latent Dirichlet allocation (LDA) model (Steyvers and Griffiths 2007; Gatti et al. 2015; Hofmann and Chisholm 2016), which was developed by Blei et al. (2003), as discussed in the next section.

Latent Dirichlet Allocation

The LDA topic model is a generative probabilistic model that focuses on identifying key topics from a collection of textual documents (Blei et al. 2003). Several previous studies (e.g., Steyvers and Griffiths 2007; Amado et al. 2018) demonstrated how this model can be used to examine the content of scientific text documents. This is because LDA is capable of significantly reducing the number of dimensions (i.e., words) in a document while still retaining the essential relationships between all the dimensions and their key topics in the underlying documents (Blei et al. 2003). As such, the LDA topic model was extensively applied to address a wide range of problems (e.g., Amado et al. 2018). LDA was originally developed based on the classical probabilistic latent semantic analysis model introduced by Hofmann (1999). The basic concept behind this model is that, for any collection of textual documents, each document contains a large number of words. Based on the probability of co-occurrence of different words within the same document, considering all documents, different topics (each made up of different sub-sets of words) can be identified. The words comprising these topics are weighted according to their statistical distribution within each topic (Blei et al. 2003). As such, each document can be represented by the combination of (latent) topics. Such representation significantly reduces the complexity of handling a massive number of textual documents, thus facilitating discovering key topics through their latent inherent statistical relationships (Hofmann and Chisholm 2016).

The LDA model initially defines a number of topics, $K$, where each topic, $k$, contains a distribution of words, $\psi_{k}$, that is evaluated based on a Dirichlet distribution ($\beta$). Based on these topics, each document, $d$, is analyzed by sampling a document topic distribution, $\theta_{d}$, over $K$ topics, where $\theta_{d}$ is quantified from another Dirichlet distribution ($\alpha$) to assign a topic for each word, $w_{di}$, in $d$. For each $w_{di}$ within each $d$, LDA first assigns a particular topic, $Z_{di}$, that belongs to $K$ based on the multinomial distribution ($\theta_{d}$), and subsequently, $w_{di}$ is selected from multinomial distribution $\psi_{Z_{di}}$. The full details of Dirichlet and multinomial distributions can be found in Minka (2000) and Correa (2001), respectively.
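The generative process described above can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the study's code: the helper names (`sample_dirichlet`, `generate_corpus`), the symmetric Dirichlet priors, and the toy sizes are all hypothetical, and words are represented by integer indices.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_topics, vocab_size, doc_lengths, alpha, beta, seed=0):
    """Generate synthetic documents by following the LDA generative story."""
    rng = random.Random(seed)
    # psi[k]: word distribution of topic k, drawn from Dirichlet(beta)
    psi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    docs, thetas = [], []
    for n_words in doc_lengths:
        # theta_d: topic distribution of document d, drawn from Dirichlet(alpha)
        theta_d = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(n_words):
            # Z_di: topic of word i, drawn from the multinomial theta_d
            z = rng.choices(range(num_topics), weights=theta_d)[0]
            # w_di: word index, drawn from the multinomial psi[Z_di]
            w = rng.choices(range(vocab_size), weights=psi[z])[0]
            words.append(w)
        docs.append(words)
        thetas.append(theta_d)
    return docs, thetas, psi


# Hypothetical toy setting: 3 topics, 8 vocabulary words, two short documents
docs, thetas, psi = generate_corpus(3, 8, [10, 5], alpha=0.5, beta=0.1, seed=1)
```

Inference runs this story in reverse: given only the observed words, it estimates the $\theta_{d}$ and $\psi_{k}$ that most plausibly produced them.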

LDA utilizes several inference algorithms to evaluate the document topic distribution, $\theta$, and word distribution, $\psi$. One of these algorithms is the Gibbs sampling algorithm that was originally introduced by Geman and Geman (1984) and utilized to discover topics by Griffiths and Steyvers (2004). This algorithm assigns a value to each word in the model, $w_{di}$, and estimates the probability of assigning $w_{di}$ to each topic, $k$. Although other algorithms [e.g., variational expectation maximization, introduced by Blei et al. (2003)] can also estimate $\theta$ and $\psi$ efficiently, this study utilizes the Gibbs sampling algorithm, as several studies have demonstrated that similar key topics can be inferred, regardless of the algorithm that is used (Hofmann and Chisholm 2016).
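For readers unfamiliar with Gibbs sampling for LDA, a compact collapsed Gibbs sampler in the spirit of Griffiths and Steyvers (2004) is sketched below. It is an illustrative, unoptimized version and is not claimed to be the implementation used in the study; the function name `gibbs_lda`, the iteration count, and the integer word encoding are assumptions.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha, beta, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word indices."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total words per topic
    z = []
    # Random initial topic assignment for every word
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of word i from the counts
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic for this word
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Point estimates of theta (document-topic) and psi (topic-word) from the final counts
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    psi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, psi
```

The estimates are taken from a single final sample for brevity; averaging over several samples after a burn-in period is a common refinement.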

It should be noted that the current study does not focus on capturing how a specific topic evolves over time, and dynamic topic modeling approaches (e.g., Blei and Lafferty 2006; Wang and McCallum 2006) are neither within the scope nor interest of the analysis presented herein. Instead, the current study considers all topics over the 26-year studied period, focusing on the overall variation within that time window. Therefore, within its scope and objectives, the current study considers the approach by Griffiths and Steyvers (2004) to be the most appropriate through adopting the label-unaware method and subsequently aggregating the results per label.

Temporal Distribution of Topics

The current study utilizes a non-joint modeling approach to understand the overall temporal distribution of the identified topics within all papers published over the period from 1991 to 2016 in the considered journals. In this respect, the proportion of topic $k$ distribution over time, $t$, for all the articles, $\theta_{k}^{t}$, is evaluated following the approach developed by Gatti et al. (2015):

$$\theta_{k}^{t} = \frac{\sum_{d=1}^{M} \theta_{dk} \times \nabla(t_{d} = t)}{\sum_{d=1}^{M} \nabla(t_{d} = t)} \tag{1}$$

where $M$ = total number of articles, $\theta_{dk}$ = proportion of topic $k$ in document $d$, $t_{d}$ = publication year of document $d$, and $t$ = index of years. The proportion of topic $k$ for all the articles in journal $j$, $\theta_{k}^{j}$ (Gatti et al. 2015), is the following:

$$\theta_{k}^{j} = \frac{\sum_{d=1}^{M} \theta_{dk} \times \nabla(j_{d} = j)}{\sum_{d=1}^{M} \nabla(j_{d} = j)} \tag{2}$$

where $j_{d}$ = journal of document $d$, while $\nabla(j_{d} = j)$ = one if ($j_{d} = j$) is true and zero otherwise; $\nabla(t_{d} = t)$ in Eq. (1) is defined analogously. The proportion of topic $k$ distribution in journal $j$ over time $t$, $\theta_{k}^{jt}$, is quantified as the following:

$$\theta_{k}^{jt} = \frac{\sum_{d=1}^{M} \theta_{dk} \times \nabla(t_{d} = t, j_{d} = j)}{\sum_{d=1}^{M} \nabla(t_{d} = t, j_{d} = j)} \tag{3}$$

where $\nabla(t_{d} = t, j_{d} = j)$ = one if both conditions are true and zero otherwise, so that the denominator, $\sum_{d=1}^{M} \nabla(t_{d} = t, j_{d} = j)$, is the number of articles for journal $j$ in year $t$. As such, $\theta_{k}^{jt}$ indicates that articles are aggregated for a particular journal $j$ and for the topic $k$ and year $t$ of interest.
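Eqs. (1)–(3) all amount to averaging $\theta_{dk}$ over the subset of documents selected by the indicator function. A minimal sketch of that computation is given below; the function name `topic_proportion` and the toy values are hypothetical.

```python
def topic_proportion(theta, years, journals, k, year=None, journal=None):
    """Average theta[d][k] over the documents matching the given year and/or journal.

    year only       -> Eq. (1)
    journal only    -> Eq. (2)
    year + journal  -> Eq. (3)
    Returns None when no document matches (the denominator would be zero).
    """
    selected = [
        theta_d[k]
        for theta_d, t_d, j_d in zip(theta, years, journals)
        if (year is None or t_d == year) and (journal is None or j_d == journal)
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical toy data: three documents, two topics
theta = [[0.8, 0.2], [0.4, 0.6], [0.2, 0.8]]
years = [1991, 1991, 1992]
journals = ["JSE", "ES", "JSE"]
share_1991 = topic_proportion(theta, years, journals, k=0, year=1991)
```

Returning `None` for an empty selection is a design choice of this sketch; the source equations do not address that case.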

Text Mining

Data Collection and Preprocessing

The two key journals in the field of structural engineering research, Journal of Structural Engineering (JSE) and Engineering Structures (ES), were selected in this study. Both journals are top-tier in the field, according to the Science Citation Index (SCI), and contribute comprehensively to structural engineering research in all its subfields. In addition to their qualitative (e.g., reputation) and quantitative (e.g., impact factor) attributes, and unlike subdiscipline-focused journals (e.g., ACI, Wind Engineering, and Earthquake Spectra), JSE and ES are the two main journals covering the breadth of structural engineering subfields, which is key for the current metaresearch study.

During the data collection, the inclusion of an article was determined using the following criteria: (1) articles published from 1991 to 2016; (2) articles with abstracts containing technical information; (3) articles with abstracts longer than 10 words; and (4) articles listed under the “Technical Paper” and “Research Paper” categories in JSE and ES, respectively. In this respect, the abstract dataset is based on a total of $M = 11{,}027$ articles that span 26 years. The number of articles for each journal is shown yearly and cumulatively in Figs. 1(a and b), respectively. As shown in the figures, the annual number of articles published in JSE is almost constant, while the same number in ES has increased in general over time with a dramatic boost starting in 2006.

Fig. 1. Total number of publications in each journal from 1991 to 2016; (a) each year; and (b) cumulative.

The presence of linguistic noise is a common problem that negatively influences any statistical analysis within the context of TM (Salloum et al. 2018). This linguistic noise is mainly attributed to the variations in case types (e.g., DESIGN and design), word forms (e.g., experiment and experimental), and the presence of common words (e.g., the, and, of) and special characters (e.g., punctuation). As such, the raw abstract dataset was further processed through five steps (Zhang et al. 2015): (1) tokenization, where all the abstract words were separated into tokens; (2) treatment, where a standard stop list from natural language processing was utilized to filter out common words; (3) transformation, where all the characters were converted to lowercase; (4) stemming, where all the affixes were removed to return words to their word stem; and (5) cleaning, where the abstract dataset was treated to remove all words with more than 25 or fewer than 4 characters. These preprocessing steps reduced the abstract dataset from 35,579 raw words to 7,795 clean words. Fig. 2 shows a comparison between the raw and clean abstract datasets through word clouds, where the size of each word is in proportion to its probability of occurrence. As shown in Fig. 2, while the raw dataset (i.e., 35,579 words) contains several common words with very high frequency (e.g., the, and, of), the clean dataset (i.e., 7,795 words) has several high-frequency words that are related to structural engineering (e.g., concrete, steel, model, design), demonstrating the importance of these preprocessing steps in enhancing the dataset quality for meaningful and nontrivial analyses.

Fig. 2. Comparison between raw and clean datasets through word clouds.
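The five preprocessing steps described above can be sketched as a small pipeline. This is an illustration only: the stop list is a tiny hypothetical sample rather than the standard list used in the study, and `naive_stem` is a crude suffix stripper standing in for a proper stemming algorithm, which the source does not name.

```python
import re

# Tiny illustrative stop list (assumption); a real analysis would use a standard NLP list
STOP_WORDS = {"the", "and", "of", "a", "an", "in", "to", "is", "are", "for", "on", "with"}

# Naive suffix list (assumption) standing in for a real stemmer
SUFFIXES = ("ational", "ation", "ing", "al", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 4 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(abstract, min_len=4, max_len=25):
    # (1) tokenization: keep alphabetic runs, which also drops punctuation and digits
    tokens = re.findall(r"[A-Za-z]+", abstract)
    # (2) treatment: remove common words using the stop list
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    # (3) transformation: convert to lowercase
    tokens = [t.lower() for t in tokens]
    # (4) stemming: reduce word forms to a common stem
    tokens = [naive_stem(t) for t in tokens]
    # (5) cleaning: drop words with fewer than 4 or more than 25 characters
    return [t for t in tokens if min_len <= len(t) <= max_len]


tokens = preprocess("The DESIGN of experimental steel beams.")
# tokens == ["design", "experiment", "steel", "beam"]
```

The example mirrors the noise sources named in the text: case variation (DESIGN), word forms (experimental), common words (the, of), and punctuation.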

Model Description

The LDA model requires some basic input parameters to apply an inference algorithm on the underlying abstract dataset. First, this model requires the hyperparameters $\beta$ and $\alpha$ that control both the mean shape and sparsity of $\psi_{k}$ and $\theta_{d}$, respectively, from the underlying Dirichlet distribution, as discussed earlier. Typically, larger $\beta$ and $\alpha$ result in uniform topic distributions, while their smaller counterparts yield sparser topic distributions (Griffiths and Steyvers 2004). Since key topics of structural engineering research are relatively well identified, small values of $\beta = 0.01$ and $\alpha = 5/K$ (where $K$ is the number of topics) were assumed in the current study to follow sparse topic distributions (Griffiths and Steyvers 2004).
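The effect of the concentration hyperparameter on sparsity can be checked numerically. The sketch below, which is not part of the study, samples symmetric Dirichlet vectors and reports the average share taken by the largest component; the helper names and the comparison value of 10 are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def mean_largest_share(concentration, size, n_samples=500, seed=0):
    """Average of the largest component over many Dirichlet samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(sample_dirichlet(concentration, size, rng))
    return total / n_samples


K = 40  # number of topics adopted in the study
sparse = mean_largest_share(5 / K, K)   # small alpha: a few topics dominate each document
uniform = mean_largest_share(10.0, K)   # large alpha (hypothetical): shares close to 1/K
```

A larger value of `sparse` relative to `uniform` reflects the statement that smaller hyperparameters yield sparser distributions.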

Second, the optimum number of key topics is one of the main challenges in topic modeling. To address this challenge, perplexity analysis was performed to quantify how well the LDA model can predict the key topics of the abstract dataset (Blei et al. 2003). Perplexity is a standard performance measure for natural language statistical models. More specifically, for a given number of topics, the LDA model is developed, and the synthesized word distributions, represented by the corresponding topics, are compared to the actual topic mixtures, or distribution of words in the documents comprising the dataset (Blei et al. 2003). In this respect, the dataset was divided into 9,925 (90%) and 1,102 (10%) articles for training and testing, respectively. Afterward, for the testing dataset of 1,102 articles, the perplexity was evaluated (Blei et al. 2003) as follows:

$$\text{perplexity} = \exp\left\{-\frac{\sum_{d=1}^{N} \log p(W_{d})}{\sum_{d=1}^{N} N_{d}}\right\} \tag{4}$$

where $N$ = number of articles in the testing dataset, and $W_{d}$ and $N_{d}$ = word collection and number of words in document $d$, respectively. Fig. 3 shows the sensitivity of the perplexity to the number of topics, $K$, from 10 to 160. As shown in Fig. 3, although the minimum perplexity is attained at a $K$ value of 160, the current study adopts a $K$ value of 40 in order to deal with a practical number of topics, especially considering the small difference in the perplexity value at $K = 40$ versus that at $K = 160$.

Fig. 3. Sensitivity of the perplexity to different number of topics.
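Eq. (4) can be evaluated directly once per-document log-likelihoods are available. In the sketch below, the topic distribution of each held-out document is assumed to be given; in practice it has to be estimated for unseen documents, a step that is not shown. The function names are hypothetical.

```python
import math


def document_log_likelihood(doc, theta_d, psi):
    """log p(W_d): sum over words of log(sum_k theta_dk * psi_kw)."""
    return sum(
        math.log(sum(theta_d[k] * psi[k][w] for k in range(len(psi))))
        for w in doc
    )


def perplexity(doc_log_likelihoods, doc_lengths):
    """Eq. (4): exp of the negative total log-likelihood per word."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


# Sanity check: a single topic that is uniform over 4 words gives a perplexity of 4
psi = [[0.25, 0.25, 0.25, 0.25]]
doc = [0, 1, 2, 3]
value = perplexity([document_log_likelihood(doc, [1.0], psi)], [len(doc)])
```

A lower perplexity means the model assigns higher probability to the held-out words; a model that guesses uniformly over a vocabulary of size $V$ has a perplexity of $V$.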

Topic Identifications

The LDA model results include the word distribution of each topic $k$, $\psi_{k}$, and the probability of word $w$ occurring in topic $k$, $\psi_{kw}$. Based on these results, the group of words with high $\psi_{kw}$ values in topic $k$ can be linked specifically to a corresponding research topic/area (Griffiths and Steyvers 2004), as shown in Fig. 4 for Topics No. 1–8 as samples and in Appendix I (for the remaining Topics No. 9–40). To facilitate rapid visual interpretation, the relative size of any word, in Fig. 4 and Appendix I, corresponds to its $\psi_{kw}$ value. Although each topic $k$ is connected to the total number of clean words, $V$, in the abstract dataset (i.e., 7,795 words), Fig. 4 and Appendix I present only words with high $\psi_{kw}$ values to facilitate the identification of each key topic through these words. For example, the words “earthquake, ground, motion, seismic, spectrum, nonlinear, period, wave, etc.” in Topic No. 1 are mostly related to earthquake/seismic engineering, the words “steel, test, section, buckling, flange, column, strength, member, etc.” in Topic No. 2 are typically connected to steel columns testing, and “damper, dissipation, device, isolation, energy, earthquake, hysteretic, structure, etc.” in Topic No. 21 are frequently used in seismic isolation systems. Fig. 4 and Appendix I also show that the LDA was able to identify some general topics that are typically used in technical writing within several scientific and engineering research areas interchangeably. These include Topic No. 4 “study, develop, present, results, describe, discuss, behavior, possible, etc.”, Topic No. 13 “solution, derivation, formulae, function, matrix, problem, term, number, etc.”, Topic No. 17 “present, evaluate, assess, discuss, focus, provide, etc.”, and Topic No. 29 “coefficient, distribution, assumption, variable, calculation, variation, uniform, parameter, etc.” Overall, 36 technical research and 4 technical writing topics were identified in the studied structural engineering research literature. This indicates that the latter four topics are found in a significant proportion of articles in structural engineering research, which is to be expected. Although not research topics, the identified four technical writing topics highlight the common discourse used in presenting findings, arguments, and hypotheses within structural engineering research articles. Adopting such technical writing language and word use within such topics can be very beneficial to new structural engineering researchers and emerging scholars.

Fig. 4. Word cloud of Topic No. 1 to Topic No. 8.
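Selecting the words with the highest $\psi_{kw}$ values for a topic is a simple ranking operation. The following sketch uses a hypothetical four-word vocabulary and made-up probabilities purely for illustration; the function name `top_words` is an assumption.

```python
def top_words(psi_k, vocabulary, n=8):
    """Return the n (word, probability) pairs with the highest psi_kw in topic k."""
    ranked = sorted(zip(vocabulary, psi_k), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical vocabulary and word distribution of one topic
vocabulary = ["steel", "beam", "wind", "fire"]
psi_k = [0.4, 0.3, 0.1, 0.2]
leading = top_words(psi_k, vocabulary, n=2)
# leading == [("steel", 0.4), ("beam", 0.3)]
```

The returned probabilities could also drive the relative word sizes in a word cloud such as those in Fig. 4.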

Overall Temporal Variation of Topics

Eq. (1) was utilized to evaluate the overall temporal variation of each topic inclusion in the articles published over the 26-year period considered, as shown in Fig. 5 and presented in Appendix II. In this figure, the 40 topics identified earlier are presented in order (i.e., from Topic No. 1 to Topic No. 40) from the bottom to the top to show their corresponding probability of inclusion in the JSE and ES abstract dataset from 1991 to 2016. The level of inclusion of any topic (e.g., Topic No. 20) is represented by the area between this topic and the previous topic (i.e., Topic No. 19), as shown in Fig. 5. Fig. 6 shows the temporal variation of five sample technical topics: Topic No. 11 (systems dynamic response) “impact, force, model, dynamic, effect, mass, velocity, response, etc.”; Topic No. 12 (structural design) “code, limit, factor, design, standard, practice, criteria, specification, etc.”; Topic No. 16 (systems control) “sensor, active, hybrid, control, linear, system, friction, passive, etc.”; Topic No. 35 (nonlinear finite element analysis) “model, software, nonlinear, geometry, material, numerical, finite, element, etc.”; and Topic No. 40 (composite structures) “steel, beam, slab, strengthening, connector, fiber, bond, retrofit, etc.” As shown in Fig. 6, the inclusion of Topic No. 40 (composite structures) has been increasing, where its topic probability increased from 0.016 in 1991 to 0.028 in 2016, while inclusions of Topics No. 12 (structural design) and 35 (nonlinear finite element analysis) have been decreasing with topic probabilities of 0.038 (in 1991) to 0.032 (in 2016) and 0.043 (in 1991) to 0.029 (in 2016), respectively. Some topics have also been experiencing either consistent (e.g., Topic No. 11 systems dynamic response) or sporadic (e.g., Topic No. 16 systems control) inclusions, as also shown in Fig. 6.

Fig. 5. Topic distribution from 1991 to 2016.

Fig. 6. Sample of topic distribution from 1991 to 2016.

To quantify the inclusion increase (or decrease) of each topic within the articles published between two temporal windows, a contribution index, $C_{k}$, was calculated for each topic using Eq. (5) (Gatti et al. 2015):

$$C_{k} = \frac{\sum_{t=2012}^{2016} \theta_{k}^{t}}{\sum_{t=1991}^{1995} \theta_{k}^{t}} \tag{5}$$

where $C_{k} > 1$ indicates that the inclusion of topic $k$ has increased from (1991–1995) to (2012–2016), whereas $C_{k} < 1$ indicates that topic $k$ has declined within the same two temporal windows. Fig. 7 shows the calculated $C_{k}$ values for all 40 topics, whereas Figs. 8(a and b) present the temporal variation, from 1991 to 2016, of the five technical topics with the highest and lowest $C_{k}$ values, respectively, within the considered structural engineering research literature. As shown in Figs. 7 and 8, Topic No. 5 (reinforced concrete columns testing), Topic No. 25 (material properties), Topic No. 28 (earthquake/seismic engineering), Topic No. 31 (fire testing/analysis), and Topic No. 32 (masonry testing) have been experiencing inclusion increase, whereas Topic No. 7 (soil-structure interaction), Topic No. 20 (plate analysis), Topic No. 24 (bridge engineering), Topic No. 27 (prestressed structures), and Topic No. 30 (buckling) have all been experiencing inclusion reduction.

Fig. 7. Contribution indices of topics from (2012–2016) to (1991–1995).

Fig. 8. Topics with inclusion increase and decrease.
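The contribution index of Eq. (5) is the ratio of two five-year sums of yearly topic proportions. A minimal sketch follows, with a hypothetical function name and made-up yearly values.

```python
def contribution_index(theta_kt, early=range(1991, 1996), late=range(2012, 2017)):
    """Eq. (5): sum of theta_k^t over 2012-2016 divided by the sum over 1991-1995.

    theta_kt maps each year t to the proportion of topic k in that year.
    """
    early_sum = sum(theta_kt[t] for t in early)
    late_sum = sum(theta_kt[t] for t in late)
    return late_sum / early_sum


# Hypothetical topic whose yearly proportion rises from 0.02 to 0.03
theta_kt = {t: 0.02 for t in range(1991, 2017)}
for t in range(2012, 2017):
    theta_kt[t] = 0.03
c_k = contribution_index(theta_kt)  # about 1.5, i.e., increased inclusion
```

Values above one indicate growth between the two windows and values below one indicate decline, matching the interpretation given for $C_{k}$.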

To further investigate the inclusion proportion of each topic over time within both journals (Fig. 9 and Appendix III), Eq. (3) was used. Figs. 9(a and b) demonstrate that topic distribution and proportion of JSE and ES, respectively, are almost consistent over time, confirming that both journals possessed very similar temporal author interests and research compositions. The figure also shows that some topics [e.g., Topic No. 24 (bridge engineering) in JSE and Topic No. 14 (vibration effects) in ES] have been declining over the last 15 years despite the significant increase in the number of research publications within the two journals. This decrease might be attributed to the lack of researchers with such interests, natural variation in research scopes in the last two decades, emergence of specialized journals in bridge engineering and vibration effects, or a combination of all such effects. Overall, the trend demonstrates that some topics have been decreasing as other topics have been the focus of more articles. For example, Topic No. 37 (uncertainty and reliability) in JSE and Topic No. 22 (structural health monitoring) in ES have been experiencing increasing inclusion, as shown in Figs. 9(a and b), respectively.

Fig. 9. Topic distribution from 1991 to 2016 for each journal: (a) JSE; and (b) ES.

Recent Temporal Variation of Topics

All the previously identified topics were inferred based on the 1991 to 2016 JSE and ES dataset, as discussed earlier and shown in Fig. 5. Therefore, these key topics might attenuate the influence of some topics that might have emerged only recently within the corresponding abstract dataset. Such attenuation might be attributed to the small number of publications possibly covering these emerging topics relative to the high volume of the total number of publications from 1991 to 2016. As such, to further evaluate the variation of research content and the potential of new emerging topics, the LDA model was applied on the abstract dataset from 2012 to 2016 only to infer the key topics within this temporal window, as shown in Fig. 10. This temporal window facilitates investigating the correlation between these key topics and their counterparts from 1991 to 2016, as shown earlier in Fig. 5. Fig. 10 shows that the inferred recent key topics are consistent and essentially the same over time, regardless of the temporal window considered. However, as shown in the figure, the main difference between the two temporal windows (i.e., from 1991 to 2016 and from 2012 to 2016) is the inclusion decrease of only two topics after 2012 (i.e., concrete creep and shrinkage and prestressed structures) and the inclusion increase in the same year of only one topic (i.e., shock effects).

Fig. 10. Topic trends of structural engineering research.

Knowledge Gap Discoveries

Word Co-occurrence Network

Co-occurrence networks are generally used to analytically evaluate and visually present the potential relationships between people, organizations, and concepts or, as adopted herein, between entities present within a collection of textual documents (Wuchty and Almass 2005). As such, the current study develops a co-occurrence network of all words within the journal abstract dataset to investigate the interconnection between these words across all topics. This network comprises nodes, representing words, and links, representing the co-occurrence of different words within the same topic. The set of all nodes and links in the word co-occurrence network is $\{N, L\}$, and its adjacency matrix, $A$, is the following:

$$A = B B^{T} \tag{6}$$

In Eq. (6), $B$ = binary matrix of size ($V \times K$), and its elements $b_{v k}$ are given by

$$b_{v k} = \begin{cases} 1 & ψ_{k}^{v} \geq 0.05, \\ 0 & \text{otherwise}. \end{cases} \tag{7}$$

where $b_{v k} = 1$ indicates that word $v$ is an important component in topic $k$, where the threshold $ψ_{k}^{v}$ is taken as 0.05 (Manning and Schütze 1999; Hofmann and Chisholm 2016) in the current study. Therefore, according to Eq. (6), the adjacency matrix, $A$, has a size of $V \times V$ with $V = 7,795$, and its elements, $a_{v u}$, represent the number of topics that two words $v$ and $u$ tend to co-occur effectively when $ψ_{k}^{v} \geq 0.05$ and $ψ_{k}^{u} \geq 0.05$.
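As a minimal sketch of Eqs. (6) and (7), and assuming the topic-word probabilities are available as a $K \times V$ array, the binary matrix and the adjacency matrix could be computed with NumPy as shown next. The names `psi_toy` and `cooccurrence_adjacency` and the toy values are hypothetical illustrations and are not taken from the study.

```python
import numpy as np

def cooccurrence_adjacency(psi, threshold=0.05):
    """Return (B, A) following Eqs. (6) and (7).

    psi: array of shape (K, V); psi[k, v] is the probability of word v in topic k.
    """
    psi = np.asarray(psi, dtype=float)
    # Eq. (7): b_vk = 1 when word v is an important component of topic k.
    B = (psi.T >= threshold).astype(int)   # shape (V, K)
    # Eq. (6): a_vu counts the topics in which words v and u are both important.
    A = B @ B.T                            # shape (V, V)
    return B, A

# Toy input (assumed values): K = 2 topics and V = 4 words.
psi_toy = [
    [0.50, 0.40, 0.06, 0.04],
    [0.03, 0.45, 0.02, 0.50],
]
B, A = cooccurrence_adjacency(psi_toy)
```

In this sketch, a diagonal entry $a_{v v}$ is simply the number of topics in which word $v$ passes the threshold, so only the nonzero off-diagonal entries would be read as links between distinct words.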

To visually investigate the structure of the co-occurrence network, the Yifan Hu (Hu 2005) graphical algorithm is applied to the largest connected component of the network ($N = 803$ nodes and $L = 38,448$ links). This algorithm utilizes the principle of repulsion and attraction strengths to facilitate rapid visual interpretation. More specifically, attraction draws linked words closer together, while repulsion forces unlinked words further apart from one another in proportion to the strength of such attraction and repulsion. Fig. 11 demonstrates the use of the Yifan Hu algorithm in highlighting the structure of the co-occurrence network, with the sizes of nodes scaled according to their number of direct links. As shown in this figure, some words are more interconnected relative to others within the same co-occurrence network. For example, Fig. 11 clearly demonstrates that, as expected, the words “reinforced” and “concrete” are more interconnected through several research topics compared to the words “reinforced” and “masonry”. The figure also shows some word clusters, revealing some research topics within the co-occurrence network (e.g., wind engineering, seismic isolation systems, and bridge engineering), thus showing the key bridge words connecting these topics. For example, the word “column” connects the “steel columns testing” and the “RC columns testing” topics. In addition, the co-occurrence network can be used as a tool to evaluate the conception distance (i.e., number of research studies) between topics. For example, Fig. 11 shows that some topics (e.g., bridge engineering and wind engineering) are more interconnected through several research studies as opposed to other topics (e.g., bridge engineering and seismic isolation systems) within structural engineering research, thus highlighting possible research opportunities in and knowledge gaps within the latter subfield intersection.

Fig. 11. Co-occurrence network of words across all topics.
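The two preprocessing steps described above (isolating the largest connected component and counting the direct links used to scale node sizes) can be sketched in plain Python as follows. This is an illustration under stated assumptions only: the Yifan Hu layout itself is not reproduced, and the helper names and the toy adjacency matrix are hypothetical.

```python
from collections import deque

def largest_connected_component(adjacency):
    """Return the sorted node indices of the largest connected component.

    adjacency: square list of lists; an off-diagonal entry > 0 means a link.
    """
    n = len(adjacency)
    seen = set()
    best = []
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.append(v)
            for u in range(n):
                if u != v and adjacency[v][u] > 0 and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if len(component) > len(best):
            best = component
    return sorted(best)

def degrees(adjacency, nodes):
    """Number of direct links of each node, counted within `nodes`."""
    return {
        v: sum(1 for u in nodes if u != v and adjacency[v][u] > 0)
        for v in nodes
    }

# Toy adjacency matrix (assumed values): word 4 shares no topic with the others.
A_toy = [
    [1, 1, 1, 0, 0],
    [1, 2, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
component = largest_connected_component(A_toy)
node_degree = degrees(A_toy, component)
```

The resulting component and degree values could then be passed to any force-directed layout tool; which tool and which settings were used for Fig. 11 beyond the Yifan Hu algorithm is not stated here.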

Topics Interlinkage Matrix

The current study further develops an interlinkage matrix, $S$, for all the 36 technical research topics within the dataset to visualize the interlinkages (and quantify their strengths) between these topics:

$$S = C C^{T} \tag{8}$$

In Eq. (8), $C$ = binary matrix of size ($K \times M$), and its elements $c_{k d}$ are given by

$$c_{k d} = \begin{cases} 1 & ψ_{d}^{k} \geq 0.01, \\ 0 & \text{otherwise}. \end{cases} \tag{9}$$

where $c_{k d} = 1$ indicates that topic $k$ is an important component in document $d$. Therefore, according to Eq. (8), the interlinkage matrix, $S$, has a size of ($K \times K$) (i.e., $K = 36$); and its elements, $s_{j k}$, represent the number of articles in which two topics $j$ and $k$ tend to co-occur considering the $ψ_{d}^{j} \geq 0.01$ and $ψ_{d}^{k} \geq 0.01$ thresholds. Fig. 12 shows the interlinkage matrix, $S$, normalized to the total number of articles (i.e., $M = 11,027$) to facilitate direct comparison between the strengths of the interlinkages between all topics. The visualization in Fig. 12 presents a valuable tool for researchers, journal editors, and funding agencies/conference organizers, where the interlinkages and their strengths can be used to possibly identify knowledge gaps and future strategic directions. For example, Fig. 12 shows that Topic No. 1 (earthquake/seismic engineering) is highly interlinked to Topic No. 38 (analysis methods), Topic No. 34 (nonlinear modeling), and Topic No. 12 (structural design) through strength values of 0.44, 0.43, and 0.42, respectively. Conversely, the same topic (i.e., Topic No. 1) is essentially unlinked to Topic No. 31 (fire testing/analysis), Topic No. 24 (bridge engineering), and Topic No. 15 (wind engineering) through strength values of 0.06, 0.09, and 0.10, respectively, at least within the two journals. However, following an earthquake, for example, fires typically erupt for different reasons (e.g., due to the rupture of either electrical or gas lines) (i.e., Topics 1 and 31). As such, to provide an overall image of the current and possible future structural engineering research directions, the interlinkages between all 36 topics are shown in Appendix IV in the order of their strength. For example, as shown in Appendix IV, while most intersection areas essentially present Red Oceans (Kim and Mauborgne 2014) of well-established interlinked topics with a large number of publications [$s_{j k} > 0.10$] (e.g., torsion effects and analysis methods), there are also a few remaining scattered Blue Ocean (Kim and Mauborgne 2014) opportunities within the 36 topics with $s_{j k} < 0.10$ (e.g., the intersection of wind engineering and systems control) that might still require additional future research efforts from the structural engineering community.
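Eqs. (8) and (9), together with the normalization by $M$ and the 0.10 strength cutoff discussed above, can be sketched with NumPy as follows. The sketch assumes the document-topic proportions are stored as an $M \times K$ array; the names `doc_topic_toy`, `interlinkage_matrix`, and `weak_pairs` and all toy values are hypothetical and do not come from the study.

```python
import numpy as np

def interlinkage_matrix(doc_topic, threshold=0.01):
    """Return S / M following Eqs. (8) and (9).

    doc_topic: array of shape (M, K); doc_topic[d, k] is the proportion
    of topic k in document d.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    M = doc_topic.shape[0]
    # Eq. (9): c_kd = 1 when topic k is an important component of document d.
    C = (doc_topic.T >= threshold).astype(int)   # shape (K, M)
    # Eq. (8): s_jk counts the documents in which topics j and k co-occur.
    S = C @ C.T                                  # shape (K, K)
    return S / M

def weak_pairs(S_norm, cutoff=0.10):
    """List topic pairs (j, k), j < k, with normalized strength below cutoff."""
    K = S_norm.shape[0]
    return [(j, k) for j in range(K) for k in range(j + 1, K)
            if S_norm[j, k] < cutoff]

# Toy input (assumed values): M = 4 documents and K = 3 topics.
doc_topic_toy = [
    [0.700, 0.295, 0.005],
    [0.600, 0.400, 0.000],
    [0.500, 0.005, 0.495],
    [0.990, 0.005, 0.005],
]
S_norm = interlinkage_matrix(doc_topic_toy)
gaps = weak_pairs(S_norm)
```

In this toy case only the pair of topics 1 and 2 falls below the cutoff, which is the kind of weakly linked intersection the text describes as a possible Blue Ocean opportunity.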

Although adopting the approach by Griffiths and Steyvers (2004) facilitates evaluating the overall linkage strength between topics to discover knowledge gaps for future opportunities, integrating other data sources is a possible future extension of the current study. Such analysis would facilitate quantifying the growth/variation of research topics and evaluating the potential of new emerging topics over time. In such case, the dynamic topic model (Blei and Lafferty 2006) or the topic-over-time model (Wang and McCallum 2006) can be utilized to further validate the results presented in the current study.

Blue Oceans

At its core, Blue Ocean Strategy (Kim and Mauborgne 2014) refers to creating value through discovering new needs and thus uncontested spaces, Blue Oceans, instead of struggling to survive in established markets (spaces) that are swarming with vicious competition, Red Oceans (referring to shark-infested waters). The metaresearch analysis performed herein shows that the structural engineering research community continues to produce more journal articles, as shown in Fig. 1, albeit with the majority of these articles essentially within the same established focus areas within the 26-year period considered (Red Oceans). As such, it might be argued that the structural engineering research community is missing immense Blue Ocean opportunities of extending and branching out its knowledge beyond these established boundaries. For example, although emerging, our research output falls short in terms of addressing some key societal needs, including topics such as resilience, sustainability, and lifecycle design (Bonstrom and Corotis 2014; Cimellaro et al. 2016; Lounis and McAllister 2016; Salem et al. 2017). Such topics were not identified with any appreciable percentage over the 26-year study period, or even when only the latest 5 years within that time window were considered. Studies addressing such societal needs are more relevant now than ever before because of the complexity of the challenges being experienced by different structural systems including those related to climate change-induced hazards, aging populations, deteriorating infrastructure, and natural/anthropogenic interdependent hazards and subsequent systemic (cascade-type) risks (Ezzeldin and El-Dakhakhni 2019).

Notwithstanding the opportunities identified through the weak interlinkage of some established structural engineering topics, the real Blue Oceans are, in the authors’ view, elsewhere. When one considers other engineering disciplines (e.g., mechanical and electrical engineering), it is easy to examine the growing impacts of idea fusions at the interfaces of different disciplines that disrupt entire well-established engineering fields. In our field, for instance, intelligent structural systems, to name but one major opportunity, adapt their characteristics to mitigate excessive response and monitor their own conditions and are envisioned to also perform self-diagnosis and possibly even self-repair (Holnicki-Szulc and Soares 2013). The structural analysis, design, construction, and future upgrades of such systems create numerous multidisciplinary research opportunities at the interfaces between structural engineering and multiple emerging cross-cutting technologies. Such technologies, already disrupting the world around us, include data fusion, sensing, robotics, data-driven models, machine learning, unmanned ground or aerial vehicles, adaptive controls, artificial intelligence, virtual and mixed reality visualization, bio-inspired materials and structures, and three-dimensional (3D) printing (Manyika et al. 2013).

Finally, it is important to highlight that viewing the breadth of structural engineering research through a metaresearch lens should not be considered as orthogonal to or refuting in-depth research necessary in structural engineering articles. In fact, metaresearch is key to minimizing research waste (Chalmers and Glasziou 2009) of funds, time, human resources, subsequent low/no-use publications, and non-transferable knowledge and to distinguishing between over-saturated and grand-challenge research areas in order to facilitate effective and efficient research. Furthermore, a Blue Ocean lens highlights breakthrough opportunities through either cross-pollinating ideas within structural engineering subfields or at the interfaces of structural engineering and other fields/technologies, opportunities that may be hidden under the plethora of research articles published every year in traditional areas.

Conclusions

TM has become one of the most effective techniques used in different fields to metaresearch their scientific publications. However, the structural engineering research community has never before utilized TM to identify its research trends and quantitatively discover knowledge gaps. Considering the ever-increasing number of articles covering different subfields within structural engineering, metaresearching structural engineering is key to tracing and quantifying topic trends and developing tools to evaluate the linkage strength, or lack thereof, between such topics. The analyses thus facilitate discovering knowledge gaps and research opportunities that might require further study, as some structural engineering topic intersection areas remain relatively unexplored and structural engineering interfaces with numerous emerging technologies remain practically untouched.

The study first collected and cleaned the text data of 11,027 article abstracts published from 1991 to 2016 in the two leading structural engineering journals: Journal of Structural Engineering and Engineering Structures. The LDA topic modeling technique was subsequently applied to the cleaned abstracts to identify their key topics and provide a summarized overview of the current trends. Afterward, several quantitative measures were used to evaluate the temporal distribution of these topics over the considered 26-year period compared to that of the latest 5 years. Finally, a word co-occurrence network and a topic interlinkage matrix were also developed, providing visual tools to evaluate the structural research topic co-occurrence and the linkage strength between topics.

The analysis results identified 36 technical research and four technical writing topics in the literature of structural engineering research. The distribution and proportion of these topics were evaluated over time, where some showed inclusion increase (e.g., reinforced concrete columns testing, material properties, earthquake/seismic engineering, fire testing/analysis, and masonry testing) and others experienced inclusion decrease (e.g., soil-structure interaction, plate analysis, bridge engineering, prestressed structures, and buckling). The results demonstrated that although there are multiple established subfields, each with a correspondingly large number of publications, there are also some subfield intersection areas that might warrant further research.

It is also clear that all 36 identified research topics fall short in terms of linkage to different disruptive technologies. The integration of such readily available technologies within mainstream structural engineering presents Blue Ocean opportunities and is more relevant now than ever before as structural engineers continue to face complex research challenges. These include, to name but one example, the intricate task of addressing the ever more demanding public need for more resilient structural systems under increased frequency and magnitude of extreme natural hazard events. At the same time, the increasing complexity, initial capital investments, and lifecycle operation costs of our built environment are only paralleled by the abundance of interdependent data sources, intelligent cyber-physical systems, advanced data fusion techniques, and unprecedented computational powers.

The metaresearch analyses presented herein can benefit different stakeholders in our structural engineering community including (1) conference organizers (e.g., to prioritize the scope of their future conferences); (2) journal editors (e.g., to develop new interest within their journals); (3) researchers (e.g., to consider breakthrough and seminal work opportunities at the interface with other fields); (4) funding agencies (e.g., to prioritize and strategize research investments prudently and efficiently toward identified knowledge gaps and possible future opportunities for societal benefits); and (5) designers/regulators (e.g., to improve structural design, safety, economy, and resilience).

The comprehensive text analytics presented in the current study aim at drawing a vivid picture of what the structural engineering research community has been focused on over the past quarter of a century. We hope that such analyses would initiate inward discussions and possibly a paradigm shift in our community’s thinking and highlight opportunities to make significant breakthroughs based on truly interdisciplinary research initiatives.

Notation

The following symbols are used in this paper:

$A$: adjacency matrix of size ( $V \times V$ );
$B$: binary matrix of size ( $V \times K$ );
$C_{k}$: contribution index of topic $k$ ;
$d$: index of documents;
$i$: index of words;
$j$: index of journals;
$j_{d}$: journal of document $d$ ;
$K$: number of topics;
$k$: index of topics;
$L$: total number of links;
$M$: total number of articles;
$N$: total number of nodes;
$N_{d}$: total number of words in document $d$ ;
$S$: interlinkage matrix of size ( $K \times K$ );
$t$: index of years;
$t_{d}$: publication year of document $d$ ;
$V$: total number of clean words;
$W_{d}$: word collection in document $d$ ;
$w_{d i}$: word $i$ in document $d$ ;
$Z_{d i}$: topic assignment for word $w_{d i}$ from document $d$ ;
$α$: Dirichlet prior on the per-document topic distributions;
$β$: Dirichlet prior on the per-topic word distributions;
$Θ$: topic distribution;
$θ_{d}$: topic distribution in document $d$ ;
$θ_{k}^{j}$: proportion of topic $k$ in journal $j$ ;
$θ_{k}^{j t}$: proportion of topic $k$ in journal $j$ over time $t$ ;
$θ_{k}^{t}$: proportion of topic $k$ at year $t$ ;
$ψ$: word distribution;
$ψ_{k}$: distribution of words in topic $k$ ;
$ψ_{k w}$: probability of word $w$ in topic $k$ ; and
$ψ_{Z d i}$: multinomial distribution of topic assignment for word $w_{d i}$ from document $d$ .

Appendix I. Word Cloud of Topics

Fig. 13 shows the word cloud of Topic No. 9 to Topic No. 40.

Fig. 13. Word cloud of Topic No. 9 to Topic No. 40.

Appendix II. Overall Topic Distribution from 1991 to 2016

Topic No. 1 to Topic No. 20:

Year	Topic 1 (%)	Topic 2 (%)	Topic 3 (%)	Topic 4 (%)	Topic 5 (%)	Topic 6 (%)	Topic 7 (%)	Topic 8 (%)	Topic 9 (%)	Topic 10 (%)	Topic 11 (%)	Topic 12 (%)	Topic 13 (%)	Topic 14 (%)	Topic 15 (%)	Topic 16 (%)	Topic 17 (%)	Topic 18 (%)	Topic 19 (%)	Topic 20 (%)
1991	2.34	2.37	3.61	3.52	2.02	3.45	2.19	2.59	3.18	2.12	2.30	3.73	4.01	1.92	1.06	1.29	2.97	2.78	2.13	2.63
1992	2.37	3.22	3.20	2.86	2.52	2.99	2.68	1.95	2.42	2.01	2.07	4.30	4.31	2.08	0.95	0.64	2.73	2.19	2.00	2.86
1993	2.36	2.28	2.49	3.24	2.04	3.30	2.05	2.64	2.28	2.25	2.29	4.17	4.39	1.75	1.15	1.42	3.26	2.63	2.44	3.06
1994	2.34	2.67	3.15	3.24	2.84	2.58	2.35	2.73	2.57	2.55	1.70	3.94	3.48	1.75	0.99	1.17	3.01	2.01	2.04	2.96
1995	1.65	2.51	2.83	3.49	2.62	3.02	2.27	2.46	2.56	2.68	1.64	4.01	3.68	2.46	1.08	1.07	2.71	2.39	2.03	2.88
1996	3.20	2.52	2.60	3.45	2.09	2.95	2.16	2.42	2.26	2.53	2.21	4.08	4.17	2.44	1.10	1.96	3.06	2.06	1.92	2.55
1997	2.39	2.45	3.19	3.45	2.59	2.87	2.07	2.69	2.57	2.88	1.75	3.98	3.85	2.54	0.68	1.52	2.94	3.32	1.95	2.57
1998	2.49	2.91	3.27	3.50	3.31	2.32	2.26	2.96	2.22	3.35	1.58	3.98	2.77	2.18	1.12	2.02	3.26	2.17	2.26	2.65
1999	1.66	3.94	2.32	3.51	2.94	2.53	2.12	2.51	2.75	2.84	1.61	3.86	2.90	2.24	1.56	1.78	3.18	2.14	1.83	2.96
2000	2.96	2.16	2.85	3.66	2.44	2.92	1.87	2.27	2.43	3.06	1.86	3.67	3.76	3.01	1.54	1.14	3.06	1.81	2.15	2.83
2001	2.60	2.57	3.55	3.47	2.73	2.71	1.49	2.42	2.26	2.94	1.60	3.84	4.09	2.29	1.20	1.42	2.83	2.04	2.14	2.68
2002	2.56	3.34	2.91	3.59	2.85	2.89	1.81	2.43	2.27	2.95	1.96	3.70	3.44	2.25	1.09	1.82	3.11	2.09	1.89	2.73
2003	2.51	3.12	2.58	3.29	3.16	2.86	1.73	2.77	2.01	3.30	1.72	3.74	2.98	2.03	0.90	2.28	2.87	2.17	1.69	2.50
2004	2.75	3.50	3.17	3.23	3.27	2.97	1.87	2.94	2.34	2.83	1.72	3.46	2.48	2.50	0.97	1.39	3.16	1.79	1.92	2.38
2005	2.55	2.62	2.69	3.56	2.55	2.94	1.63	3.00	2.19	2.79	1.80	3.57	2.75	3.23	1.25	1.66	2.99	2.46	1.97	1.97
2006	2.78	4.04	2.47	3.45	2.40	2.94	1.90	3.05	2.03	2.94	1.55	3.47	2.57	2.35	1.14	1.86	3.06	2.84	2.21	2.27
2007	2.71	2.03	3.02	3.38	3.21	2.90	1.81	2.79	2.34	2.88	1.91	2.99	3.14	2.83	0.92	1.49	2.87	2.17	1.97	2.08
2008	2.63	2.62	2.65	3.32	2.86	2.94	1.78	2.60	2.65	2.86	1.81	3.29	3.01	2.58	1.06	1.69	3.18	2.14	1.94	2.33
2009	2.11	2.41	3.58	3.52	2.78	2.76	2.02	2.65	2.56	2.87	1.94	3.62	2.55	2.33	1.29	1.33	2.96	2.02	1.89	1.98
2010	1.76	2.77	2.91	3.51	3.17	2.59	1.73	2.93	2.55	2.47	2.16	3.52	2.43	2.11	1.25	1.46	2.91	1.97	2.54	1.98
2011	2.95	2.06	2.78	3.90	2.65	2.56	1.54	3.18	3.34	2.40	1.82	3.29	2.27	2.34	1.17	1.33	3.36	1.96	2.28	2.24
2012	1.69	2.51	2.92	3.91	3.74	2.41	1.73	2.93	2.87	2.79	1.97	3.45	2.39	1.86	1.03	1.23	3.09	1.81	2.17	2.26
2013	1.83	2.50	2.94	3.71	3.37	2.58	1.55	2.96	2.86	2.53	1.78	3.40	2.45	2.13	1.04	1.31	3.29	1.78	2.20	2.14
2014	2.37	2.26	2.95	3.45	3.08	2.59	1.63	3.17	3.10	2.36	1.94	3.49	2.48	1.77	1.42	1.28	3.42	2.04	2.26	2.04
2015	2.03	2.63	3.30	3.63	3.91	2.66	1.62	2.89	3.18	2.45	2.08	3.23	2.34	1.65	1.19	1.11	3.18	2.00	2.14	2.01
2016	1.82	2.63	3.17	3.61	3.70	2.56	1.67	3.17	3.14	2.82	1.88	3.41	2.24	1.60	1.17	1.28	3.39	1.61	2.05	2.06

Topic No. 21 to Topic No. 40:

Year	Topic 21 (%)	Topic 22 (%)	Topic 23 (%)	Topic 24 (%)	Topic 25 (%)	Topic 26 (%)	Topic 27 (%)	Topic 28 (%)	Topic 29 (%)	Topic 30 (%)	Topic 31 (%)	Topic 32 (%)	Topic 33 (%)	Topic 34 (%)	Topic 35 (%)	Topic 36 (%)	Topic 37 (%)	Topic 38 (%)	Topic 39 (%)	Topic 40 (%)
1991	1.11	1.88	1.34	1.71	1.87	2.37	2.93	1.69	3.66	2.45	0.64	1.91	1.19	4.19	4.36	1.75	2.40	5.60	3.14	1.60
1992	2.14	1.47	1.85	2.10	1.72	2.35	3.47	1.33	3.37	2.86	0.70	1.74	1.18	3.65	3.70	2.04	2.19	5.19	4.35	2.25
1993	1.68	1.25	2.02	2.14	2.20	2.56	3.15	2.23	3.00	2.54	0.57	1.73	1.32	3.13	3.41	1.75	2.47	5.95	3.60	1.82
1994	1.44	1.94	1.81	1.97	1.82	2.52	2.43	2.44	3.25	2.96	0.88	2.07	1.13	4.28	3.30	1.80	3.10	5.57	3.42	1.81
1995	2.03	2.25	1.80	1.81	2.14	2.86	2.89	2.22	3.35	2.62	0.87	2.08	1.38	4.76	2.48	1.83	2.04	4.76	3.66	2.13
1996	1.98	2.25	1.54	1.48	2.33	2.36	2.06	2.31	3.86	2.44	0.82	1.95	1.17	4.52	3.62	1.76	1.99	5.32	3.52	0.99
1997	1.59	1.80	1.84	1.19	2.12	2.07	2.54	2.20	3.18	3.73	0.63	1.91	1.37	4.16	2.90	1.98	2.50	5.21	3.69	1.16
1998	1.66	2.26	2.06	1.50	2.03	2.16	2.45	2.24	2.71	2.55	0.77	2.15	1.18	3.98	3.06	2.14	2.80	4.19	3.61	1.93
1999	1.39	2.28	2.09	1.24	2.17	2.74	2.80	2.01	3.13	3.39	0.86	2.35	1.46	4.06	2.72	1.83	2.18	4.92	3.65	1.56
2000	1.55	1.80	2.32	1.15	1.92	2.35	2.48	2.58	3.84	2.83	0.91	1.96	0.92	4.39	3.62	1.62	2.54	5.29	2.88	1.61
2001	1.62	2.55	1.76	1.04	2.14	2.81	2.14	2.02	3.22	2.37	0.82	1.91	1.32	5.01	2.90	2.16	2.22	5.53	3.08	2.52
2002	2.03	2.41	2.71	1.83	2.21	2.54	2.23	2.44	3.20	2.20	0.63	1.90	1.14	4.19	3.47	2.02	2.03	4.24	3.01	1.90
2003	2.48	2.61	1.88	1.50	2.10	2.48	1.90	2.59	2.79	2.21	1.18	2.14	1.45	4.10	3.49	2.47	2.13	4.49	3.18	2.63
2004	2.50	2.19	2.06	1.54	2.19	2.50	2.13	2.84	2.69	1.80	1.21	2.32	1.58	4.03	3.21	2.07	2.56	4.04	3.64	2.26
2005	2.04	2.99	1.58	1.53	2.12	2.14	2.12	3.00	3.05	1.91	0.71	2.35	1.45	4.45	3.38	2.29	2.35	4.78	3.10	2.49
2006	1.92	3.21	2.00	1.44	1.99	2.23	2.16	3.05	3.07	1.74	0.81	2.08	1.66	3.90	2.72	2.14	2.34	4.80	3.56	1.87
2007	2.50	2.78	1.61	1.43	2.22	2.72	2.19	2.98	3.00	2.08	1.15	2.31	1.48	4.52	2.96	1.83	2.53	4.99	2.80	2.47
2008	2.68	2.64	1.73	1.76	2.45	2.48	2.16	2.98	3.17	2.04	0.84	2.54	1.27	4.56	3.44	1.81	1.88	4.56	2.65	2.42
2009	1.95	2.69	1.54	1.38	2.89	2.62	2.06	2.91	3.26	1.63	1.44	2.60	1.91	4.49	3.09	2.32	2.28	4.25	2.86	2.68
2010	1.80	2.68	1.63	1.97	2.72	2.57	2.51	2.83	3.23	1.65	1.28	2.52	1.72	4.58	3.10	2.11	2.58	4.02	3.21	2.58
2011	1.50	2.69	1.85	1.55	2.90	2.49	2.23	3.02	3.25	1.77	1.76	2.44	1.19	4.27	3.08	2.16	2.80	4.47	2.76	2.40
2012	1.76	2.31	2.23	1.78	2.78	2.42	2.33	2.70	2.95	2.00	1.20	2.73	1.90	4.74	3.11	1.99	2.48	3.97	3.24	2.60
2013	1.86	2.60	2.21	1.39	2.99	3.04	2.03	2.48	3.19	1.71	1.20	2.91	1.73	4.75	3.42	1.99	2.47	3.79	3.17	2.73
2014	1.92	2.55	1.74	1.41	2.92	2.62	1.84	3.06	3.04	1.77	1.52	2.78	1.63	5.00	2.98	2.34	2.68	4.06	2.78	2.27
2015	1.54	2.45	2.09	1.33	3.15	2.48	1.82	2.64	2.91	1.68	1.32	2.83	1.53	5.39	3.29	2.25	2.47	3.93	2.83	2.81
2016	1.83	2.56	2.08	1.32	2.74	2.66	2.16	3.26	2.92	1.66	1.18	2.95	1.86	4.47	2.89	2.38	2.81	3.59	2.73	2.97
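As a simple illustration of how the tabulated proportions can be compared across time windows, the following sketch takes the Topic 5 and Topic 24 columns from the two tables above and computes the ratio of the mean proportion in 2012 to 2016 to that in 1991 to 1995. This ratio is an assumption made for illustration only and is not claimed to be the contribution index $C_{k}$ used in the study; the function name `period_ratio` is hypothetical.

```python
from statistics import mean

# Overall proportions (%) copied from the Appendix II tables above.
topic_5 = {  # reinforced concrete columns testing
    "1991-1995": [2.02, 2.52, 2.04, 2.84, 2.62],
    "2012-2016": [3.74, 3.37, 3.08, 3.91, 3.70],
}
topic_24 = {  # bridge engineering
    "1991-1995": [1.71, 2.10, 2.14, 1.97, 1.81],
    "2012-2016": [1.78, 1.39, 1.41, 1.33, 1.32],
}

def period_ratio(series, late="2012-2016", early="1991-1995"):
    """Ratio of the mean proportion in the late window to the early window."""
    return mean(series[late]) / mean(series[early])

ratio_5 = period_ratio(topic_5)    # above 1 suggests inclusion increase
ratio_24 = period_ratio(topic_24)  # below 1 suggests inclusion decrease
```

Under this assumed measure, Topic 5 yields a ratio above 1 and Topic 24 a ratio below 1, which agrees in direction with the inclusion increase and reduction reported for these two topics.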

Appendix III. Topic Distribution from 1991 to 2016 for Each Journal

JSE Topic No. 1 to Topic No. 20:

Year	Topic 1 (%)	Topic 2 (%)	Topic 3 (%)	Topic 4 (%)	Topic 5 (%)	Topic 6 (%)	Topic 7 (%)	Topic 8 (%)	Topic 9 (%)	Topic 10 (%)	Topic 11 (%)	Topic 12 (%)	Topic 13 (%)	Topic 14 (%)	Topic 15 (%)	Topic 16 (%)	Topic 17 (%)	Topic 18 (%)	Topic 19 (%)	Topic 20 (%)
1991	2.41	2.51	3.81	3.30	2.15	3.65	2.08	2.39	3.22	2.15	2.24	3.45	4.08	1.76	0.93	1.15	2.54	2.72	2.02	2.61
1992	1.66	3.41	3.58	2.72	2.77	2.82	2.72	2.04	2.38	1.85	1.66	4.45	4.24	1.79	0.85	0.62	2.69	2.20	2.24	3.01
1993	2.19	2.57	2.74	3.18	2.31	3.35	1.89	2.38	2.44	2.28	1.82	3.96	4.55	1.45	0.92	1.29	2.85	2.52	2.47	3.07
1994	1.96	2.25	3.86	2.93	3.57	2.43	2.24	2.51	2.66	2.54	1.33	3.77	3.39	1.58	0.73	1.06	2.84	1.89	2.18	2.80
1995	1.36	3.03	3.30	3.19	2.80	2.72	2.14	2.53	2.74	2.51	1.49	4.31	3.43	1.77	1.04	0.95	2.65	2.53	2.11	2.97
1996	2.68	3.08	3.14	3.31	2.66	2.81	2.02	2.51	2.46	2.78	1.83	4.09	3.78	2.04	0.88	1.66	2.62	2.12	1.67	2.76
1997	2.19	3.18	3.86	3.45	3.12	2.77	1.90	2.45	2.82	3.05	1.28	3.91	3.23	1.93	0.62	1.41	2.75	2.87	2.17	2.63
1998	2.24	3.79	3.98	3.11	4.38	2.63	2.21	2.38	1.94	3.13	1.25	3.53	2.85	1.66	1.18	1.13	2.28	2.41	1.91	3.17
1999	1.59	5.20	2.55	3.32	3.53	2.53	2.19	2.47	3.09	3.43	1.23	3.34	2.11	1.78	1.54	1.30	2.77	1.76	1.81	2.68
2000	2.74	2.34	3.28	3.71	2.98	2.75	1.92	2.53	2.42	3.43	1.47	3.47	2.89	1.89	1.80	1.27	3.21	1.85	2.23	2.81
2001	2.39	3.29	4.67	2.93	3.40	2.96	1.42	2.25	2.27	3.38	0.99	3.53	2.92	1.35	1.16	1.28	2.51	2.02	1.92	2.79
2002	1.75	4.85	3.26	3.63	3.50	2.72	1.64	2.31	2.45	3.64	1.33	4.16	3.22	1.53	1.10	1.21	2.95	2.01	2.02	2.97
2003	1.65	4.25	2.64	3.35	4.11	2.99	1.78	2.82	2.35	3.21	0.95	3.67	2.67	1.40	0.94	2.51	2.75	2.24	1.78	2.69
2004	2.17	4.99	3.45	3.16	3.94	2.82	1.91	3.04	2.26	3.36	1.38	3.52	2.09	1.66	1.03	1.18	3.09	1.50	2.01	2.37
2005	2.09	3.63	2.57	3.44	2.57	2.92	1.85	2.92	2.29	3.18	1.62	3.69	2.50	2.37	1.61	1.61	2.88	2.28	1.59	2.12
2006	2.63	5.15	2.59	3.54	2.24	2.48	1.73	3.23	2.20	3.35	1.03	3.86	2.42	1.74	0.77	1.72	2.90	3.45	2.14	2.50
2007	2.30	2.74	3.49	3.08	3.25	2.67	1.86	3.01	2.52	3.86	1.79	3.20	2.92	2.50	1.00	1.68	2.49	2.36	1.83	1.84
2008	3.06	3.56	2.16	2.89	3.05	2.94	2.10	2.62	2.76	4.22	1.29	3.49	2.29	1.92	1.46	1.86	2.81	1.54	1.87	2.17
2009	1.73	3.23	3.89	3.35	3.11	2.37	1.60	2.63	2.41	4.17	1.29	4.20	1.85	1.79	1.45	1.61	2.60	1.67	1.67	2.12
2010	2.00	4.10	2.62	3.16	2.79	2.69	2.13	2.60	2.54	3.06	1.69	3.95	1.96	1.64	2.07	1.72	2.88	1.84	2.12	1.98
2011	4.71	2.75	2.58	3.11	2.38	2.65	1.21	3.02	3.49	3.80	1.38	3.41	2.08	2.13	1.20	1.21	3.11	1.98	1.88	2.44
2012	1.60	3.28	2.86	4.05	3.82	2.27	1.80	2.93	2.45	3.67	1.51	3.65	2.64	1.22	0.90	0.89	2.83	1.73	1.81	2.64
2013	1.92	3.57	2.38	3.64	3.52	2.34	1.56	3.28	2.79	4.19	1.31	3.76	2.06	1.75	1.17	1.99	3.51	1.14	1.74	1.98
2014	2.25	2.86	3.28	3.62	3.97	2.44	1.47	2.54	3.34	3.70	1.32	3.99	2.10	0.98	1.86	1.24	2.72	1.64	1.84	2.21
2015	2.10	3.40	3.51	3.36	4.29	2.58	1.42	2.77	3.63	3.06	1.85	3.50	1.89	1.36	1.42	1.05	2.89	1.62	2.12	1.79
2016	1.75	3.02	2.84	3.44	3.37	2.14	1.76	3.72	2.83	3.96	1.55	4.31	2.08	1.01	1.26	1.50	3.77	1.40	1.90	2.07

JSE Topic No. 21 to Topic No. 40:

Year	Topic 21 (%)	Topic 22 (%)	Topic 23 (%)	Topic 24 (%)	Topic 25 (%)	Topic 26 (%)	Topic 27 (%)	Topic 28 (%)	Topic 29 (%)	Topic 30 (%)	Topic 31 (%)	Topic 32 (%)	Topic 33 (%)	Topic 34 (%)	Topic 35 (%)	Topic 36 (%)	Topic 37 (%)	Topic 38 (%)	Topic 39 (%)	Topic 40 (%)
1991	1.17	1.95	1.43	1.70	1.77	2.53	3.01	1.76	3.59	2.54	0.56	1.98	1.24	4.25	4.50	1.85	2.36	5.67	3.24	1.73
1992	2.03	1.55	1.98	2.42	1.64	2.46	3.76	1.27	3.36	2.91	0.71	1.88	1.14	3.66	3.14	2.29	2.18	4.82	4.48	2.60
1993	1.54	1.25	2.18	2.29	2.22	2.72	3.51	2.13	2.90	2.61	0.57	1.68	1.35	3.12	3.55	1.94	2.51	5.97	3.83	1.91
1994	1.20	2.24	1.79	2.38	1.76	2.67	2.68	2.42	3.30	2.64	0.87	2.12	1.12	4.71	3.21	2.01	3.34	5.24	3.58	2.19
1995	1.30	2.31	1.52	2.05	2.11	3.15	3.32	1.77	3.30	2.57	0.87	1.95	1.55	5.14	2.29	2.00	1.92	4.56	4.06	2.66
1996	1.48	2.35	1.85	1.70	1.61	2.94	2.38	2.21	3.38	2.36	0.89	2.19	1.34	4.71	3.48	2.03	1.94	5.14	4.00	1.13
1997	1.17	1.87	2.22	1.46	1.75	2.18	2.75	2.12	2.87	3.68	0.53	2.17	1.56	4.28	2.66	2.60	2.49	4.59	3.96	1.52
1998	1.21	1.82	2.22	1.09	1.92	2.89	2.76	1.65	2.71	2.64	0.92	2.25	1.44	4.47	2.70	2.50	2.35	4.31	4.70	2.26
1999	1.05	2.71	2.48	1.22	1.49	3.16	2.73	1.90	2.56	3.43	0.90	2.50	1.69	4.30	2.07	2.40	2.53	4.52	4.31	1.82
2000	1.57	1.73	3.19	0.88	1.64	2.88	2.62	2.77	3.21	2.50	0.84	2.35	0.92	4.66	2.49	2.09	2.64	4.76	3.25	1.99
2001	1.20	2.43	2.11	0.74	1.64	3.27	2.20	1.89	2.66	2.27	0.99	1.77	1.49	5.87	2.28	2.73	2.30	5.68	3.65	3.44
2002	1.64	2.12	3.07	1.54	1.79	3.07	2.43	2.41	2.61	2.39	0.78	1.93	1.40	4.15	2.84	2.30	2.17	3.89	3.58	1.63
2003	2.44	2.62	1.93	1.01	1.62	2.67	2.10	1.98	2.20	1.74	1.68	2.39	1.73	4.03	2.59	3.08	2.28	3.91	3.80	3.46
2004	1.91	1.77	2.41	1.01	1.92	2.72	2.29	2.46	2.34	1.94	1.25	2.79	1.98	4.12	3.04	2.56	2.39	3.91	4.01	2.26
2005	1.91	2.23	1.84	1.38	1.73	2.45	2.38	2.83	2.73	1.99	0.73	2.38	1.64	4.72	3.26	2.81	2.42	4.58	3.79	2.48
2006	1.09	2.99	2.65	1.02	1.70	2.16	2.33	2.99	2.70	1.75	0.84	2.30	2.30	3.83	2.46	2.85	2.45	4.52	4.02	1.38
2007	2.25	2.59	1.69	1.11	1.67	2.95	2.63	2.91	2.25	1.77	0.74	2.54	2.27	4.97	2.33	2.49	2.98	4.11	2.80	2.55
2008	3.80	1.94	2.19	1.45	1.65	2.21	2.09	3.12	2.81	1.67	0.80	3.28	1.87	4.88	2.17	2.93	1.91	3.99	3.34	1.84
2009	2.08	2.62	1.82	0.69	1.87	3.11	1.84	3.71	2.72	1.16	1.58	2.58	3.06	4.84	2.15	3.35	2.45	3.61	3.35	2.66
2010	2.02	1.89	1.86	1.02	1.85	2.88	2.78	3.58	2.85	1.29	1.18	2.95	2.79	4.46	2.16	3.30	2.38	3.54	3.45	2.23
2011	1.24	2.68	1.89	0.63	1.88	2.79	2.08	4.45	2.30	2.06	1.87	2.44	1.19	4.98	2.23	2.51	3.33	4.51	2.63	1.77
2012	1.76	1.66	2.58	1.24	1.58	3.40	2.80	2.65	2.62	2.26	0.67	2.32	2.61	5.16	2.43	2.79	2.47	4.04	4.25	2.15
2013	1.94	2.62	3.30	0.77	1.64	3.39	1.66	3.49	2.37	1.62	0.94	2.85	2.62	4.55	2.05	2.61	2.77	3.74	4.02	1.44
2014	1.74	2.29	2.35	0.66	1.87	2.66	2.04	3.05	2.60	1.63	1.09	3.09	2.46	5.94	2.06	3.47	2.65	3.67	3.12	2.22
2015	1.73	2.34	2.45	0.95	2.26	2.44	1.91	3.21	2.32	1.77	1.37	2.73	2.12	5.96	1.95	2.64	2.41	3.63	3.43	2.75
2016	1.65	2.11	2.78	0.87	1.91	1.96	2.10	4.08	2.70	1.40	1.38	3.05	2.72	4.69	2.20	3.21	3.12	3.28	2.97	2.14

ES Topic No. 1 to Topic No. 20:

Year	Topic 1 (%)	Topic 2 (%)	Topic 3 (%)	Topic 4 (%)	Topic 5 (%)	Topic 6 (%)	Topic 7 (%)	Topic 8 (%)	Topic 9 (%)	Topic 10 (%)	Topic 11 (%)	Topic 12 (%)	Topic 13 (%)	Topic 14 (%)	Topic 15 (%)	Topic 16 (%)	Topic 17 (%)	Topic 18 (%)	Topic 19 (%)	Topic 20 (%)
1991	1.75	1.17	1.88	5.38	0.96	1.69	3.14	4.26	2.81	1.87	2.81	6.11	3.41	3.28	2.17	2.43	6.61	3.31	3.04	2.74
1992	6.00	2.23	1.25	3.56	1.29	3.82	2.49	1.52	2.61	2.80	4.10	3.56	4.64	3.59	1.47	0.76	2.92	2.16	0.80	2.07
1993	3.13	0.98	1.40	3.54	0.83	3.05	2.74	3.79	1.53	2.08	4.41	5.15	3.69	3.14	2.19	2.01	5.06	3.13	2.28	3.02
1994	3.45	3.91	1.08	4.13	0.68	3.01	2.68	3.37	2.32	2.59	2.77	4.43	3.74	2.24	1.76	1.48	3.50	2.36	1.63	3.41
1995	2.44	1.11	1.56	4.30	2.13	3.85	2.61	2.27	2.10	3.15	2.05	3.18	4.35	4.31	1.17	1.40	2.87	2.00	1.82	2.63
1996	4.14	1.52	1.62	3.70	1.08	3.20	2.40	2.28	1.90	2.08	2.87	4.08	4.88	3.15	1.49	2.49	3.85	1.96	2.35	2.18
1997	2.74	1.16	2.02	3.46	1.65	3.04	2.37	3.11	2.13	2.57	2.60	4.10	4.95	3.61	0.78	1.73	3.29	4.12	1.55	2.45
1998	2.84	1.68	2.29	4.03	1.84	1.88	2.34	3.77	2.61	3.65	2.03	4.60	2.66	2.88	1.02	3.25	4.63	1.85	2.73	1.93
1999	1.79	1.64	1.89	3.86	1.87	2.52	1.98	2.58	2.14	1.77	2.30	4.80	4.32	3.06	1.60	2.63	3.92	2.82	1.85	3.48
2000	3.21	1.95	2.35	3.60	1.83	3.10	1.80	1.98	2.43	2.63	2.30	3.90	4.76	4.29	1.24	1.00	2.89	1.76	2.07	2.85
2001	2.84	1.70	2.20	4.12	1.93	2.42	1.58	2.62	2.25	2.41	2.34	4.22	5.51	3.42	1.25	1.60	3.21	2.07	2.42	2.54
2002	3.64	1.34	2.44	3.55	1.98	3.12	2.04	2.60	2.03	2.03	2.80	3.09	3.73	3.20	1.06	2.61	3.32	2.20	1.71	2.42
2003	3.51	1.81	2.51	3.22	2.07	2.72	1.67	2.72	1.61	3.41	2.61	3.82	3.34	2.76	0.87	2.01	3.00	2.08	1.58	2.28
2004	3.44	1.74	2.84	3.31	2.47	3.15	1.82	2.82	2.44	2.19	2.14	3.39	2.94	3.49	0.89	1.64	3.26	2.13	1.82	2.39
2005	3.06	1.48	2.82	3.70	2.53	2.97	1.37	3.09	2.08	2.36	2.01	3.43	3.03	4.21	0.85	1.72	3.12	2.66	2.40	1.80
2006	2.94	2.78	2.33	3.35	2.59	3.46	2.10	2.86	1.84	2.47	2.14	3.02	2.74	3.04	1.56	2.02	3.25	2.15	2.29	2.00
2007	2.95	1.61	2.74	3.56	3.18	3.05	1.78	2.66	2.24	2.29	1.97	2.86	3.28	3.03	0.88	1.38	3.10	2.05	2.06	2.22
2008	2.39	2.11	2.91	3.55	2.76	2.94	1.61	2.58	2.59	2.12	2.09	3.17	3.40	2.94	0.84	1.59	3.38	2.47	1.97	2.42
2009	2.31	1.98	3.41	3.60	2.61	2.97	2.24	2.66	2.63	2.18	2.28	3.31	2.91	2.62	1.21	1.19	3.15	2.19	2.01	1.90
2010	1.66	2.18	3.04	3.67	3.34	2.54	1.55	3.08	2.56	2.20	2.36	3.32	2.64	2.32	0.88	1.34	2.92	2.03	2.72	1.97
2011	2.18	1.75	2.87	4.26	2.77	2.52	1.68	3.24	3.27	1.77	2.02	3.24	2.36	2.43	1.16	1.38	3.46	1.95	2.46	2.15
2012	1.73	2.23	2.94	3.85	3.72	2.46	1.71	2.93	3.02	2.47	2.14	3.37	2.30	2.08	1.08	1.36	3.19	1.84	2.30	2.13
2013	1.80	2.14	3.12	3.73	3.32	2.66	1.55	2.85	2.88	1.99	1.93	3.28	2.57	2.25	0.99	1.08	3.22	1.99	2.35	2.20
2014	2.42	2.02	2.81	3.38	2.73	2.65	1.69	3.43	3.00	1.82	2.19	3.29	2.63	2.09	1.24	1.30	3.70	2.21	2.43	1.97
2015	2.00	2.30	3.21	3.75	3.75	2.69	1.71	2.94	2.99	2.18	2.18	3.12	2.54	1.78	1.09	1.14	3.31	2.16	2.14	2.10
2016	1.84	2.46	3.32	3.68	3.84	2.74	1.62	2.92	3.28	2.31	2.02	3.01	2.31	1.86	1.14	1.18	3.23	1.70	2.12	2.06

ES Topic No. 21 to Topic No. 40

Year	Topic 21 (%)	Topic 22 (%)	Topic 23 (%)	Topic 24 (%)	Topic 25 (%)	Topic 26 (%)	Topic 27 (%)	Topic 28 (%)	Topic 29 (%)	Topic 30 (%)	Topic 31 (%)	Topic 32 (%)	Topic 33 (%)	Topic 34 (%)	Topic 35 (%)	Topic 36 (%)	Topic 37 (%)	Topic 38 (%)	Topic 39 (%)	Topic 40 (%)
1991	0.66	1.35	0.53	1.87	2.74	0.96	2.21	1.05	4.28	1.75	1.40	1.32	0.81	3.66	3.20	0.88	2.82	4.94	2.24	0.49
1992	2.66	1.08	1.18	0.47	2.12	1.78	2.02	1.62	3.38	2.61	0.63	1.05	1.39	3.59	6.53	0.77	2.27	7.07	3.67	0.46
1993	2.30	1.25	1.30	1.44	2.15	1.86	1.53	2.65	3.45	2.21	0.59	1.93	1.16	3.15	2.80	0.90	2.33	5.87	2.58	1.40
1994	2.14	1.07	1.87	0.79	2.00	2.06	1.70	2.49	3.08	3.89	0.92	1.92	1.17	3.04	3.57	1.19	2.40	6.53	2.93	0.70
1995	3.98	2.07	2.57	1.16	2.22	2.08	1.73	3.44	3.49	2.77	0.86	2.44	0.92	3.74	2.98	1.35	2.34	5.29	2.58	0.69
1996	2.87	2.08	0.99	1.09	3.61	1.32	1.48	2.49	4.72	2.59	0.68	1.52	0.85	4.18	3.89	1.27	2.08	5.65	2.69	0.75
1997	2.34	1.67	1.16	0.70	2.78	1.87	2.17	2.35	3.73	3.82	0.81	1.45	1.03	3.94	3.33	0.88	2.51	6.31	3.21	0.52
1998	2.28	2.87	1.84	2.06	2.17	1.15	2.00	3.06	2.71	2.42	0.56	2.00	0.81	3.29	3.56	1.66	3.43	4.02	2.11	1.48
1999	2.00	1.51	1.39	1.28	3.41	1.96	2.95	2.21	4.18	3.31	0.79	2.07	1.04	3.63	3.90	0.80	1.55	5.65	2.46	1.10
2000	1.52	1.87	1.32	1.47	2.24	1.74	2.33	2.37	4.55	3.20	0.99	1.52	0.92	4.08	4.92	1.08	2.41	5.90	2.45	1.18
2001	2.11	2.69	1.33	1.41	2.74	2.26	2.07	2.19	3.88	2.50	0.61	2.07	1.11	3.98	3.64	1.46	2.12	5.36	2.39	1.42
2002	2.53	2.80	2.23	2.22	2.76	1.84	1.97	2.49	3.98	1.95	0.44	1.86	0.79	4.26	4.30	1.64	1.85	4.70	2.25	2.25
2003	2.54	2.59	1.82	2.06	2.64	2.26	1.67	3.29	3.48	2.76	0.59	1.86	1.13	4.18	4.52	1.77	1.96	5.17	2.46	1.67
2004	3.20	2.69	1.65	2.16	2.52	2.24	1.94	3.30	3.09	1.62	1.16	1.77	1.10	3.92	3.42	1.50	2.75	4.19	3.20	2.27
2005	2.18	3.84	1.30	1.70	2.56	1.80	1.82	3.19	3.42	1.82	0.69	2.33	1.23	4.14	3.51	1.69	2.26	5.01	2.32	2.49
2006	2.87	3.46	1.26	1.92	2.32	2.31	1.96	3.12	3.49	1.73	0.77	1.82	0.93	3.98	3.01	1.34	2.22	5.12	3.04	2.43
2007	2.66	2.89	1.56	1.62	2.55	2.58	1.92	3.02	3.45	2.26	1.40	2.17	1.00	4.25	3.35	1.44	2.26	5.51	2.80	2.42
2008	2.07	3.02	1.48	1.92	2.88	2.63	2.20	2.90	3.37	2.24	0.86	2.14	0.95	4.39	4.13	1.21	1.87	4.87	2.27	2.73
2009	1.87	2.72	1.39	1.74	3.43	2.36	2.18	2.49	3.54	1.87	1.37	2.61	1.31	4.31	3.58	1.78	2.19	4.59	2.61	2.69
2010	1.70	3.04	1.53	2.39	3.11	2.42	2.40	2.50	3.40	1.81	1.32	2.33	1.25	4.63	3.52	1.58	2.66	4.24	3.11	2.74
2011	1.61	2.70	1.83	1.95	3.36	2.36	2.30	2.38	3.67	1.65	1.70	2.43	1.19	3.95	3.46	2.00	2.56	4.46	2.82	2.68
2012	1.75	2.55	2.10	1.97	3.21	2.07	2.16	2.72	3.07	1.91	1.39	2.87	1.64	4.59	3.36	1.71	2.49	3.94	2.88	2.76
2013	1.84	2.60	1.85	1.60	3.43	2.93	2.15	2.15	3.46	1.73	1.29	2.93	1.44	4.81	3.87	1.79	2.37	3.81	2.89	3.15
2014	1.99	2.65	1.50	1.71	3.35	2.60	1.76	3.06	3.22	1.83	1.70	2.65	1.29	4.62	3.35	1.89	2.69	4.22	2.64	2.29
2015	1.47	2.50	1.94	1.50	3.53	2.49	1.78	2.40	3.16	1.64	1.30	2.87	1.27	5.15	3.87	2.08	2.50	4.06	2.58	2.83
2016	1.91	2.76	1.77	1.52	3.11	2.97	2.19	2.89	3.02	1.78	1.09	2.90	1.48	4.38	3.20	2.00	2.67	3.73	2.63	3.34
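
The yearly topic shares tabulated in this appendix can be summarized by a per-topic trend. The sketch below is a minimal illustration, not the authors' procedure: it assumes the tables are available as tab-separated text with the header row shown above, and it uses an ordinary least-squares slope (percentage points per year) as one plausible trend measure. The function names `parse_topic_table` and `linear_trend` are hypothetical, and the three-row excerpt (ES Topics 1 to 3, 2014 to 2016) only demonstrates the mechanics; a meaningful trend would use all 26 years.

```python
from statistics import mean


def parse_topic_table(text):
    """Parse a tab-separated topic table into {topic_header: {year: share}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split("\t")
    table = {name: {} for name in header[1:]}
    for ln in lines[1:]:
        cells = ln.split("\t")
        year = int(cells[0])
        for name, cell in zip(header[1:], cells[1:]):
            table[name][year] = float(cell)
    return table


def linear_trend(series):
    """Least-squares slope of share (%) against year (assumed trend measure)."""
    years = sorted(series)
    shares = [series[y] for y in years]
    y_bar, s_bar = mean(years), mean(shares)
    num = sum((y - y_bar) * (s - s_bar) for y, s in zip(years, shares))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


# Excerpt copied from the ES Topic No. 1 to Topic No. 20 table (2014 to 2016).
SAMPLE = (
    "Year\tTopic 1 (%)\tTopic 2 (%)\tTopic 3 (%)\n"
    "2014\t2.42\t2.02\t2.81\n"
    "2015\t2.00\t2.30\t3.21\n"
    "2016\t1.84\t2.46\t3.32\n"
)

table = parse_topic_table(SAMPLE)
slopes = {name: linear_trend(series) for name, series in table.items()}
for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f} percentage points per year")
```

A negative slope indicates a shrinking share of that journal's articles and a positive slope a growing one; over so short an excerpt the values mainly reflect year-to-year noise.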

Appendix IV. Topic Interlinkages

Fig. 14 shows the interlinkages between all 36 research topics in the order of their strength.

Fig. 14. Interlinkages between all 36 topics in the order of their strength.

Acknowledgments

The authors acknowledge the support of the INViSiONLab. The second author is grateful to Ms. Lucy El-Sherif, who first introduced him to the origin of Blue Ocean Strategy in 1999, and both authors thank her for her feedback on the article.

References

Akilan, A. 2015. “Text mining: Challenges and future directions.” In Proc., 2nd Int. Conf. Electronics and Communication Systems, 1679–1684. Piscataway, NJ: IEEE.

Al-Harthy, A. S., and D. M. Frangopol. 1994. “Reliability-based design of prestressed concrete beams.” J. Struct. Eng. 120 (11): 3156–3177. https://doi.org/10.1061/(ASCE)0733-9445(1994)120:11(3156).

Amado, A., P. Cortez, P. Rita, and S. Moro. 2018. “Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis.” Eur. Res. Manage. Bus. Econ. 24 (1): 1–7.

Asuncion, H. U., A. U. Asuncion, and R. N. Taylor. 2010. “Software traceability with topic modeling.” In Vol. 1 of Proc., 32nd ACM/IEEE Int. Conf. on Software Engineering, 95–104. Piscataway, NJ: IEEE.

Bart, E., M. Welling, and P. Perona. 2011. “Unsupervised organization of image collections: Taxonomies and beyond.” IEEE Trans. Pattern Anal. Mach. Intell. 33 (11): 2302–2315. https://doi.org/10.1109/TPAMI.2011.79.

Blei, D. M. 2012. “Probabilistic topic models.” Commun. ACM 55 (4): 77–84. https://doi.org/10.1145/2133806.2133826.

Blei, D. M., and J. D. Lafferty. 2006. “Dynamic topic models.” In Proc., 23rd Int. Conf. on Machine Learning, 113–120. New York: Association for Computing Machinery.

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. “Latent Dirichlet allocation.” J. Mach. Learn. Res. 3 (Jan): 993–1022.

Bonstrom, H., and R. B. Corotis. 2014. “First-order reliability approach to quantify and improve building portfolio resilience.” J. Struct. Eng. 142 (8): C4014001. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001213.

Bruneau, M., M. Barbato, J. E. Padgett, A. E. Zaghi, J. Mitrani-Reiser, and Y. Li. 2017. “State of the art of multihazard design.” J. Struct. Eng. 143 (10): 03117002. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001893.

Chalmers, I., and P. Glasziou. 2009. “Avoidable waste in the production and reporting of research evidence.” Lancet 374 (9683): 86–89.

Cimellaro, G. P., C. Renschler, A. M. Reinhorn, and L. Arendt. 2016. “PEOPLES: A framework for evaluating resilience.” J. Struct. Eng. 142 (10): 04016063. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001514.

Correa, J. 2001. “Interval estimation of the parameters of the multinomial distribution.” Stat. Int. 1–9.

El-Dakhakhni, W., and A. Ashour. 2017. “Seismic response of reinforced-concrete masonry shear-wall components and systems: State of the art.” J. Struct. Eng. 143 (9): 03117001. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001840.

El-Tawil, S. 2018. “Special collection on 60th anniversary state-of-the-art papers.” J. Struct. Eng. 144 (3). https://doi.org/10.1061/(ASCE)ST.1943-541X.0001998.

Ezzeldin, M., and W. E. El-Dakhakhni. 2019. “Robustness of Ontario power network under systemic risks.” Sustainable Resilient Infrastruct. 1–20. https://doi.org/10.1080/23789689.2019.1666340.

Feldman, R., and I. Dagan. 1995. “Knowledge discovery in textual databases (KDT).” In Vol. 95 of Proc., KDD, 112–117. Palo Alto, CA: Association for the Advancement of Artificial Intelligence.

Fleuren, W. W., and W. Alkema. 2015. “Application of text mining in the biomedical domain.” Methods 74 (Mar): 97–106. https://doi.org/10.1016/j.ymeth.2015.01.015.

Gatti, C. J., J. D. Brooks, and S. G. Nurre. 2015. “A historical analysis of the field of OR/MS using topic models.” Preprint, submitted October 17, 2015. http://arxiv.org/abs/1510.05154.

Geman, S., and D. Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Trans. Pattern Anal. Mach. Intell. 721–741. https://doi.org/10.1109/TPAMI.1984.4767596.

Griffiths, T. L., and M. Steyvers. 2004. “Finding scientific topics.” Supplement, Proc. Natl. Acad. Sci. 101 (S1): 5228–5235. https://doi.org/10.1073/pnas.0307752101.

Gupta, V., and G. S. Lehal. 2009. “A survey of text mining techniques and applications.” J. Emerging Technol. Web Intell. 1 (1): 60–76. https://doi.org/10.4304/jetwi.1.1.60-76.

Hofmann, M., and A. Chisholm. 2016. Vol. 40 of Text mining and visualization: Case studies using open-source tools. Boca Raton, FL: CRC Press.

Hofmann, T. 1999. “Probabilistic latent semantic analysis.” In Proc., Fifteenth Conf. on Uncertainty in Artificial Intelligence, 289–296. San Francisco: Morgan Kaufmann Publishers.

Holnicki-Szulc, J., and C. M. Soares. 2013. Vol. 1 of Advances in smart technologies in structural engineering. Berlin: Springer.

Hong, L., and B. D. Davison. 2010. “Empirical study of topic modeling in Twitter.” In Proc., 1st Workshop on Social Media Analytics, 80–88. New York: Association for Computing Machinery.

Hospedales, T., S. Gong, and T. Xiang. 2009. “A Markov clustering topic model for mining behaviour in video.” In Proc., 12th Int. Conf. Computer Vision, 1165–1172. Piscataway, NJ: IEEE.

Hosseini, M. R., I. Martek, E. K. Zavadskas, A. A. Aibinu, M. Arashpour, and N. Chileshe. 2018. “Critical evaluation of off-site construction research: A Scientometric analysis.” Autom. Constr. 87 (Mar): 235–247. https://doi.org/10.1016/j.autcon.2017.12.002.

Hu, Y. 2005. “Efficient, high-quality force-directed graph drawing.” Math. J. 10 (1): 37–71.

Hu, Y., J. Boyd-Graber, B. Satinoff, and A. Smith. 2014. “Interactive topic modeling.” Mach. Learn. 95 (3): 423–469. https://doi.org/10.1007/s10994-013-5413-0.

Kim, W. C., and R. A. Mauborgne. 2014. Blue ocean strategy, expanded edition: How to create uncontested market space and make the competition irrelevant. Boston: Harvard Business Review Press.

Krallinger, M., A. Morgan, L. Smith, F. Leitner, L. Tanabe, J. Wilbur, and A. Valencia. 2008. “The BioCreative II-critical assessment for information extraction in biology challenge.” Genome Biol. 9 (2). S10.

Lazard, A. J., E. Scheinfeld, J. M. Bernhardt, G. B. Wilcox, and M. Suran. 2015. “Detecting themes of public concern: A text mining analysis of the Centers for Disease Control and Prevention’s Ebola live Twitter chat.” Am. J. Infect. Control 43 (10): 1109–1111. https://doi.org/10.1016/j.ajic.2015.05.025.

Li, H. N., D. S. Li, and G. B. Song. 2004. “Recent applications of fiber optic sensors to health monitoring in civil engineering.” Eng. Struct. 26 (11): 1647–1657. https://doi.org/10.1016/j.engstruct.2004.05.018.

Li, L. J., C. Wang, Y. Lim, D. M. Blei, and L. Fei-Fei. 2010. “Building and using a semantivisual image hierarchy.” In Proc., 2010 IEEE Conf. on Computer Vision and Pattern Recognition, 3336–3343. Piscataway, NJ: IEEE.

Lounis, Z., and T. P. McAllister. 2016. “Risk-based decision making for sustainable and resilient infrastructure systems.” J. Struct. Eng. 142 (9): F4016005. https://doi.org/10.1061/(ASCE)ST.1943-541X.0001545.

Maheswaran, S., R. D. Kumar, and K. R. Sridharan. 2009. “Scientometric analysis of area-wise publications in the field of structural engineering: A case study of SERC, India.” Ann. Lib. Inf. Stud. 56 (1): 22–28.

Manning, C. D., and H. Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

Manyika, J., M. Chui, J. Bughin, R. Dobbs, P. Bisson, and A. Marrs. 2013. Vol. 180 of Disruptive technologies: Advances that will transform life, business, and the global economy. San Francisco: McKinsey Global Institute.

Minka, T. 2000. Estimating a Dirichlet distribution. Technical Rep., Cambridge, MA: Massachusetts Institute of Technology.

Nassirtoussi, A. K., S. Aghabozorgi, T. Y. Wah, and D. C. L. Ngo. 2014. “Text mining for market prediction: A systematic review.” Expert Syst. Appl. 41 (16): 7653–7670. https://doi.org/10.1016/j.eswa.2014.06.009.

Ordenes, F. V., B. Theodoulidis, J. Burton, T. Gruber, and M. Zaki. 2014. “Analyzing customer experience feedback using text mining: A linguistics-based approach.” J. Service Res. 17 (3): 278–295. https://doi.org/10.1177/1094670514524625.

Pollack, J., and D. Adler. 2015. “Emergent trends and passing fads in project management research: A scientometric analysis of changes in the field.” Int. J. Project Manage. 33 (1): 236–248. https://doi.org/10.1016/j.ijproman.2014.04.011.

Salem, S., M. Campidelli, W. W. El-Dakhakhni, and M. J. Tait. 2018. “Resilience-based design of urban centres: Application to blast risk assessment.” Sustainable Resilient Infrastruct. 3 (2): 68–85.

Salloum, S. A., M. Al-Emran, A. A. Monem, and K. Shaalan. 2018. “Using text mining techniques for extracting information from research articles.” In Intelligent natural language processing: trends and applications, 373–397. Cham, Switzerland: Springer.

Schnobrich, W. C. 1991. “Reflections on the behaviour of reinforced concrete shells.” Eng. Struct. 13 (2): 199–210. https://doi.org/10.1016/0141-0296(91)90051-D.

Serenko, A., N. Bontis, L. Booker, K. Sadeddin, and T. Hardie. 2010. “A scientometric analysis of knowledge management and intellectual capital academic literature (1994-2008).” J. Knowl. Manage. 14 (1): 3–23. https://doi.org/10.1108/13673271011015534.

Sohn, H., J. A. Czarnecki, and C. R. Farrar. 2000. “Structural health monitoring using statistical process control.” J. Struct. Eng. 126 (11): 1356–1363. https://doi.org/10.1061/(ASCE)0733-9445(2000)126:11(1356).

Spencer, B. F., Jr., and S. Nagarajaiah. 2003. “State of the art of structural control.” J. Struct. Eng. 129 (7): 845–856. https://doi.org/10.1061/(ASCE)0733-9445(2003)129:7(845).

Steyvers, M., and T. Griffiths. 2007. “Probabilistic topic models.” Handb. Latent Semant. Anal. 427 (7): 424–440.

Sukanya, M., and S. Biruntha. 2012. “Techniques on text mining.” In Proc., IEEE Int. Conf. on Advanced Communication Control and Computing Technologies (ICACCCT), 269–271. Piscataway, NJ: IEEE.

Tang, J., J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. 2008. “Arnetminer: Extraction and mining of academic social networks.” In Proc., 14th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 990–998. New York: Association for Computing Machinery.

Wang, X., and A. McCallum. 2006. “Topics over time: A non-Markov continuous-time model of topical trends.” In Proc., 12th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 424–433. New York: Association for Computing Machinery.

Wuchty, S., and E. Almaas. 2005. “Evolutionary cores of domain co-occurrence networks.” BMC Evol. Biol. 5 (1): 24. https://doi.org/10.1186/1471-2148-5-24.

Zhang, Y., M. Chen, and L. Liu. 2015. “A review on text mining.” In Proc., 6th IEEE Int. Conf. on Software Engineering and Service Science (ICSESS), 681–685. Piscataway, NJ: IEEE.

Zhao, X. L., and L. Zhang. 2007. “State-of-the-art review on FRP strengthened steel structures.” Eng. Struct. 29 (8): 1808–1823. https://doi.org/10.1016/j.engstruct.2006.10.006.

Information & Authors

Information

Published In

Journal of Structural Engineering

Volume 146 • Issue 5 • May 2020

Copyright

This work is made available under the terms of the Creative Commons Attribution 4.0 International license, http://creativecommons.org/licenses/by/4.0/.

History

Received: Oct 30, 2018

Accepted: Jul 15, 2019

Published online: Feb 28, 2020

Published in print: May 1, 2020

Discussion open until: Jul 28, 2020

Authors

Affiliations

Mohamed Ezzeldin, A.M.ASCE [email protected]

Assistant Professor, Dept. of Civil Engineering, McMaster Univ., Hamilton, ON, Canada L8S 4L7. Email: [email protected]

Wael El-Dakhakhni, F.ASCE https://orcid.org/0000-0001-8617-261X [email protected]

Professor and Director of the INViSiONLab, Dept. of Civil Engineering, McMaster Univ., Hamilton, ON, Canada L8S 4L7 (corresponding author). ORCID: https://orcid.org/0000-0001-8617-261X. Email: [email protected]
