Technical Papers
Feb 3, 2023

Natural Language Processing–Driven Similar Project Determination Using Project Scope Statements

Publication: Journal of Management in Engineering
Volume 39, Issue 3

Abstract

Learning from previous projects against benchmarking criteria is a popular approach to reliable project development and planning in the preconstruction phase. Previous similar projects serve as a practical and proven source of knowledge that can be applied to future projects. In the early preconstruction phase, however, similar projects are commonly identified from a small set of simple project characteristics, which reduces the accuracy of the determination. To deliver a similarity evaluation based on project context, this study proposes a natural language processing (NLP)–driven method that recommends similar previous projects by systematically measuring the similarity between project scope statements. NLP techniques enable systematic measurement of project scope similarity, which reduces the reliance on (1) individual experience and expertise for comprehending contents; and (2) the time and effort needed to review all the unstructured descriptive narratives of project scopes. The proposed method extracts key work activities from project scope statements, evaluates the level of homogeneity (LOH) between extracted activities, and quantifies project similarity based on the homogeneity evaluation results. The method uses bidirectional encoder representations from transformers (BERT) models, which embed unstructured text into computer-readable numeric vectors while accounting for the context of the text. The output is a graphical map of similarities that helps project engineers quickly and intuitively interpret the similarity evaluation results. The validity test shows that the proposed method performs better at identifying the past projects most similar to an ongoing project. The proposed method can support more effective information acquisition from previous projects, resulting in a more efficient project planning process during the preconstruction phase.
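
The pipeline described in the abstract (activity extraction, pairwise homogeneity evaluation, project-level aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the LOH formula nor the aggregation rule, so cosine similarity and a symmetric best-match average are assumptions made here for illustration. The short vectors stand in for BERT embeddings of extracted work activities, and all function and project names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_similarity(acts_a, acts_b):
    """Assumed aggregation: average, over the activities of both projects,
    of each activity's best cosine match in the other project."""
    if not acts_a or not acts_b:
        return 0.0
    best_a = [max(cosine(a, b) for b in acts_b) for a in acts_a]
    best_b = [max(cosine(b, a) for a in acts_a) for b in acts_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def rank_similar(target_acts, past_projects):
    """Return (project name, score) pairs sorted from most to least similar."""
    scores = {
        name: project_similarity(target_acts, acts)
        for name, acts in past_projects.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional "embeddings"; real BERT embeddings have hundreds of dimensions.
target = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
past = {
    "P1": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],  # activities close to the target's
    "P2": [[0.0, 0.0, 1.0]],                   # activity unrelated to the target's
}
ranked = rank_similar(target, past)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In practice the activity vectors would come from a BERT encoder applied to the text of each extracted activity, and the pairwise scores could be arranged as a matrix to produce a similarity map like the one the abstract mentions.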

Data Availability Statement

Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

Information & Authors

Information

Published In

Journal of Management in Engineering
Volume 39, Issue 3, May 2023

History

Received: Aug 17, 2022
Accepted: Dec 16, 2022
Published online: Feb 3, 2023
Published in print: May 1, 2023
Discussion open until: Jul 3, 2023

Authors

Affiliations

Taewoo Ko, Ph.D.
Postdoctoral Researcher, Dept. of Civil and Environmental Engineering and Construction, Univ. of Nevada, Las Vegas, Las Vegas, NV 89154.
Professor, Dept. of Construction Science, Texas A&M Univ., College Station, TX 77843. ORCID: https://orcid.org/0000-0003-4074-1869.
Assistant Professor, Dept. of Civil and Environmental Engineering and Construction, Univ. of Nevada, Las Vegas, Las Vegas, NV 89154 (corresponding author). ORCID: https://orcid.org/0000-0002-5944-3848.

Cited by

  • Pro-Active Allocation of Project Requirements through Natural Language Processing (NLP) and Project Information System Integration, Construction Research Congress 2024, 10.1061/9780784485262.133, (1308-1316), (2024).
