Abstract
Learning from previous projects against benchmarking criteria is a popular approach to reliable project development and planning in the preconstruction phase. Similar previous projects serve as a practical, proven source of knowledge applicable to future projects. In the early preconstruction phase, however, similar projects are commonly identified from a small set of simple project characteristics, which degrades the accuracy of the determination. To deliver a similarity evaluation based on project context, this study proposes a natural language processing (NLP)–driven method that recommends similar previous projects by systematically measuring the similarity between project scope statements. Systematic measurement with NLP techniques reduces the reliance on (1) individual experience and expertise for comprehending contents; and (2) the time and effort needed to review all the unstructured descriptive narratives of project scopes. The proposed method extracts key work activities from project scope statements, evaluates the level of homogeneity (LOH) between the extracted activities, and quantifies project similarity based on the homogeneity evaluation results. It uses bidirectional encoder representations from transformers (BERT) models, which embed unstructured text into computer-readable numeric form while considering the context of the text. The output of the proposed model is a graphical map of similarities that helps project engineers recognize the similarity evaluation results quickly and intuitively. The validity test shows that the proposed method performs better at identifying the past projects most similar to an ongoing project. The proposed method can make information acquisition from previous projects more effective, resulting in an improved and more efficient project planning process during the preconstruction phase.
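The pipeline described in the abstract (extract activities, embed them, evaluate homogeneity between activities, aggregate into a project similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the article: the toy three-dimensional vectors stand in for BERT embeddings of extracted work activities, cosine similarity stands in for the LOH evaluation, and the symmetric mean of best matches is one plausible way to aggregate activity-level scores into a project-level score. The function names `cosine`, `project_similarity`, and `rank_past_projects` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def project_similarity(acts_a, acts_b):
    """Assumed aggregation: symmetric mean of each activity's best match in the other project."""
    if not acts_a or not acts_b:
        return 0.0

    def directed(src, dst):
        # For every activity in src, take its most similar activity in dst, then average.
        return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(acts_a, acts_b) + directed(acts_b, acts_a))


def rank_past_projects(current, past):
    """Return (project name, similarity) pairs sorted from most to least similar."""
    scores = {name: project_similarity(current, acts) for name, acts in past.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy activity embeddings (placeholders for BERT vectors of extracted work activities).
current = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
past = {
    "project_A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # same activities as the current project
    "project_B": [[0.0, 0.0, 1.0]],                   # unrelated activity
}
ranking = rank_past_projects(current, past)
print(ranking)
```

In this toy setup `project_A` ranks first because its activity vectors coincide with those of the current project, while `project_B` shares no direction with them. With real scope statements, the vectors would come from a BERT encoder and the scores would fall between these extremes.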
Data Availability Statement
Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.
Copyright
© 2023 American Society of Civil Engineers.
History
Received: Aug 17, 2022
Accepted: Dec 16, 2022
Published online: Feb 3, 2023
Published in print: May 1, 2023
Discussion open until: Jul 3, 2023
ASCE Technical Topics:
- Benchmark
- Business management
- Chemical degradation
- Chemical processes
- Chemistry
- Computer models
- Construction engineering
- Construction management
- Engineering fundamentals
- Environmental engineering
- Graphic methods
- Homogeneity
- Management methods
- Material mechanics
- Material properties
- Materials engineering
- Methodology (by type)
- Models (by type)
- Numerical methods
- Numerical models
- Practice and Profession
- Project management
Cited by
- Taewoo Ko, Rabin Shrestha, JeeHee Lee, Pro-Active Allocation of Project Requirements through Natural Language Processing (NLP) and Project Information System Integration, Construction Research Congress 2024, 10.1061/9780784485262.133, (1308-1316), (2024).