Natural Language Processing Application in Construction Domain: An Integrative Review and Algorithms Comparison
Publication: Computing in Civil Engineering 2021
ABSTRACT
Text data are one of the main sources of information in construction projects, and analyzing and exploiting them has recently become a fast-growing body of literature in the construction domain. Natural language processing (NLP) techniques have been explored across a wide range of document types, including construction contracts, design requirements, risk registers, change orders, claims and litigation documents, and safety reports. In this paper, the authors present a systematic review of the current NLP body of knowledge in the construction research domain. Various machine learning and deep learning-based NLP techniques, as well as their applications in construction research, are documented. Further, the authors identify potential knowledge gaps and future research directions. In addition, this paper compares the performance of these NLP techniques on a risk classification problem. The analysis shows that the Bidirectional Encoder Representations from Transformers (BERT) model outperforms the other NLP models compared, achieving 80% accuracy in the risk classification task.
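To make the risk classification comparison more concrete, the following is a minimal sketch of how one simple baseline could be scored by accuracy. It is not the authors' implementation and it is not BERT: it swaps in a TF-IDF representation with a nearest-centroid classifier, written in plain Python. The risk descriptions, the labels ("schedule", "cost", "safety"), and all function names are hypothetical and were invented for illustration; they are not taken from the paper's dataset.

```python
import math
from collections import Counter

# Hypothetical toy data: invented risk-register entries and labels,
# NOT taken from the paper's dataset.
TRAIN = [
    ("delay in steel delivery from supplier", "schedule"),
    ("late permit approval delays start of work", "schedule"),
    ("unexpected increase in material cost", "cost"),
    ("budget overrun due to labor cost escalation", "cost"),
    ("worker fall from scaffold without harness", "safety"),
    ("crane operation near power lines injury hazard", "safety"),
]
TEST = [
    ("supplier delivery delay for concrete", "schedule"),
    ("labor cost increase exceeds budget", "cost"),
    ("scaffold fall hazard for worker", "safety"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Map tokens to a sparse TF-IDF vector; out-of-vocabulary terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(samples, idf):
    """Average the TF-IDF vectors of each class into one centroid per label."""
    sums, counts = {}, Counter()
    for text, label in samples:
        centroid = sums.setdefault(label, Counter())
        for t, v in tfidf(tokenize(text), idf).items():
            centroid[t] += v
        counts[label] += 1
    return {
        label: {t: v / counts[label] for t, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, idf, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def accuracy(samples, idf, centroids):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(text, idf, centroids) == label for text, label in samples)
    return correct / len(samples)


idf = fit_idf([tokenize(text) for text, _ in TRAIN])
centroids = train_centroids(TRAIN, idf)
print("accuracy on toy test set:", accuracy(TEST, idf, centroids))
```

Comparing several models in this way means holding the train and test split fixed and swapping only the representation and classifier, so that the accuracy figures are directly comparable. The toy score printed here says nothing about the results reported in the paper.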
History
Published online: May 24, 2022