Technical Papers
Jun 30, 2023

Deep Learning–Based Named Entity Recognition and Resolution of Referential Ambiguities for Enhanced Information Extraction from Construction Safety Regulations

Publication: Journal of Computing in Civil Engineering
Volume 37, Issue 5

Abstract

Construction safety regulations and standards contain a large number of fall protection requirements for different equipment, facilities, and operations. Automated field compliance checking aims to detect field violations of construction safety regulations for improved compliance and safety. Recent research efforts have focused on automated tracking of labor and equipment toward improved violation detection and safety compliance. However, extracting and modeling safety requirements to support automated violation detection or safety alert systems remains highly manual. To address this gap, information extraction provides an opportunity to automatically extract requirements from construction safety regulations so that they can be compared with field information to detect violations (or to predict and prevent violations before they occur). However, existing information extraction methods are limited in their scalability and/or accuracy. This paper therefore proposes a deep learning–based information extraction method for automatically extracting named entities describing fall protection requirements [e.g., scaffold, horizontal direction, or 1.82 m (6 ft)] from construction safety regulations and resolving referential ambiguities. The proposed method consists of three main submethods: (1) a deep learning–based method to recognize entities in the regulations, (2) a deep learning–based method to recognize referential ambiguities in the extracted entities, and (3) a named entity normalization method to resolve these ambiguities. The proposed method was implemented and tested on 20 selected Occupational Safety and Health Administration (OSHA) sections related to fall protection. It achieved an overall information extraction precision, recall, and F-1 measure of 93.2%, 89.6%, and 91.1%, respectively, which indicates good information extraction performance.
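As a reading aid for the reported precision, recall, and F-1 measure, the following is a minimal sketch of how such scores are commonly computed for entity extraction. It assumes exact-match scoring (a predicted entity counts only if both its span and its label match the gold standard); the abstract does not state the evaluation protocol, so this is an illustration and not the authors' procedure. The `Entity` type, the label names, and the toy data are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Entity(NamedTuple):
    """A labeled span: sentence id, start token, end token, entity type."""
    sent_id: int
    start: int
    end: int
    label: str


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction is correct only if span and label both match."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper's gold standard.
gold = {
    Entity(0, 1, 2, "EQUIPMENT"),   # e.g. "scaffold"
    Entity(0, 5, 7, "DIRECTION"),   # e.g. "horizontal direction"
    Entity(0, 9, 13, "QUANTITY"),   # e.g. "1.82 m (6 ft)"
    Entity(1, 0, 1, "EQUIPMENT"),
}
predicted = {
    Entity(0, 1, 2, "EQUIPMENT"),
    Entity(0, 5, 7, "DIRECTION"),
    Entity(0, 9, 12, "QUANTITY"),   # boundary error: counted as a miss
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

When scores are pooled over all entities like this, F-1 is the harmonic mean of precision and recall. If overall figures are instead averaged across sections or entity types (the abstract does not say which), the reported F-1 need not equal the harmonic mean of the reported precision and recall.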

Data Availability Statement

Some data, models, or code generated or used during the study (the labeled gold standard for evaluation) are available from the corresponding author by request.

Acknowledgments

The authors would like to thank the National Science Foundation (NSF). This paper is based on work supported by NSF under Grant No. 1827733. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

Published In

Journal of Computing in Civil Engineering, Volume 37, Issue 5, September 2023

History

Received: Mar 2, 2022
Accepted: Aug 18, 2022
Published online: Jun 30, 2023
Published in print: Sep 1, 2023
Discussion open until: Nov 30, 2023

Authors

Affiliations

Xiyu Wang, S.M.ASCE
Graduate Student, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana-Champaign, 205 N. Mathews Ave., Urbana, IL 61801.
Nora El-Gohary, A.M.ASCE
Associate Professor, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana-Champaign, 205 N. Mathews Ave., Urbana, IL 61801 (corresponding author).

Cited by

  • Information Integration of Regulation Texts and Tables for Automated Construction Safety Knowledge Mapping, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-14436, 150, 5, (2024).
