Technical Papers
Jun 27, 2024

Automatically Categorizing Construction Accident Narratives Using the Deep-Learning Model with a Class-Imbalance Treatment Technique

Publication: Journal of Construction Engineering and Management
Volume 150, Issue 9

Abstract

Learning from prior incidents is crucial for improving safety, particularly in the construction industry, where fatalities and injuries are frequent. High-precision classification of construction accident narratives is a laborious, time-consuming process that requires substantial domain expertise. However, automatic text classification has fallen short of expectations because of a lack of high-quality data sets, inadequate semantic interpretation, and primitive model architectures. To address these issues, this study developed a state-of-the-art text classification (TC) model to extract construction knowledge and classify construction accident narratives into predefined categories. The architecture of the TC deep-learning model was built on the pretrained instruction-based omnifarious representations (INSTRUCTOR) model. A class-imbalance treatment (CIT) technique incorporating focal loss and weighted random sampling was embedded to make the model concentrate on hard samples and minority classes. The retrained and fine-tuned INSTRUCTOR-CIT model achieved an F1 score of 82.22% on a benchmark data set containing 1,000 accident narratives from the Occupational Safety and Health Administration (OSHA). On a larger benchmark data set of 4,770 OSHA accident narratives labeled by another official system, the model achieved an F1 score of 94.84%, highlighting its generality. Furthermore, the experimental results demonstrated that the model outperformed existing methods while requiring less preprocessing and achieving higher accuracy. Finally, the contribution to construction project management was discussed to enhance unstructured data management in the construction industry. The findings of this study contribute to effective management practices and help construction professionals focus on value-added tasks such as decision making and corrective action planning.
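
The two ingredients of the CIT technique named in the abstract, focal loss and weighted random sampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (their code is in the repository cited in the Data Availability Statement). The function names, the scalar `alpha`, the default `gamma=2.0`, and the inverse-class-frequency weighting are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : scalar weight (a simplification; it may also be set per class)
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def inverse_frequency_weights(labels):
    """Per-sample sampling weights proportional to 1 / class frequency."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample(labels, k, seed=0):
    """Draw k indices with replacement, favouring minority classes."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)
```

With `gamma > 0`, a confidently correct prediction (large `p_t`) contributes far less to the loss than a hard sample, and inverse-frequency weights give each class the same total sampling mass regardless of how many narratives it contains. In a deep-learning framework the same idea is usually expressed through the framework's own loss and sampler utilities rather than hand-written functions.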

Data Availability Statement

Some or all data, models, or code generated or used during the study are available in a repository online in accordance with funder data retention policies. The code is available at Github (n.d.-a). Data sets 1 and 2 can be downloaded from Github (n.d.-b, c), respectively.

Acknowledgments

This research was supported by the Fundamental Research Funds for the Central Universities (2020JBW007) and the Beijing Humanities and Social Science Development Foundation (20GLC049).

References

Baek, S., W. Jung, and S. H. Han. 2021. “A critical review of text-based research in construction: Data source, analysis method, and implications.” Autom. Constr. 132 (Oct): 103915. https://doi.org/10.1016/j.autcon.2021.103915.
Chen, Z., K. Huang, L. Wu, Z. Zhong, and Z. Jiao. 2022. “Relational graph convolutional network for text-mining-based accident causal classification.” Appl. Sci. 12 (5): 2482. https://doi.org/10.3390/app12052482.
Cheng, M., D. Kusoemo, and R. A. Gosno. 2020. “Text mining-based construction site accident classification using hybrid supervised machine learning.” Autom. Constr. 118 (Oct): 103265. https://doi.org/10.1016/j.autcon.2020.103265.
Chiang, Y., F. K. Wong, and S. Liang. 2018. “Fatal construction accidents in Hong Kong.” J. Constr. Eng. Manage. 144 (3): 04017121. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001433.
Chua, D., and Y. M. Goh. 2004. “Incident causation model for improving feedback of safety knowledge.” J. Constr. Eng. Manage. 130 (4): 542–551. https://doi.org/10.1061/(ASCE)0733-9364(2004)130:4(542).
Devlin, J., M. Chang, K. Lee, and K. Toutanova. 2018. “BERT: Pre-training of deep bidirectional transformers for language understanding.” Preprint, submitted October 11, 2018. http://arxiv.org/abs/1810.04805.
Ding, Y., J. Ma, and X. Luo. 2022. “Applications of natural language processing in construction.” Autom. Constr. 136 (Apr): 104169. https://doi.org/10.1016/j.autcon.2022.104169.
Fang, W., H. Luo, S. Xu, P. E. D. Love, Z. Lu, and C. Ye. 2020. “Automated text classification of near-misses from safety reports: An improved deep learning approach.” Adv. Eng. Inf. 44 (Apr): 101060. https://doi.org/10.1016/j.aei.2020.101060.
Feng, D., and H. Chen. 2021. “A small samples training framework for deep learning-based automatic information extraction: Case study of construction accident news reports analysis.” Adv. Eng. Inf. 47 (Jan): 101256. https://doi.org/10.1016/j.aei.2021.101256.
Gadekar, H., and N. Bugalia. 2023. “Automatic classification of construction safety reports using semi-supervised YAKE-Guided LDA approach.” Adv. Eng. Inf. 56 (Apr): 101929. https://doi.org/10.1016/j.aei.2023.101929.
Gao, L., P. Lu, and Y. Ren. 2021. “A deep learning approach for imbalanced crash data in predicting highway-rail grade crossings accidents.” Reliab. Eng. Syst. Saf. 216 (Dec): 108019. https://doi.org/10.1016/j.ress.2021.108019.
Github. n.d.-a. “INSTRUCTOR-CIT model.” Accessed May 24, 2024. https://github.com/Shuang0421/Construction-accident-classification.
Github. n.d.-b. “OSHA_Con_Acc.” Accessed May 24, 2024. https://github.com/safetyhub/OSHA_Acc.git.
Github. n.d.-c. “Injury-narratives.” Accessed May 24, 2024. https://github.com/qiao77/Injury-Narratives.
Goh, Y. M., and C. U. Ubeynarayana. 2017. “Construction accident narrative classification: An evaluation of text mining techniques.” Accid. Anal. Prev. 108 (Nov): 122–130. https://doi.org/10.1016/j.aap.2017.08.026.
Goldberg, D. M. 2022. “Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability.” J. Saf. Res. 80 (Feb): 441–455. https://doi.org/10.1016/j.jsr.2021.12.024.
Gupta, A. K., C. G. V. S. Pardheev, S. Choudhuri, S. Das, A. Garg, and J. Maiti. 2022. “A novel classification approach based on context connotative network (CCNet): A case of construction site accidents.” Expert Syst. Appl. 202 (Sep): 117281. https://doi.org/10.1016/j.eswa.2022.117281.
He, H., and E. A. Garcia. 2009. “Learning from imbalanced data.” IEEE Trans. Knowl. Data Eng. 21 (9): 1263–1284. https://doi.org/10.1109/TKDE.2008.239.
LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.
Li, X., C. Lv, W. Wang, G. Li, L. Yang, and J. Yang. 2023. “Generalized focal loss: Towards efficient representation learning for dense object detection.” IEEE Trans. Pattern Anal. Mach. Intell. 45 (3): 3139–3153. https://doi.org/10.1109/TPAMI.2022.3180392.
Lin, T., P. Goyal, R. Girshick, K. He, and P. Dollar. 2020. “Focal loss for dense object detection.” IEEE Trans. Pattern Anal. Mach. Intell. 42 (2): 318–327. https://doi.org/10.1109/TPAMI.2018.2858826.
Liu, J., H. Luo, and H. Liu. 2022. “Deep learning-based data analytics for safety in construction.” Autom. Constr. 140 (Aug): 104302. https://doi.org/10.1016/j.autcon.2022.104302.
Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. “RoBERTa: A robustly optimized BERT pretraining approach.” Preprint, submitted July 26, 2019. http://arxiv.org/abs/1907.11692.
Nanda, G., K. Vallmuur, and M. Lehto. 2018. “Improving autocoding performance of rare categories in injury classification: Is more training data or filtering the solution?” Accid. Anal. Prev. 110 (Jan): 115–127. https://doi.org/10.1016/j.aap.2017.10.020.
Ni, J., G. H. Abrego, N. Constant, J. Ma, K. B. Hall, D. M. Cer, and Y. Yang. 2021a. “Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models.” Preprint, submitted August 19, 2021. http://arxiv.org/abs/2108.08877.
Ni, J., C. Qu, J. Lu, Z. Dai, G. H. Abrego, J. Ma, V. Zhao, Y. Luan, K. B. Hall, M. Chang, and Y. Yang. 2021b. “Large dual encoders are generalizable retrievers.” Preprint, submitted December 15, 2021. http://arxiv.org/abs/2112.07899.
Pan, X., B. Zhong, Y. Wang, and L. Shen. 2022. “Identification of accident-injury type and bodypart factors from construction accident reports: A graph-based deep learning framework.” Adv. Eng. Inf. 54 (Oct): 101752. https://doi.org/10.1016/j.aei.2022.101752.
Qiao, J., C. Wang, S. Guan, and L. Shuran. 2022. “Construction-accident narrative classification using shallow and deep learning.” J. Constr. Eng. Manage. 148 (9): 04022088. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002354.
Raffel, C., N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. 2020. “Exploring the limits of transfer learning with a unified text-to-text transformer.” J. Mach. Learn. Res. 21 (140): 1–67.
Salovaara, A., B. R. Upreti, J. I. Nykanen, and J. Merikivi. 2020. “Building on shaky foundations? Lack of falsification and knowledge contestation in IS theories, methods, and practices.” Eur. J. Inf. Syst. 29 (1): 65–83. https://doi.org/10.1080/0960085X.2019.1685737.
Su, H., W. Shi, J. Kasai, Y. Wang, Y. Hu, M. Ostendorf, W. Yih, N. Smith, L. Zettlemoyer, and T. Yu. 2023. “One embedder, any task: Instruction-finetuned text embeddings.” Preprint, submitted December 19, 2022. http://arxiv.org/abs/2212.09741.
Sunindijo, R. Y., and P. X. W. Zou. 2012. “Political skill for developing construction safety climate.” J. Constr. Eng. Manage. 138 (5): 605–612. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000482.
Tanguy, L., N. Tulechki, A. Urieli, E. Hermann, and C. Raynal. 2016. “Natural language processing for aviation safety reports: From classification to interactive analysis.” Comput. Ind. 78 (May): 80–95. https://doi.org/10.1016/j.compind.2015.09.005.
Tian, D., M. Li, S. Han, and Y. Shen. 2022. “A novel and intelligent safety-hazard classification method with syntactic and semantic features for large-scale construction projects.” J. Constr. Eng. Manage. 148 (10): 04022109. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002382.
Tixier, A., M. Hallowell, B. Rajagopalan, and D. Bowman. 2016. “Application of machine learning to construction injury prediction.” Autom. Constr. 69 (Sep): 102–114. https://doi.org/10.1016/j.autcon.2016.05.016.
Ul Hassan, F., T. Le, and X. Lv. 2021. “Addressing legal and contractual matters in construction using natural language processing: A critical review.” J. Constr. Eng. Manage. 147 (9): 03121004. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002122.
Wang, H., and F. Miao. 2022. “Building extraction from remote sensing images using deep residual U-Net.” Eur. J. Remote Sens. 55 (1): 71–85. https://doi.org/10.1080/22797254.2021.2018944.
Woods, D., E. Patterson, and E. Roth. 2002. “Can we ever escape from data overload? A cognitive systems diagnosis.” Cognition Technol. Work 4 (1): 22–36. https://doi.org/10.1007/s101110200002.
Wu, C., X. Li, Y. Guo, J. Wang, Z. Ren, M. Wang, and Z. Yang. 2022. “Natural language processing for smart construction: Current status and future directions.” Autom. Constr. 134 (Feb): 104059. https://doi.org/10.1016/j.autcon.2021.104059.
Yeung, M., E. Sala, C. Schoenlieb, and L. Rundo. 2022. “Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation.” Comput. Med. Imaging Graphics 95 (Jan): 102026. https://doi.org/10.1016/j.compmedimag.2021.102026.
Zhang, F., H. Fleyeh, X. Wang, and M. Lu. 2019. “Construction site accident analysis using text mining and natural language processing techniques.” Autom. Constr. 99 (Mar): 238–248. https://doi.org/10.1016/j.autcon.2018.12.016.
Zhang, J., L. Zi, Y. Hou, D. Deng, W. Jiang, and M. Wang. 2020. “A C-BiLSTM approach to classify construction accident reports.” Appl. Sci. 10 (17): 5754. https://doi.org/10.3390/app10175754.
Zhong, B., X. Pan, P. E. D. Love, L. Ding, and W. Fang. 2020. “Deep learning and network analysis: Classifying and visualizing accident narratives in construction.” Autom. Constr. 113 (May): 103089. https://doi.org/10.1016/j.autcon.2020.103089.
Zhou, Z., Y. M. Goh, and Q. Li. 2015. “Overview and analysis of safety management studies in the construction industry.” Saf. Sci. 72 (Feb): 337–350. https://doi.org/10.1016/j.ssci.2014.10.006.
Zhou, Z., L. Wei, J. Yuan, J. Cui, Z. Zhang, W. Zhuo, and D. Lin. 2023. “Construction safety management in the data-rich era: A hybrid review based upon three perspectives of nature of dataset, machine learning approach, and research topic.” Adv. Eng. Inf. 58 (Oct): 102144. https://doi.org/10.1016/j.aei.2023.102144.

Published In

Journal of Construction Engineering and Management
Volume 150, Issue 9, September 2024

History

Received: Sep 13, 2023
Accepted: Jan 23, 2024
Published online: Jun 27, 2024
Published in print: Sep 1, 2024
Discussion open until: Nov 27, 2024

Authors

Affiliations

Qing Shuang
Associate Professor, School of Economics and Management, Beijing Jiaotong Univ., Beijing 100044, China.
Master’s Candidate, School of Economics and Management, Beijing Jiaotong Univ., Beijing 100044, China. ORCID: https://orcid.org/0009-0001-8830-8995.
Associate Professor, School of Economics and Management, Beijing Jiaotong Univ., Beijing 100044, China (corresponding author). ORCID: https://orcid.org/0000-0001-6705-6082.
Master’s Candidate, School of Economics and Management, Beijing Jiaotong Univ., Beijing 100044, China.
