Technical Papers
Jul 4, 2022

Construction-Accident Narrative Classification Using Shallow and Deep Learning

Publication: Journal of Construction Engineering and Management
Volume 148, Issue 9

Abstract

Extracting knowledge from past accidents is crucial to preventing future ones. In text mining, this requires classifying accident narratives, an autocoding task that can be framed as a multiclass classification problem on an imbalanced data set. We evaluated the performance of several state-of-the-art machine learning methods: 10 shallow learning methods [Rocchio, k-nearest neighbors, logistic regression, naive Bayes, decision tree, random forest, gradient boosting, bootstrap aggregating, support vector machine (SVM), and shallow neural network] and five deep learning methods [deep neural network, convolutional neural network (CNN), recurrent neural networks with long short-term memory and with a gated recurrent unit, and recurrent CNN]. The input data set contained 4,770 construction accident reports from the Occupational Safety and Health Administration (OSHA). After the narratives were relabeled based on the Occupational Injury and Illness Classification System (OIICS), the accuracy of all shallow classifiers improved significantly compared with that reported in previous studies. SVM and CNN achieved the highest accuracies among the shallow and deep learning methods, 0.91 and 0.90, respectively. Misclassifications occurred because the training data lacked diversity for minority classes, some cases belong to multiple classes, and some divisions share the same key feature words. When a new data set becomes available, the learned patterns can be used to classify its narratives with high accuracy in practice.
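To make the classification task concrete, the following is a minimal sketch of one of the shallow methods named in the abstract, a Rocchio (nearest-centroid) classifier over TF-IDF vectors, written with the Python standard library only. It is not the authors' implementation: the toy narratives, the three labels (fall, struck, electrocution), the tokenizer, the smoothed IDF formula, and all function names are assumptions chosen for illustration. The training set is deliberately imbalanced, and per-class recall is reported because overall accuracy can hide poor performance on minority classes.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy narratives and labels (not taken from the OSHA data set).
# The set is imbalanced on purpose: 4 "fall", 2 "struck", 1 "electrocution".
TRAIN_DOCS = [
    "Employee fell from roof while installing shingles",
    "Worker fell from ladder and fractured leg",
    "Employee fell through skylight opening on roof",
    "Worker slipped from scaffold and fell to ground",
    "Employee struck by falling steel beam",
    "Worker struck by reversing dump truck",
    "Employee electrocuted after contacting overhead power line",
]
TRAIN_LABELS = ["fall", "fall", "fall", "fall",
                "struck", "struck", "electrocution"]


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def vectorize(tokens, idf):
    """Unit-length TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return normalize({t: c * idf[t] for t, c in tf.items() if t in idf})


def fit_rocchio(docs, labels):
    """Return (idf, centroids), with one unit-length centroid per class."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for toks, label in zip(tokenized, labels):
        for term, weight in vectorize(toks, idf).items():
            sums[label][term] += weight
    centroids = {
        label: normalize({t: w / counts[label] for t, w in vec.items()})
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(text, idf, centroids):
    """Assign the class whose centroid has the highest cosine similarity."""
    vec = vectorize(tokenize(text), idf)

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda label: cosine(centroids[label]))


def per_class_recall(y_true, y_pred):
    """Recall for each class, which exposes weak minority-class performance."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / total[label] for label in total}


if __name__ == "__main__":
    idf, centroids = fit_rocchio(TRAIN_DOCS, TRAIN_LABELS)
    test_docs = ["Roofer fell from ladder",
                 "Laborer struck by truck",
                 "Employee contacted power line"]
    test_labels = ["fall", "struck", "electrocution"]
    predictions = [predict(d, idf, centroids) for d in test_docs]
    print(predictions)
    print(per_class_recall(test_labels, predictions))
```

With only one training narrative for the minority class, its centroid is a single document, which illustrates the abstract's point that limited diversity in minority classes is a source of misclassification on real data.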

Data Availability Statement

The code is available upon reasonable request. The data are available at https://github.com/qiao77/Injury-Narratives.

Acknowledgments

This research was supported by the Beijing Key Laboratory of Megaregions Sustainable Development Modeling, Capital University of Economics and Business (CUEB) fund. The authors would like to thank the anonymous reviewers for their detailed and valuable comments, which have significantly improved the quality of the paper.

Information & Authors

Information

Published In

Journal of Construction Engineering and Management
Volume 148, Issue 9, September 2022

History

Received: Aug 19, 2021
Accepted: Apr 29, 2022
Published online: Jul 4, 2022
Published in print: Sep 1, 2022
Discussion open until: Dec 4, 2022

Authors

Affiliations

Associate Professor, Dept. of Management and Engineering, Capital Univ. of Economics and Business, Fengtai, Beijing 100070, China; Adjunct Professor, Dept. of Beijing Key Laboratory of Megaregions Sustainable Development Modeling, Beijing 100070, China (corresponding author). ORCID: https://orcid.org/0000-0002-4379-0810.
Changfeng Wang
Professor, Dept. of Economics and Management, Beijing Univ. of Posts and Telecommunications, Haidian, Beijing 100876, China.
Shuang Guan
Graduate Student, Dept. of Economics and Management, Beijing Univ. of Posts and Telecommunications, Haidian, Beijing 100876, China.
Lv Shuran
Professor, Dept. of Management and Engineering, Capital Univ. of Economics and Business, Fengtai, Beijing 100070, China; Adjunct Professor, Dept. of Beijing Key Laboratory of Megaregions Sustainable Development Modeling, Beijing 100070, China.

Cited by

  • Automatic Identification of Causal Factors from Fall-Related Accident Investigation Reports Using Machine Learning and Ensemble Learning Approaches, Journal of Management in Engineering, 10.1061/JMENEA.MEENG-5485, 40, 1, (2024).
  • Automatically Categorizing Construction Accident Narratives Using the Deep-Learning Model with a Class-Imbalance Treatment Technique, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-14515, 150, 9, (2024).
  • Using Text Mining and Bayesian Network to Identify Key Risk Factors for Safety Accidents in Metro Construction, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-14114, 150, 6, (2024).
  • Dispute Classification and Analysis: Deep Learning–Based Text Mining for Construction Contract Management, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-14080, 150, 1, (2024).
  • Developing a National Data-Driven Construction Safety Management Framework with Interpretable Fatal Accident Prediction, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-12848, 149, 4, (2023).
  • How to prioritize perceived quality attributes from consumers' perspective? Analysis through social media data, Electronic Commerce Research, 10.1007/s10660-022-09652-7, (2023).
