Technical Papers
Nov 23, 2022

Extracting Worker Unsafe Behaviors from Construction Images Using Image Captioning with Deep Learning–Based Attention Mechanism

Publication: Journal of Construction Engineering and Management
Volume 149, Issue 2

Abstract

Safety in the construction industry has always been a focus of attention. Existing methods for detecting workers’ unsafe behavior relied primarily on manual inspection, which not only consumed significant time and money but also inevitably produced omissions. Current automated techniques rely only on unsafe factors of the worker’s own body to judge behavior, making it difficult to understand unsafe behaviors in complex scenes. To address these problems, this study proposed a method to automatically extract workers’ unsafe behaviors by combining information from complex scenes: image captioning based on an attention mechanism. First, three different sets of image captioning models capable of extracting key information from complex scenes were constructed using convolutional neural networks (CNNs), which are widely used in AI. Then, two datasets dedicated to the construction domain were created for method validation. Finally, three sets of experiments were conducted by combining the datasets with the three sets of models. The results showed that the method could detect a worker’s job type and output the interaction behavior between the worker and the target (unsafe behavior) based on environmental information in the construction images. This study introduced environmental information into the determination of workers’ unsafe behaviors for the first time, outputting not only the worker’s job type but also the worker’s behavior, which makes the model output better suited to ergonomic analysis.
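The abstract describes captioning models that attend over CNN image features when generating each word. As a minimal, hedged sketch of that soft-attention step (in the style of Xu et al. 2015, cited below), the following NumPy code scores each image region against the decoder's hidden state and returns a weighted context vector; all shapes, parameter names, and values are illustrative and do not reproduce the authors' implementation:

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """Soft attention over image regions (additive/Bahdanau-style).

    features: (k, d) CNN feature vectors for k image regions
    hidden:   (h,)   decoder hidden state at the current time step
    W_f: (d, a), W_h: (h, a), v: (a,)  learned projection parameters
    Returns (context, alpha): attention-weighted context and weights.
    """
    # Score each region jointly with the decoder state, then softmax.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (k,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                 # weights sum to 1
    context = alpha @ features                           # (d,) context vector
    return context, alpha

# Illustrative usage with random features and parameters.
rng = np.random.default_rng(0)
features = rng.standard_normal((5, 8))   # 5 regions, 8-dim features
hidden = rng.standard_normal(16)
W_f = rng.standard_normal((8, 4))
W_h = rng.standard_normal((16, 4))
v = rng.standard_normal(4)
context, alpha = soft_attention(features, hidden, W_f, W_h, v)
```

In a full captioning model, `context` would be fed to the decoder (e.g., an LSTM) at each step, letting the generated sentence ground each word in a different part of the construction scene.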

Practical Applications

This study developed an intelligent solution for determining, against behavioral norms, whether a worker exhibited unsafe behavior in complex scenarios. The operator does not need construction safety knowledge (e.g., whether a helmet or safety belt is required, or whether the work is at height); they simply input the target image, and the model combines predefined behavioral norms, scene information, and other factors to determine what behavior (or unsafe behavior) the image contains and outputs a brief description. Descriptions can also follow fixed templates for easy management, such as “worker A wearing (not wearing) a helmet,” and these descriptions play a key role in daily management and project summaries. Using this method, managers can employ on-site equipment to automatically capture good behaviors or violations by anyone on site. This also enables efficient organization and recording, improving managers’ efficiency.
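The fixed-template descriptions mentioned above can be rendered from the model's structured outputs. A minimal sketch, assuming hypothetical field names (`worker_id`, `job_type`, `wears_helmet`, `at_height`) that are not from the paper:

```python
def describe(worker_id, job_type, wears_helmet, at_height):
    """Render a fixed-template safety description for one worker.

    All arguments are illustrative stand-ins for fields a captioning
    or detection model might output; this is not the authors' template.
    """
    helmet = "wearing" if wears_helmet else "not wearing"
    desc = f"worker {worker_id} ({job_type}) {helmet} a helmet"
    # Flag a violation when a predefined norm is broken.
    if at_height and not wears_helmet:
        desc += "; unsafe: working at height without a helmet"
    return desc

print(describe("A", "welder", False, True))
```

Keeping the output templated, rather than free-form, makes the daily logs uniform and easy to aggregate in project summaries.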


Data Availability Statement

Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request. (The dataset and model code are available from the first author upon request.)

Acknowledgments

This study was supported by the key R&D program of Shandong Province. The authors are very grateful to all laboratory staff for their help and to Hongyu Chang for his online guidance.

References

Ali, R., J. H. Chuah, M. S. Abu Talip, N. Mokhtar, and M. A. Shoaib. 2022. “Structural crack detection using deep convolutional neural networks.” Autom. Constr. 133 (Jan): 103989. https://doi.org/10.1016/j.autcon.2021.103989.
An, X. H., L. Zhou, Z. G. Liu, C. Z. Wang, P. F. Li, and Z. W. Li. 2021. “Dataset and benchmark for detecting moving objects in construction sites.” Autom. Constr. 122 (Feb): 103482. https://doi.org/10.1016/j.autcon.2020.103482.
Anderson, P., X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. 2017. “Bottom-up and top-down attention for image captioning and visual question answering.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition, 6077–6086. New York: IEEE.
Assadzadeh, A., M. Arashpour, I. Brilakis, T. Ngo, and E. Konstantinou. 2022. “Vision-based excavator pose estimation using synthetically generated datasets with domain randomization.” Autom. Constr. 134 (Feb): 104089. https://doi.org/10.1016/j.autcon.2021.104089.
Ayhan, B. U., and O. B. Tokdemir. 2019. “Predicting the outcome of construction incidents.” Saf. Sci. 113 (11): 91–104. https://doi.org/10.1016/j.ssci.2018.11.001.
Cheng, J. C. P., and M. Z. Wang. 2018. “Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques.” Autom. Constr. 95 (Nov): 155–171. https://doi.org/10.1016/j.autcon.2018.08.006.
Czerniawski, T., and F. Leite. 2020. “Automated digital modeling of existing buildings: A review of visual object recognition methods.” Autom. Constr. 113 (Jun): 103131. https://doi.org/10.1016/j.autcon.2020.103131.
Ding, L. Y., W. L. Fang, H. B. Luo, P. E. D. Love, B. T. Zhong, and X. Ouyang. 2018. “A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory.” Autom. Constr. 86 (Jun): 118–124. https://doi.org/10.1016/j.autcon.2017.11.002.
Dutta, S., A. Shi, R. Choudhary, Z. Zhang, A. Jain, and S. Misailovic. 2020. “Detecting flaky tests in probabilistic and machine learning applications.” In Proc., 29th ACM SIGSOFT Int. Symp. on Software Testing and Analysis, 211–224. New York: Association for Computing Machinery. https://doi.org/10.1145/3395363.3397366.
Fang, Q., H. Li, X. C. Luo, L. Y. Ding, H. B. Luo, and C. Q. Li. 2018a. “Computer vision aided inspection on falling prevention measures for steeplejacks in an aerial environment.” Autom. Constr. 93 (Sep): 148–164. https://doi.org/10.1016/j.autcon.2018.05.022.
Fang, Q., H. Li, X. C. Luo, L. Y. Ding, H. B. Luo, T. M. Rose, and W. P. An. 2018b. “Detecting non-hardhat-use by a deep learning method from far-field surveillance videos.” Autom. Constr. 85 (Jan): 1–9. https://doi.org/10.1016/j.autcon.2017.09.018.
Gong, J., C. H. Caldas, and C. Gordon. 2011. “Learning and classifying actions of construction workers and equipment using Bag-of-Video-Feature-Words and Bayesian network models.” Adv. Eng. Inf. 25 (4): 771–782. https://doi.org/10.1016/j.aei.2011.06.002.
Han, S., S. Lee, and F. Pena-Mora. 2013. “Vision-based detection of unsafe actions of a construction worker: Case study of ladder climbing.” J. Comput. Civ. Eng. 27 (6): 635–644. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000279.
He, K., X. Zhang, S. Ren, and J. Sun. 2016. “Deep residual learning for image recognition.” In Proc., 2016 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). New York: IEEE.
Hochreiter, S., and J. Schmidhuber. 1997. “Long short-term memory.” Neural Comput. 9 (8): 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Hou, X., Y. Zeng, and J. Xue. 2020. “Detecting structural components of building engineering based on deep-learning method.” J. Constr. Eng. Manage. 146 (2): 04019097. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001751.
Huang, G., Z. Liu, V. Laurens, and K. Q. Weinberger. 2016. “Densely connected convolutional networks.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition, 4700–4708. New York: IEEE.
Khan, M., R. Khalid, S. Anjum, N. Khan, S. Cho, and C. Park. 2022. “Tag and IoT based safety hook monitoring for prevention of falls from height.” Autom. Constr. 136 (Apr): 104153. https://doi.org/10.1016/j.autcon.2022.104153.
Kim, H., K. Kim, and H. Kim. 2016. “Data-driven scene parsing method for recognizing construction site objects in the whole image.” Autom. Constr. 71 (2): 271–282. https://doi.org/10.1016/j.autcon.2016.08.018.
Kong, T., W. L. Fang, P. E. D. Love, H. B. Luo, S. J. Xu, and H. Li. 2021. “Computer vision and long short-term memory: Learning to predict unsafe behaviour in construction.” Adv. Eng. Inf. 50 (Apr): 101400. https://doi.org/10.1016/j.aei.2021.101400.
Konstantinou, E., and I. Brilakis. 2018. “Matching construction workers across views for automated 3D vision tracking on-site.” J. Constr. Eng. Manage. 144 (7): 04018061. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001508.
Lee, K., and S. Han. 2021. “Convolutional neural network modeling strategy for fall-related motion recognition using acceleration features of a scaffolding structure.” Autom. Constr. 130 (Jun): 103857. https://doi.org/10.1016/j.autcon.2021.103857.
Li, T. S., M. Alipour, and D. K. Harris. 2021. “Mapping textual descriptions to condition ratings to assist bridge inspection and condition assessment using hierarchical attention.” Autom. Constr. 129 (Apr): 103801. https://doi.org/10.1016/j.autcon.2021.103801.
Lin, T.-Y., M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. 2014. “Microsoft COCO: Common objects in context.” In Proc., European Conf. on Computer Vision (ECCV), 740–755. Cham, Switzerland: Springer.
Liu, H., G. B. Wang, T. Huang, P. He, M. Skitmore, and X. C. Luo. 2020. “Manifesting construction activity scenes via image captioning.” Autom. Constr. 119 (5): 103334. https://doi.org/10.1016/j.autcon.2020.103334.
Lu, J., C. Xiong, D. Parikh, and R. Socher. 2017. “Knowing when to look: Adaptive attention via a visual sentinel for image captioning.” In Proc., 30th IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2017), 3242–3250. New York: IEEE.
Lu, J., J. Yang, D. Batra, and D. Parikh. 2018. “Neural baby talk.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 7219–7228. New York: IEEE.
Luo, H., M. Wang, P. K.-Y. Wong, and J. C. P. Cheng. 2020. “Full body pose estimation of construction equipment using computer vision and deep learning techniques.” Autom. Constr. 110 (Feb): 103016. https://doi.org/10.1016/j.autcon.2019.103016.
Luo, H., M. Wang, P. K.-Y. Wong, J. Tang, and J. C. P. Cheng. 2021. “Construction machine pose prediction considering historical motions and activity attributes using gated recurrent unit (GRU).” Autom. Constr. 121 (Jan): 103444. https://doi.org/10.1016/j.autcon.2020.103444.
Luo, X., H. Li, D. Cao, F. Dai, J. Seo, and S. Lee. 2018a. “Recognizing diverse construction activities in site images via relevance networks of construction-related objects detected by convolutional neural networks.” J. Comput. Civ. Eng. 32 (3): 04018012. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000756.
Luo, X. C., H. Li, D. P. Cao, Y. T. Yu, X. C. Yang, and T. Huang. 2018b. “Towards efficient and objective work sampling: Recognizing workers’ activities in site surveillance videos with two-stream convolutional networks.” Autom. Constr. 94 (Oct): 360–370. https://doi.org/10.1016/j.autcon.2018.07.011.
Ministry of Housing and Urban-Rural Development. 2022. General specifications for construction scaffolding. Beijing: Ministry of Housing and Urban-Rural Development.
Mohajeri, M., A. Ardeshir, H. Malekitabar, and S. Rowlinson. 2021. “Structural model of internal factors influencing the safety behavior of construction workers.” J. Constr. Eng. Manage. 147 (11): 04021156. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002182.
Nath, N. D., A. H. Behzadan, and S. G. Paal. 2020. “Deep learning for site safety: Real-time detection of personal protective equipment.” Autom. Constr. 112 (12): 103085. https://doi.org/10.1016/j.autcon.2020.103085.
Paneru, S., and I. Jeelani. 2021. “Computer vision applications in construction: Current state, opportunities & challenges.” Autom. Constr. 132 (4): 103940. https://doi.org/10.1016/j.autcon.2021.103940.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. “BLEU: A method for automatic evaluation of machine translation.” In Proc., 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), 311–318. Stroudsburg, PA: Association for Computational Linguistics.
Pi, Y. L., N. D. Nath, and A. H. Behzadan. 2021. “Detection and semantic segmentation of disaster damage in UAV footage.” J. Comput. Civ. Eng. 35 (2): 04020063. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000947.
Poh, C. Q. X., C. U. Ubeynarayana, and Y. M. Goh. 2018. “Safety leading indicators for construction sites: A machine learning approach.” Autom. Constr. 93 (Mar): 375–386. https://doi.org/10.1016/j.autcon.2018.03.022.
Pour Rahimian, F., S. Seyedzadeh, S. Oliver, S. Rodriguez, and N. Dawood. 2020. “On-demand monitoring of construction projects through a game-like hybrid application of BIM and machine learning.” Autom. Constr. 110 (Feb): 14. https://doi.org/10.1016/j.autcon.2019.103012.
Rahman, A., Z. Y. Wu, and R. Kalfarisi. 2021. “Semantic deep learning integrated with RGB feature-Based rule optimization for facility surface corrosion detection and evaluation.” J. Comput. Civ. Eng. 35 (6): 04021018. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000982.
Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. 2016. “You only look once: Unified, real-time object detection.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 779–788. New York: IEEE.
Rennie, S. J., E. Marcheret, Y. Mroueh, J. Ross, and V. Goel. 2016. “Self-critical sequence training for image captioning.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition, 7008–7024. New York: IEEE.
Ryu, J., A. Alwasel, C. T. Haas, and E. Abdel-Rahman. 2020. “Analysis of relationships between body load and training, work methods, and work rate: Overcoming the Novice Mason’s risk hump.” J. Constr. Eng. Manage. 146 (8): 04020097. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001889.
Simonyan, K., and A. Zisserman. 2014. “Very deep convolutional networks for large-scale image recognition.” Preprint, submitted September 4, 2014. https://arxiv.org/abs/1409.1556.
Sun, S., C. Luo, and J. Chen. 2017. “A review of natural language processing techniques for opinion mining systems.” Inf. Fusion 36 (Apr): 10–25. https://doi.org/10.1016/j.inffus.2016.10.004.
Szegedy, C., W. Liu, Y. Jia, P. Sermanet, and A. Rabinovich. 2014. “Going deeper with convolutions.” In Proc., IEEE Conf. on Computer Vision and Pattern Recognition, 1–9. New York: IEEE.
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017. “Attention is all you need.” Adv. Neur. Inf. Process Syst. 2017 (1): 30. https://doi.org/10.48550/arXiv.1706.03762.
Wang, C. F., S. E. Antos, and L. M. Triveno. 2021. “Automatic detection of unreinforced masonry buildings from street view images using deep learning-based image segmentation.” Autom. Constr. 132 (Apr): 103968. https://doi.org/10.1016/j.autcon.2021.103968.
Xiao, B., H. R. Xiao, J. W. Wang, and Y. Chen. 2022. “Vision-based method for tracking workers by integrating deep learning instance segmentation in off-site construction.” Autom. Constr. 136 (Feb): 104148. https://doi.org/10.1016/j.autcon.2022.104148.
Xiao, B., and Z. H. Zhu. 2018. “Two-dimensional visual tracking in construction scenarios: A comparative study.” J. Comput. Civ. Eng. 32 (3): 4018006. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000738.
Xu, J., and L. Ding. 2017. “A review of metro construction in China: Organization, market, cost, safety, and schedule.” Front. Eng. Manage. 4 (1): 4–19. https://doi.org/10.15302/j-fem-2017015.
Xu, J. Q., and H. S. Yoon. 2019. “Vision-based estimation of excavator manipulator pose for automated grading control.” Autom. Constr. 98 (4): 122–131. https://doi.org/10.1016/j.autcon.2018.11.022.
Xu, K., J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. 2015. “Show, attend and tell: Neural image caption generation with visual attention.” Proc. Mach. Learn Res. 37 (4): 2048–2057. https://doi.org/10.48550/arXiv.1502.03044.
Yan, X. Z., H. Li, C. Wang, J. Seo, H. Zhang, and H. W. Wang. 2017. “Development of ergonomic posture recognition technique based on 2D ordinary camera for construction hazard prevention through view-invariant features in 2D skeleton motion.” Adv. Eng. Inf. 34 (11): 152–163. https://doi.org/10.1016/j.aei.2017.11.001.
Yang, J., Z. K. Shi, and Z. Y. Wu. 2016. “Vision-based action recognition of construction workers using dense trajectories.” Adv. Eng. Inf. 30 (3): 327–336. https://doi.org/10.1016/j.aei.2016.04.009.
Young, P., A. Lai, M. Hodosh, and J. Hockenmaier. 2014. “From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions.” Trans. Assoc. Comput. Ling. 2 (Dec): 67–78. https://doi.org/10.1162/tacl_a_00166.
Yu, Y., W. Umer, X. Yang, and M. F. Antwi-Afari. 2021. “Posture-related data collection methods for construction workers: A review.” Autom. Constr. 124 (Apr): 103538. https://doi.org/10.1016/j.autcon.2020.103538.
Yu, Y. T., H. Li, W. Umer, C. Dong, X. C. Yang, M. Skitmore, and A. Y. L. Wong. 2019. “Automatic biomechanical workload estimation for construction workers by computer vision and smart insoles.” J. Comput. Civ. Eng. 33 (3): 04019010. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000827.
Zhang, H., X. Z. Yan, and H. Li. 2018. “Ergonomic posture recognition using 3D view-invariant features from single ordinary camera.” Autom. Constr. 94 (4): 1–10. https://doi.org/10.1016/j.autcon.2018.05.033.
Zhang, M., and S. Ge. 2022. “Vision- and trajectory-based dynamic collision prewarning mechanism for tower cranes.” J. Constr. Eng. Manage. 148 (7): 04022057. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002309.

Information & Authors

Published In

Journal of Construction Engineering and Management
Volume 149, Issue 2, February 2023

History

Received: Nov 28, 2021
Accepted: Sep 15, 2022
Published online: Nov 23, 2022
Published in print: Feb 1, 2023
Discussion open until: Apr 23, 2023


Authors

Affiliations

Graduate Student, Dept. of Civil Engineering, Ocean Univ. of China, Qingdao 266100, China. ORCID: https://orcid.org/0000-0003-4247-0951. Email: [email protected]
Lecturer, Dept. of Civil Engineering, Ocean Univ. of China, Qingdao 266100, China (corresponding author). ORCID: https://orcid.org/0000-0002-1122-3336. Email: [email protected]
Graduate Student, Dept. of Civil Engineering, Ocean Univ. of China, Qingdao 266100, China. ORCID: https://orcid.org/0000-0002-3937-0290. Email: [email protected]

Cited by

  • Explainable Image Captioning to Identify Ergonomic Problems and Solutions for Construction Workers, Journal of Computing in Civil Engineering, 10.1061/JCCEE5.CPENG-5744, 38, 4, (2024).
  • Bi-Directional Image-to-Text Mapping for NLP-Based Schedule Generation and Computer Vision Progress Monitoring, Construction Research Congress 2024, 10.1061/9780784485262.084, (826-835), (2024).

