Chapter

Automated Captioning for Ergonomic Problem and Solution Identification in Construction Using a Vision-Language Model and Caption Augmentation

Publication: Construction Research Congress 2024

ABSTRACT

Construction tasks impose high ergonomic risks, making it crucial to observe ergonomic problems (e.g., actions and postures associated with ergonomic risks) and provide solutions. Because manually identifying problems and solutions is time-consuming and subjective, extensive work has been devoted to automating this process through computer vision-based or sensor-based applications. Nevertheless, most existing studies have focused on assessing ergonomic risks, leaving the tasks of recognizing problems and generating solutions to ergonomists, who are scarce in construction. Therefore, this study aims to automatically identify ergonomic problems and solutions from images through image captioning. To overcome the limitation of traditional image captioning models, which cannot incorporate ergonomics knowledge, this study applied a vision-language model (VLM) with caption augmentation that leverages text-based knowledge. The authors tested five work scenarios and showed that the proposed VLM outperforms the traditional captioning model. These results demonstrate the feasibility of the proposed approach for identifying ergonomic problems and solutions.
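
The abstract does not specify the exact VLM or the augmentation pipeline, so the sketch below is only an illustrative, hedged example of the general approach it describes: captioning a construction-site image with an off-the-shelf vision-language model (here BLIP via the Hugging Face transformers library, an assumption) and augmenting captions with text-based ergonomics knowledge (problem and solution phrases) as one might do before fine-tuning. The model checkpoint, template wording, file path, and the augment_caption helper are all hypothetical, not the authors' implementation.

```python
# Illustrative sketch only: generic VLM captioning plus a hypothetical
# caption-augmentation step that appends ergonomics knowledge. The paper's
# actual model, prompts, and augmentation rules are not given in the abstract.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def augment_caption(base_caption: str, problem: str, solution: str) -> str:
    """Hypothetical caption augmentation: extend a plain scene caption with
    text-based ergonomics knowledge (an observed problem and a solution)."""
    return f"{base_caption}. Ergonomic problem: {problem}. Suggested solution: {solution}."

# Inference: generate a plain caption for a worker image (path is a placeholder).
image = Image.open("worker_scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)

# Building an augmented training caption from expert-provided text.
print(augment_caption(
    "a worker lifting a bag of cement from the ground",
    "stooped lifting with a bent back",
    "lift with bent knees and a straight back, or use a mechanical aid",
))
```

In the setting the abstract describes, such augmented captions would pair images of risky postures with problem and solution text so that a fine-tuned VLM can emit both in its output; the exact training procedure is not described in this excerpt.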




Published In

Construction Research Congress 2024
Pages: 709–718

History

Published online: Mar 18, 2024


Authors

Affiliations

Gunwoo Yong, Dynamic Project Management Laboratory, Dept. of Civil and Environmental Engineering, Univ. of Michigan. Email: [email protected]
Meiyin Liu, Ph.D., Assistant Professor, Dept. of Civil and Environmental Engineering, Rutgers Univ. Email: [email protected]
SangHyun Lee, Ph.D., Professor, Dept. of Civil and Environmental Engineering, Univ. of Michigan. Email: [email protected]
