Fine-Tuning Vision Transformer (ViT) to Classify Highway Construction Workers’ Activities
Publication: Construction Research Congress 2024
ABSTRACT
Accurate detection of construction workers’ activities helps assess how efficiently workers perform and identifies high-risk activities, which supports construction productivity and safety management. Automated detection of construction workers’ activities has become feasible in the construction industry since the emergence of computer vision techniques. However, few studies have explored workers’ activity detection in transportation-related work zones (e.g., mobile work zone operations), which have unique characteristics and requirements that differ from those of building construction jobsites. Previous studies in the construction domain usually used convolutional neural networks (CNNs) for computer vision-related tasks. Transformer-based models have achieved higher performance in computer vision tasks since they were first applied to them in 2020. However, few studies have applied transformer models to worker activity identification. Therefore, this study aims to detect construction workers’ activities in mobile work zones using a pre-trained Vision Transformer (ViT) model. The study starts with the collection of video data of construction workers. Then, a dataset containing different activities is developed by manual labeling. Next, the ViT model is fine-tuned on the dataset developed in this study. The results show that the model reaches 94.17% overall accuracy and achieves 100%, 100%, and 84% precision in detecting “placing mix,” “shoveling,” and “walking,” respectively, and that it outperforms a CNN-based classification model.
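The abstract reports two kinds of metrics: overall accuracy and per-class precision. The following minimal sketch shows how these are conventionally computed from true and predicted labels. It is not the authors' evaluation code; the function names and the eight toy labels are hypothetical and chosen only for illustration, so the printed values do not correspond to the paper's results.

```python
from collections import Counter

# Class names taken from the abstract.
CLASSES = ["placing mix", "shoveling", "walking"]


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_precision(y_true, y_pred, classes):
    """Precision per class: correct predictions of a class / all predictions of it."""
    predicted = Counter(y_pred)
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {
        c: (true_pos[c] / predicted[c] if predicted[c] else float("nan"))
        for c in classes
    }


# Hypothetical toy labels (NOT the paper's data).
y_true = ["placing mix", "placing mix", "shoveling", "shoveling",
          "walking", "walking", "placing mix", "shoveling"]
y_pred = ["placing mix", "placing mix", "shoveling", "walking",
          "walking", "walking", "walking", "shoveling"]

accuracy = overall_accuracy(y_true, y_pred)
precision = per_class_precision(y_true, y_pred, CLASSES)

print(f"overall accuracy: {accuracy:.2f}")
for name in CLASSES:
    print(f"precision[{name}]: {precision[name]:.2f}")
```

In this toy example, two samples from other classes are misclassified as "walking". That lowers the precision for "walking" while leaving the precision of the other two classes at 100%, the same qualitative pattern the abstract reports.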
History
Published online: Mar 18, 2024
ASCE Technical Topics:
- Business management
- Computer models
- Computer vision and image processing
- Construction engineering
- Construction industry
- Construction management
- Construction sites
- Employment
- Engineering fundamentals
- Infrastructure construction
- Labor
- Methodology (by type)
- Model accuracy
- Models (by type)
- Occupational safety
- Personnel management
- Practice and Profession
- Public administration
- Public health and safety
- Safety
- Work zones