Chapter
Mar 18, 2024

Fine-Tuning Vision Transformer (ViT) to Classify Highway Construction Workers’ Activities

Publication: Construction Research Congress 2024

ABSTRACT

Accurate detection of construction workers’ activities distinguishes efficient from inefficient work and identifies high-risk activities, which enhances construction productivity and safety management. Automated detection of construction workers’ activities has become feasible since the emergence of computer vision techniques. However, few studies have explored workers’ activity detection in transportation-related work zones (e.g., mobile work zone operations), which have unique characteristics and different requirements than building construction jobsites. Previous studies in the construction domain usually used convolutional neural networks (CNNs) for computer vision tasks. Transformer-based models have achieved higher performance on computer vision tasks since they were first applied in 2020; however, few studies have applied transformer models to worker activity identification. Therefore, this study aims to detect construction workers’ activities in mobile work zones using a pre-trained Vision Transformer (ViT) model. The study starts with video data collection of construction workers. Then, a dataset containing different activities is developed by manual labeling. Next, the ViT model is fine-tuned on this dataset. The results show that the model achieves 94.17% overall accuracy, with 100%, 100%, and 84% precision in detecting “placing mix,” “shoveling,” and “walking,” respectively, outperforming a CNN-based classification model.
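The fine-tuning workflow described in the abstract (pre-trained ViT, small task-specific label set, supervised classification head) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: a small randomly initialized `ViTConfig` stands in for the pre-trained checkpoint so the example runs offline, and the three labels are taken from the abstract. In practice one would load real weights, e.g. `ViTForImageClassification.from_pretrained("google/vit-base-patch16-224-in21k", num_labels=3)`.

```python
# Hedged sketch of ViT fine-tuning for worker-activity classification.
# Assumptions: tiny random-init config (placeholder for a pretrained
# checkpoint), dummy image batch instead of real work-zone video frames.
import torch
from transformers import ViTConfig, ViTForImageClassification

LABELS = ["placing mix", "shoveling", "walking"]  # classes from the abstract

config = ViTConfig(
    hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
    intermediate_size=128, image_size=224, patch_size=16,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={name: i for i, name in enumerate(LABELS)},
)
model = ViTForImageClassification(config)

# Fine-tuning setup: freeze the ViT backbone, train only the new
# classification head (one common transfer-learning recipe).
for param in model.vit.parameters():
    param.requires_grad = False
optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=5e-4)

# One dummy batch of frames (B, C, H, W) with fabricated labels.
pixel_values = torch.randn(4, 3, 224, 224)
labels = torch.tensor([0, 1, 2, 1])

model.train()
out = model(pixel_values=pixel_values, labels=labels)  # loss computed internally
out.loss.backward()
optimizer.step()

print(out.logits.shape)  # one score per activity class: torch.Size([4, 3])
```

Freezing the backbone is just one option; full fine-tuning of all layers, as the paper's 94.17% accuracy suggests is feasible with a labeled dataset, simply omits the `requires_grad = False` loop and passes `model.parameters()` to the optimizer.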



Published In

Construction Research Congress 2024
Pages: 1140–1148


Authors

Chi Tian, S.M.ASCE, Ph.D. Candidate, School of Construction Management Technology, Purdue Univ., West Lafayette, IN. Email: [email protected]
Yunfeng Chen, Ph.D., Associate Professor, School of Construction Management Technology, Purdue Univ., West Lafayette, IN. Email: [email protected]
Yiheng Feng, Ph.D., M.ASCE, Assistant Professor, Lyles School of Civil Engineering, Purdue Univ., West Lafayette, IN. Email: [email protected]
Jiansong Zhang, Ph.D., A.M.ASCE, Associate Professor, School of Construction Management Technology, Purdue Univ., West Lafayette, IN. Email: [email protected]
