Fine-Tuning Vision Transformer (ViT) to Classify Highway Construction Workers’ Activities

Tian, Chi; Chen, Yunfeng; Feng, Yiheng; Zhang, Jiansong

doi:10.1061/9780784485262.116

Chapter

Mar 18, 2024

Authors: Chi Tian, S.M.ASCE; Yunfeng Chen, Ph.D.; Yiheng Feng, Ph.D., M.ASCE; and Jiansong Zhang, Ph.D., A.M.ASCE

Publication: Construction Research Congress 2024

ABSTRACT

Accurate detection of construction workers’ activities helps assess the efficiency of workers’ activities and identify high-risk activities, which enhances construction productivity and safety management. Automated detection of construction workers’ activities has become feasible in the construction industry since the emergence of computer vision techniques. However, few studies have explored workers’ activity detection in transportation-related work zones (e.g., mobile work zone operations), which have unique characteristics and different requirements from building construction jobsites. Previous studies in the construction domain usually used convolutional neural networks (CNNs) for computer vision-related tasks. Transformer-based models have achieved higher performance in computer vision tasks since they were first applied to the field in 2020. However, few studies have applied transformer models to worker activity identification. Therefore, this study aims to detect construction workers’ activities in mobile work zones using a pre-trained Vision Transformer (ViT) model. The study starts with video data collection of construction workers. Then, a dataset containing different activities is developed through manual labeling. Next, the ViT model is fine-tuned using this dataset. The results show that the model achieves 94.17% overall accuracy and 100%, 100%, and 84% precision in detecting “placing mix,” “shoveling,” and “walking,” respectively, and that it outperforms a CNN-based classification model.
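The results above are reported as overall accuracy and per-class precision. The following is a minimal sketch of how these two metrics are commonly computed from predicted and manually labeled activity classes. It assumes one label per classified sample. The function names and the six sample labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted activity matches the manual label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_precision(y_true, y_pred, classes):
    """Precision per class: TP / (TP + FP), i.e. of all samples predicted as a
    class, the share whose manual label is that class. Classes that are never
    predicted get 0.0 by convention (an assumption of this sketch)."""
    predicted = Counter(y_pred)                                   # TP + FP per class
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    return {c: (true_pos[c] / predicted[c] if predicted[c] else 0.0) for c in classes}

# Hypothetical labels for illustration only (not the paper's dataset).
CLASSES = ["placing mix", "shoveling", "walking"]
y_true = ["placing mix", "placing mix", "shoveling", "shoveling", "walking", "walking"]
y_pred = ["placing mix", "walking", "shoveling", "shoveling", "walking", "walking"]

acc = overall_accuracy(y_true, y_pred)
prec = per_class_precision(y_true, y_pred, CLASSES)
print(f"accuracy = {acc:.4f}")          # accuracy = 0.8333
for c in CLASSES:
    print(f"precision[{c}] = {prec[c]:.4f}")
```

Accuracy is pooled over all samples, while precision is computed per predicted class. A class can therefore reach 100% precision even when some of its true instances are misclassified. In the toy data, the missed "placing mix" sample lowers the precision of "walking" instead.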
