Technical Papers
Mar 25, 2022

Two-Dimensional and Three-Dimensional CNN-Based Simultaneous Detection and Activity Classification of Construction Workers

Publication: Journal of Computing in Civil Engineering
Volume 36, Issue 4

Abstract

The type and duration of construction workers’ activities are useful information for project management purposes. Therefore, several studies have used surveillance cameras and computer vision to automate the time-consuming process of manually gathering this information. However, the three-stage method they have adopted, consisting of separate detection, tracking, and activity classification modules, is not fully optimized. Additionally, the activity classification module is trained per clip/segment on trimmed video clips and fails when applied to long untrimmed construction videos. This paper aims to (1) investigate the benefits of a fully optimized method, such as you only watch once (YOWO), and a per-frame, per-worker annotated untrimmed data set over the previous approach for activity recognition of construction workers; (2) propose an improved version of YOWO, called YOWO53, to improve detection performance; (3) propose a semiautomatic data set annotation method; (4) conduct a sensitivity analysis comparing the performance of YOWO, YOWO53, and the three-stage method; and (5) conduct a case study to compute the percentages of the workers’ different activities. YOWO53 improves the detection recall of YOWO by up to 3% and the classification accuracy of the three-stage method by 16.3%. Although YOWO53 has a lower inference speed, it is still sufficiently fast for productivity analysis.
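
To give a concrete sense of the productivity analysis performed in the case study, the following minimal Python sketch (not taken from the paper) aggregates per-frame, per-worker activity labels, such as those produced by a spatiotemporal detector like YOWO or YOWO53, into the percentage of observed time each worker spends on each activity. The activity names, frame rate, and record layout are assumptions made for illustration only.

# Illustrative sketch (not the authors' code): turning per-frame, per-worker
# activity labels into activity-time percentages for productivity analysis.
# Activity names, frame rate, and the record layout are assumed for illustration.

from collections import Counter, defaultdict

FPS = 25  # assumed camera frame rate

# Hypothetical per-frame output of a detector/activity classifier:
# one (worker_id, activity) record per detected worker per frame.
frame_records = [
    (1, "idle"), (1, "idle"), (1, "transporting"),
    (2, "working"), (2, "working"), (2, "idle"),
]

def activity_percentages(records):
    """Return {worker_id: {activity: percentage of that worker's observed frames}}."""
    counts = defaultdict(Counter)
    for worker_id, activity in records:
        counts[worker_id][activity] += 1
    return {
        worker_id: {a: 100.0 * n / sum(counter.values()) for a, n in counter.items()}
        for worker_id, counter in counts.items()
    }

if __name__ == "__main__":
    for worker_id, percentages in activity_percentages(frame_records).items():
        observed_s = sum(1 for w, _ in frame_records if w == worker_id) / FPS
        print(f"Worker {worker_id} ({observed_s:.2f} s observed): {percentages}")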

Data Availability Statement

The videos used in training and testing are confidential because the construction company does not allow them to be shared. The main network models used in this research are available from other sources (not created by the authors). The new code created by the authors in this study (e.g., the code for the semiautomatic annotation) is available from the corresponding author upon reasonable request.

References

Aggarwal, J. K., and M. S. Ryoo. 2011. “Human activity analysis: A review.” ACM Comput. Surv. 43 (3): 1–43. https://doi.org/10.1145/1922649.1922653.
Ankerst, M., M. M. Breunig, H.-P. Kriegel, and J. Sander. 1999. “OPTICS: Ordering points to identify the clustering structure.” ACM SIGMOD Rec. 28 (2): 49–60. https://doi.org/10.1145/304181.304187.
Chaquet, J. M., E. J. Carmona, and A. Fernández-Caballero. 2013. “A survey of video datasets for human action and activity recognition.” Comput. Vision Image Understanding 117 (6): 633–659. https://doi.org/10.1016/j.cviu.2013.01.013.
Chen, C., Z. Zhu, and A. Hammad. 2020. “Automated excavators’ activity recognition and productivity analysis from construction site surveillance videos.” Autom. Constr. 110 (Feb): 103045. https://doi.org/10.1016/j.autcon.2019.103045.
Everingham, M., L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. 2010. “The Pascal visual object classes (VOC) challenge.” Int. J. Comput. Vision 88 (2): 303–338. https://doi.org/10.1007/s11263-009-0275-4.
Forney, G. D. 1973. “The Viterbi algorithm.” Proc. IEEE 61 (3): 268–278. https://doi.org/10.1109/PROC.1973.9030.
Girdhar, R., J. Carreira, C. Doersch, and A. Zisserman. 2019. “Video action transformer network.” In Proc., 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 244–253. Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2019.00033.
Gu, C., et al. 2018. “AVA: A video dataset of spatio-temporally localized atomic visual actions.” Preprint, submitted May 23, 2017. http://arxiv.org/abs/1705.08421.
Hara, K., H. Kataoka, and Y. Satoh. 2018. “Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?” In Proc., 2018 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 6546–6555. Washington, DC: IEEE Computer Society.
Heilbron, F. C., V. Escorcia, B. Ghanem, and J. C. Niebles. 2015. “ActivityNet: A large-scale video benchmark for human activity understanding.” In Proc., 2015 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 961–970. Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2015.7298698.
Idrees, H., A. R. Zamir, Y.-G. Jiang, A. Gorban, I. Laptev, R. Sukthankar, and M. Shah. 2017. “The THUMOS challenge on action recognition for videos ‘in the wild’.” Comput. Vision Image Understanding 155 (Feb): 1–23. https://doi.org/10.1016/j.cviu.2016.10.018.
Kahatapitiya, K., Z. Ren, H. Li, Z. Wu, and M. S. Ryoo. 2021. “Self-supervised pretraining with classification labels for temporal activity detection.” Preprint, submitted November 26, 2021. http://arxiv.org/abs/2111.13675.
Kalogeiton, V., P. Weinzaepfel, V. Ferrari, and C. Schmid. 2017. “Action tubelet detector for spatio-temporal action localization.” Preprint, submitted May 4, 2017. http://arxiv.org/abs/1705.01861.
Kay, W., et al. 2017. “The Kinetics human action video dataset.” Preprint, submitted May 19, 2017. http://arxiv.org/abs/1705.06950.
Kim, J., and S. Chi. 2019. “Action recognition of earthmoving excavators based on sequential pattern analysis of visual features and operation cycles.” Autom. Constr. 104 (Aug): 255–264. https://doi.org/10.1016/j.autcon.2019.03.025.
Kopuklu, O., N. Kose, A. Gunduz, and G. Rigoll. 2019. “Resource efficient 3D convolutional neural networks.” In Proc., 2019 IEEE/CVF Int. Conf. on Computer Vision Workshops, 1910–1919. Washington, DC: IEEE Computer Society. https://doi.org/10.1109/ICCVW.2019.00240.
Köpüklü, O., X. Wei, and G. Rigoll. 2020. “You only watch once: A unified CNN architecture for real-time spatiotemporal action localization.” Preprint, submitted November 15, 2019. http://arxiv.org/abs/1911.06644.
Kuehne, H., H. Jhuang, E. Garrote, T. Poggio, and T. Serre. 2011. “HMDB: A large video database for human motion recognition.” In Proc., 2011 IEEE Int. Conf. on Computer Vision (ICCV), 2556–2563. Washington, DC: IEEE Computer Society.
Le, H., and A. Borji. 2018. “What are the receptive, effective receptive, and projective fields of neurons in convolutional neural networks?” Preprint, submitted May 19, 2017. http://arxiv.org/abs/1705.07049.
Lin, T.-Y., M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár. 2015. “Microsoft COCO: Common objects in context.” Preprint, submitted May 1, 2014. https://arxiv.org/abs/1405.0312.
Liu, W., D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. 2016. “SSD: Single shot MultiBox detector.” Preprint, submitted December 8, 2015. http://arxiv.org/abs/1512.02325.
Luo, X., H. Li, D. Cao, F. Dai, J. Seo, and S. Lee. 2018a. “Recognizing diverse construction activities in site images via relevance networks of construction-related objects detected by convolutional neural networks.” J. Comput. Civ. Eng. 32 (3): 04018012. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000756.
Luo, X., H. Li, D. Cao, Y. Yu, X. Yang, and T. Huang. 2018b. “Towards efficient and objective work sampling: Recognizing workers’ activities in site surveillance videos with two-stream convolutional networks.” Autom. Constr. 94 (Oct): 360–370. https://doi.org/10.1016/j.autcon.2018.07.011.
Luo, X., H. Li, H. Wang, Z. Wu, F. Dai, and D. Cao. 2019. “Vision-based detection and visualization of dynamic workspaces.” Autom. Constr. 104 (Aug): 1–13. https://doi.org/10.1016/j.autcon.2019.04.001.
Luo, X., H. Li, Y. Yu, C. Zhou, and D. Cao. 2020. “Combining deep features and activity context to improve recognition of activities of workers in groups.” Comput.-Aided Civ. Infrastruct. Eng. 35 (9): 965–978. https://doi.org/10.1111/mice.12538.
Moltisanti, D., S. Fidler, and D. Damen. 2019. “Action recognition from single timestamp supervision in untrimmed videos.” In Proc., 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 9907–9916. Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2019.01015.
Pan, J., S. Chen, Z. Shou, J. Shao, and H. Li. 2020. “Actor-context-actor relation network for spatio-temporal action localization.” Preprint, submitted June 14, 2020. http://arxiv.org/abs/2006.07976.
Parmenter, D. 2020. Key performance indicators: Developing, implementing, and using winning KPIs. 4th ed. Hoboken, NJ: Wiley.
Redmon, J. 2021. “Darknet: Open source neural networks in C.” Accessed March 13, 2022. https://pjreddie.com/darknet/.
Redmon, J., and A. Farhadi. 2016. “YOLO9000: Better, faster, stronger.” Preprint, submitted December 15, 2016. http://arxiv.org/abs/1612.08242.
Redmon, J., and A. Farhadi. 2018. “YOLOv3: An incremental improvement.” Preprint, submitted April 8, 2018. http://arxiv.org/abs/1804.02767.
Ren, S., K. He, R. Girshick, and J. Sun. 2017. “Faster R-CNN: Towards real-time object detection with region proposal networks.” IEEE Trans. Pattern Anal. Mach. Intell. 39 (6): 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031.
Roberts, D., and M. Golparvar-Fard. 2019. “End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level.” Autom. Constr. 105 (Sep): 102811. https://doi.org/10.1016/j.autcon.2019.04.006.
Sherafat, B., C. R. Ahn, R. Akhavian, A. H. Behzadan, M. Golparvar-Fard, H. Kim, Y.-C. Lee, A. Rashidi, and E. R. Azar. 2020. “Automated methods for activity recognition of construction workers and equipment: State-of-the-art review.” J. Constr. Eng. Manage. 146 (6): 03120002. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001843.
Sigurdsson, G. A., G. Varol, X. Wang, A. Farhadi, I. Laptev, and A. Gupta. 2016. “Hollywood in homes: Crowdsourcing data collection for activity understanding.” Preprint, submitted April 6, 2016. http://arxiv.org/abs/1604.01753.
Song, L., S. Zhang, G. Yu, and H. Sun. 2019. “TACNet: Transition-aware context network for spatio-temporal action detection.” In Proc., 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 11979–11987. Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2019.01226.
Soomro, K., A. R. Zamir, and M. Shah. 2012. “UCF101: A dataset of 101 human actions classes from videos in the wild.” Preprint, submitted December 3, 2012. http://arxiv.org/abs/1212.0402.
Tao, L., X. Wang, and T. Yamasaki. 2020. “Self-supervised video representation learning using inter-intra contrastive framework.” In Proc., 28th ACM Int. Conf. on Multimedia, 2193–2201. New York: Association for Computing Machinery. https://doi.org/10.1145/3394171.3413694.
Torabi, G., A. Hammad, and N. Bouguila. 2021. “Joint detection and activity recognition of construction workers using convolutional neural networks.” In Proc., 2021 European Conf. on Computing in Construction, 212–219. Dublin, Ireland: Univ. College Dublin. https://doi.org/10.35490/ec3.2021.197.
Varol, G., I. Laptev, and C. Schmid. 2018. “Long-term temporal convolutions for action recognition.” IEEE Trans. Pattern Anal. Mach. Intell. 40 (6): 1510–1517. https://doi.org/10.1109/TPAMI.2017.2712608.
Wang, L., Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool. 2017. “Temporal segment networks for action recognition in videos.” Preprint, submitted May 8, 2017. http://arxiv.org/abs/1705.02953.
Wojke, N., A. Bewley, and D. Paulus. 2017. “Simple online and realtime tracking with a deep association metric.” In Proc., 2017 IEEE Int. Conf. on Image Processing (ICIP), 3645–3649. New York: IEEE.
Yang, X., X. Yang, M.-Y. Liu, F. Xiao, L. Davis, and J. Kautz. 2019. “STEP: Spatio-temporal progressive learning for video action detection.” Preprint, submitted April 19, 2019. http://arxiv.org/abs/1904.09288.
Yao, G., T. Lei, X. Liu, and P. Jiang. 2018. “Temporal action detection in untrimmed videos from fine to coarse granularity.” Appl. Sci. 8 (10): 1924. https://doi.org/10.3390/app8101924.
Yeung, S., O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. 2018. “Every moment counts: Dense detailed labeling of actions in complex videos.” Int. J. Comput. Vision 126 (2): 375–389. https://doi.org/10.1007/s11263-017-1013-y.

Information & Authors

Published In

Journal of Computing in Civil Engineering
Volume 36, Issue 4, July 2022

History

Received: Oct 12, 2021
Accepted: Jan 27, 2022
Published online: Mar 25, 2022
Published in print: Jul 1, 2022
Discussion open until: Aug 25, 2022

Authors

Affiliations

Graduate Student, Dept. of Electrical and Computer Engineering, Concordia Univ., Montreal, QC, Canada H3G 2W1. ORCID: https://orcid.org/0000-0001-5641-5801. Email: [email protected]
Professor, Concordia Institute for Information Systems Engineering, Concordia Univ., Montreal, QC, Canada H3G 2W1 (corresponding author). ORCID: https://orcid.org/0000-0002-2507-4976. Email: [email protected]
Professor, Concordia Institute for Information Systems Engineering, Concordia Univ., Montreal, QC, Canada H3G 2W1. ORCID: https://orcid.org/0000-0001-7224-7940. Email: [email protected]
