Technical Papers
Apr 29, 2021

Multitask Learning Method for Detecting the Visual Focus of Attention of Construction Workers

Publication: Journal of Construction Engineering and Management
Volume 147, Issue 7

Abstract

The visual focus of attention (VFOA) of construction workers is a critical cue for recognizing entity interactions, which in turn facilitates the interpretation of workers’ intentions, the prediction of their movements, and the comprehension of the jobsite context. The increasing use of construction surveillance cameras provides a cost-efficient way to estimate workers’ VFOA from information-rich images. However, the low resolution of these images poses a great challenge to detecting facial features and gaze directions. Recognizing that body and head orientations provide strong hints for inferring workers’ VFOA, this study proposes to represent the VFOA as a collection of body orientations, body poses, head yaws, and head pitches, and designs a convolutional neural network (CNN)-based multitask learning (MTL) framework to automatically estimate workers’ VFOA from low-resolution construction images. The framework is composed of two modules. In the first module, a Faster region-based CNN (Faster R-CNN) object detector detects and extracts workers’ full-body images, which serve as the single input to the CNN-MTL model in the second module. In the second module, VFOA estimation is formulated as a multitask image classification problem in which four classification tasks (body orientation, body pose, head yaw, and head pitch) are jointly learned by the newly designed CNN-MTL model. Construction videos were used to train and test the proposed framework. The results show that the proposed CNN-MTL model achieves accuracies of 0.91, 0.95, 0.86, and 0.83 in body orientation, body pose, head yaw, and head pitch classification, respectively. Compared with conventional single-task learning, the MTL method reduces training time by almost 50% without compromising accuracy.
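The core idea of the second module (a shared representation feeding four jointly trained classification heads, with per-task losses combined into one training objective) can be sketched in miniature. The snippet below is an illustrative sketch, not the authors' implementation: it uses NumPy, a random feature vector standing in for the output of the shared CNN backbone, linear heads in place of the paper's CNN classifiers, and assumed class counts per task. It shows only the hard-parameter-sharing MTL pattern of summing per-task cross-entropy losses over one shared representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed class counts for the four tasks (illustrative only):
# body orientation, body pose, head yaw, head pitch.
TASKS = {"body_orientation": 8, "body_pose": 3, "head_yaw": 5, "head_pitch": 3}
FEATURE_DIM = 64  # stands in for the shared CNN backbone's output size

# One linear classification head per task, all reading the SAME shared features.
heads = {name: rng.normal(0.0, 0.1, size=(FEATURE_DIM, k)) for name, k in TASKS.items()}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(shared_features, labels):
    """Sum of per-task cross-entropy losses over one shared representation."""
    total = 0.0
    for name, W in heads.items():
        probs = softmax(shared_features @ W)                      # (batch, n_classes)
        picked = probs[np.arange(len(labels[name])), labels[name]]
        total += -np.log(picked).mean()                           # cross-entropy, this task
    return total

# Toy batch: pretend these features came from the shared CNN trunk.
feats = rng.normal(size=(4, FEATURE_DIM))
labels = {name: rng.integers(0, k, size=4) for name, k in TASKS.items()}
loss = multitask_loss(feats, labels)
print(f"summed multitask loss over {len(heads)} heads: {loss:.3f}")
```

Because the backbone parameters are shared across all four tasks, one forward and backward pass updates them for every task at once, which is why the paper can report a training-time reduction relative to training four single-task models.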


Data Availability Statement

Some or all data, models, or code generated or used during the study are available from the corresponding author by request, including Python codes for data processing and multitask image classification.

Acknowledgments

This research was partially funded by the US National Science Foundation (NSF) via Grant Nos. 1850008 and 2038967. The authors gratefully acknowledge NSF’s support. Any opinions, findings, recommendations, and conclusions in this paper are those of the authors and do not necessarily reflect the views of NSF, the University of Texas at San Antonio, the University of Tennessee, Knoxville, or Purdue University.


Information & Authors

History

Received: Aug 23, 2020
Accepted: Jan 11, 2021
Published online: Apr 29, 2021
Published in print: Jul 1, 2021
Discussion open until: Sep 29, 2021


Authors

Affiliations

Assistant Professor, Dept. of Construction Science, Univ. of Texas at San Antonio, 501 W César E Chávez Blvd., San Antonio, TX 78207 (corresponding author). ORCID: https://orcid.org/0000-0001-6110-5293. Email: [email protected]
Ph.D. Student, Lyles School of Civil Engineering, Purdue Univ., 550 Stadium Mall Dr., West Lafayette, IN 47907. Email: [email protected]
Ph.D. Student, Lyles School of Civil Engineering, Purdue Univ., 550 Stadium Mall Dr., West Lafayette, IN 47907. Email: [email protected]
Shuai Li, Ph.D., A.M.ASCE [email protected]
Assistant Professor, Dept. of Civil and Environmental Engineering, Univ. of Tennessee, 851 Neyland Dr., Knoxville, TN 37996. Email: [email protected]
Hubo Cai, Ph.D., M.ASCE [email protected]
Professor, Lyles School of Civil Engineering, Purdue Univ., 550 Stadium Mall Dr., West Lafayette, IN 47907. Email: [email protected]
