Chapter
Jan 25, 2024

Machine Learning at Work? The Issue of Data Quality When Developing New Insight in Occupational Accidents

Publication: Computing in Civil Engineering 2023

ABSTRACT

Occupational accidents are an urgent problem in construction. Machine learning (ML) methods for analyzing large amounts of data, together with the availability of accident report data, have raised expectations of new insights. Yet data quality, in terms of input, inner availability, and output, emerges as an issue in many ML development projects. This paper investigates strategies to define, understand, and tackle poor data quality in a contracting company’s accident reports. A selective literature review of data quality in software systems and ML shows differing foci on external or internal data. A set of occupational accident records is then analyzed. Many entries on causality are missing and descriptions are shallow, which hinders the discovery of new risks, possibly because of the data collection format and procedures. The low number of complete entries calls for new repair strategies, both external and internal.

Information & Authors

Information

Published In

Computing in Civil Engineering 2023
Pages: 461–468

History

Published online: Jan 25, 2024

Authors

Affiliations

May Shayboun
1 Dept. of Construction and Energy Engineering, Halmstad Univ., Halmstad, Sweden.
Christian Koch
2 Dept. of Construction and Energy Engineering, Halmstad Univ., Halmstad, Sweden. ORCID: https://orcid.org/0000-0003-3750-976X.
Dimosthenis Kifokeris
3 Dept. of Architecture and Civil Engineering, Chalmers Univ. of Technology, Gothenburg, Sweden. ORCID: https://orcid.org/0000-0003-4186-8730.
