Machine Learning at Work? The Issue of Data Quality When Developing New Insight in Occupational Accidents
Publication: Computing in Civil Engineering 2023
ABSTRACT
Occupational accidents are an urgent problem in construction. Machine learning (ML) methods for analyzing large amounts of data, together with the availability of accident report data, have raised expectations of new insights. Yet data quality, in terms of input, inner availability, and output, arises as an issue in many ML development projects. This paper investigates strategies to define, understand, and tackle poor data quality in a contracting company’s accident reports. A selective literature review of software system data quality and ML shows differing foci on external or internal data. A set of occupational accident records is then analyzed. Many entries on causality are missing and many descriptions are shallow, which hinders the discovery of new risks, possibly because of the data collection format and procedures. The low number of complete entries calls for new repair strategies, both external and internal.
Published online: Jan 25, 2024
ASCE Technical Topics:
- Accidents
- Artificial intelligence and machine learning
- Bibliographies
- Business management
- Computer programming
- Computing in civil engineering
- Construction engineering
- Construction management
- Contracts and subcontracts
- Data analysis
- Engineering fundamentals
- Information management
- Methodology (by type)
- Occupational safety
- Practice and Profession
- Project management
- Public administration
- Public health and safety
- Research methods (by type)
- Safety