Chapter

Jan 25, 2024

Deep Learning Automation Risk: Identifying Object Detection Failure Modes Using Slice-Based Evaluation

Author: Thomas Czerniawski, Ph.D.

Publication: Computing in Civil Engineering 2023

https://doi.org/10.1061/9780784485224.024

ABSTRACT

Machine learning model evaluations in academic literature tend to fixate on summary performance metrics. However, the performance of a model architecture depends heavily on the context it is evaluated in. Superficial summary performance metrics fail to encapsulate this context and provide readers with virtually no understanding or ability to predict performance in new contexts. Slicing is a type of fine-grained machine learning evaluation, where data is separated into subsets and the performance on each subset is evaluated. Here, we demonstrate slice-based evaluation on a computer vision task, excavator detection in 2D color images. Critical slices are identified using metadata augmentation and feature-space clustering. Slices are created based on features including lighting, excavator color, weather, distance from camera, occlusion, view perspective, number of excavators, presence of other equipment, and environment. Meaningful performance trends identified using slice-based evaluation provide readers with insight about the task’s inherent hardness and training dataset imbalance. Slice-based evaluation should become standard practice in reporting machine learning method results in the academic literature.
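
The contrast between a summary metric and slice-level metrics can be illustrated with a minimal sketch. The records, the metadata fields (`lighting`, `occlusion`), and the helper `recall_by_slice` below are hypothetical and are not taken from the chapter. The sketch also assumes one excavator per image and uses recall as a stand-in for whatever detection metric an evaluation actually reports.

```python
from collections import defaultdict

# Hypothetical per-image records (made-up values, for illustration only):
# metadata tags plus whether the detector found the excavator (1) or not (0).
records = [
    {"lighting": "day", "occlusion": "none", "detected": 1},
    {"lighting": "day", "occlusion": "none", "detected": 1},
    {"lighting": "day", "occlusion": "partial", "detected": 1},
    {"lighting": "day", "occlusion": "none", "detected": 1},
    {"lighting": "day", "occlusion": "heavy", "detected": 0},
    {"lighting": "night", "occlusion": "none", "detected": 1},
    {"lighting": "night", "occlusion": "partial", "detected": 0},
    {"lighting": "night", "occlusion": "heavy", "detected": 0},
]


def recall_by_slice(records, key):
    """Group records by one metadata field and return recall per group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        hits[record[key]] += record["detected"]
    return {value: hits[value] / totals[value] for value in totals}


# Summary metric: a single number over the whole test set.
overall = sum(r["detected"] for r in records) / len(records)

# Slice-based metrics: the same test set, broken down by metadata.
by_lighting = recall_by_slice(records, "lighting")
by_occlusion = recall_by_slice(records, "occlusion")

print(f"overall recall: {overall:.3f}")
for name, result in (("lighting", by_lighting), ("occlusion", by_occlusion)):
    for value, recall in sorted(result.items()):
        print(f"{name}={value}: recall {recall:.3f}")
```

In this toy data the overall recall conceals a gap between the day and night slices, which is the kind of trend a summary metric cannot expose.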

Pages: 194–201

Published online: Jan 25, 2024

Author affiliation: Edifice Lab, School of Sustainable Engineering and the Built Environment, Arizona State Univ. ORCID: https://orcid.org/0000-0002-7310-6522.
