Deep Learning Automation Risk: Identifying Object Detection Failure Modes Using Slice-Based Evaluation
Publication: Computing in Civil Engineering 2023
ABSTRACT
Machine learning model evaluations in the academic literature tend to fixate on summary performance metrics. However, the performance of a model architecture depends heavily on the context in which it is evaluated. Superficial summary metrics fail to capture this context and give readers almost no ability to understand or predict performance in new contexts. Slicing is a form of fine-grained machine learning evaluation in which the data are partitioned into subsets (slices) and performance is measured on each subset. Here, we demonstrate slice-based evaluation on a computer vision task: excavator detection in 2D color images. Critical slices are identified using metadata augmentation and feature-space clustering. Slices are defined by features including lighting, excavator color, weather, distance from the camera, occlusion, viewing perspective, number of excavators, presence of other equipment, and environment. The performance trends revealed by slice-based evaluation give readers insight into the inherent difficulty of the task and into imbalance in the training dataset. Slice-based evaluation should become standard practice when reporting machine learning results in the academic literature.
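The slicing idea above can be illustrated with a minimal sketch. It assumes that each test image carries metadata tags (for example lighting or weather, as produced by metadata augmentation), that detections are matched greedily to ground truth at an IoU threshold of 0.5, and that per-slice precision and recall stand in for whatever per-slice metric the paper reports. The record layout, thresholds, and names (`ImageRecord`, `match_counts`, `slice_metrics`) are hypothetical and are not the authors' implementation. Feature-space clustering is not shown.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


@dataclass
class ImageRecord:
    """One test image: ground-truth boxes, scored predictions, and slice tags."""
    gt_boxes: list[Box]
    pred_boxes: list[tuple[Box, float]]  # (box, confidence score)
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. {"lighting": "night"}


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_counts(rec: ImageRecord, iou_thr: float = 0.5,
                 score_thr: float = 0.25) -> tuple[int, int, int]:
    """Greedily match predictions (highest score first) to unmatched ground truth.

    Returns (true positives, false positives, false negatives) for one image.
    """
    preds = sorted((p for p in rec.pred_boxes if p[1] >= score_thr),
                   key=lambda p: p[1], reverse=True)
    unmatched = set(range(len(rec.gt_boxes)))
    tp = fp = 0
    for box, _score in preds:
        # Pick the unmatched ground-truth box with the highest IoU at or above the threshold.
        best_j, best_iou = None, iou_thr
        for j in unmatched:
            overlap = iou(box, rec.gt_boxes[j])
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1
        else:
            tp += 1
            unmatched.discard(best_j)
    return tp, fp, len(unmatched)


def slice_metrics(records: list[ImageRecord], key: str, iou_thr: float = 0.5,
                  score_thr: float = 0.25) -> dict[str, dict[str, float]]:
    """Pool TP/FP/FN counts per value of one metadata key; report precision and recall."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # slice value -> [images, tp, fp, fn]
    for rec in records:
        t = totals[rec.metadata.get(key, "unknown")]
        tp, fp, fn = match_counts(rec, iou_thr, score_thr)
        t[0] += 1
        t[1] += tp
        t[2] += fp
        t[3] += fn
    report = {}
    for value, (n, tp, fp, fn) in totals.items():
        report[value] = {
            "n_images": n,
            # NaN marks an undefined ratio (no predictions, or no ground truth, in the slice).
            "precision": tp / (tp + fp) if tp + fp else math.nan,
            "recall": tp / (tp + fn) if tp + fn else math.nan,
        }
    return report


if __name__ == "__main__":
    records = [
        ImageRecord([(0, 0, 10, 10)], [((1, 1, 10, 10), 0.9)], {"lighting": "day"}),
        ImageRecord([(0, 0, 10, 10)], [], {"lighting": "night"}),
    ]
    for value, metrics in slice_metrics(records, "lighting").items():
        print(value, metrics)
```

A slice whose recall falls well below the pooled value is a candidate failure mode, and `n_images` shows how thinly that slice is represented in the test set. Cluster IDs from feature-space clustering could be stored under another metadata key (a hypothetical `"cluster"` tag) and sliced the same way. A per-slice average precision could replace precision and recall without changing the grouping logic.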
Published online: Jan 25, 2024
ASCE Technical Topics:
- Artificial intelligence and machine learning
- Automatic identification systems
- Bibliographies
- Colleges and universities
- Computer programming
- Computer vision and image processing
- Computing in civil engineering
- Construction engineering
- Construction methods
- Detection methods
- Education
- Engineering fundamentals
- Excavation
- Information management
- Measurement (by type)
- Methodology (by type)
- Metric systems
- Neural networks
- Practice and Profession