Automated Bridge Inspection Image Interpretation Based on Vision-Language Pre-Training
Publication: Computing in Civil Engineering 2023
ABSTRACT
Bridge inspection images capture a wealth of information and detail about bridge conditions. This study proposes a method for interpreting on-site bridge inspection images by generating human-readable descriptive sentences. The resulting text can be assembled into a bridge inspection report to aid and expedite the inspection process for bridge engineers, and the extracted information can be further exploited to support bridge deterioration prediction and maintenance decision making. This is, however, a challenging task that combines computer vision and natural language processing. First, it requires not only object detection/segmentation in the bridge inspection images but also a grasp of the relationships between the recognized objects. Second, human-readable sentences must be generated from the information extracted from the images. Third, the available bridge image-text pairs that can be used for training are limited and highly noisy. To address these gaps, this paper proposes a deep learning-based model that generates free-form, human-readable descriptive sentences about bridge conditions by leveraging bootstrapping language-image pre-training (BLIP) and its web-scale vision-language pre-training data. This paper discusses the proposed model and its performance results.
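Performance of caption-generation models of this kind is typically reported with n-gram metrics such as BLEU (Papineni et al. 2002), which is cited in the references below. As a minimal illustration only (not the authors' evaluation code, and with made-up example captions), a unigram BLEU score with brevity penalty can be computed as follows:

```python
from collections import Counter
import math

def bleu1(candidate: str, references: list[str]) -> float:
    """Unigram BLEU with brevity penalty, per Papineni et al. (2002)."""
    cand = candidate.lower().split()
    cand_counts = Counter(cand)
    # Clip each candidate unigram count by its maximum count in any reference.
    max_ref = Counter()
    for ref in references:
        for w, c in Counter(ref.lower().split()).items():
            max_ref[w] = max(max_ref[w], c)
    clipped = sum(min(c, max_ref[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand) if cand else 0.0
    # Brevity penalty against the reference length closest to the candidate's.
    ref_len = min((len(r.split()) for r in references),
                  key=lambda l: (abs(l - len(cand)), l))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * precision

# Hypothetical generated caption vs. a hypothetical reference inspection note.
generated = "severe corrosion on the steel girder near the bearing"
refs = ["heavy corrosion observed on the steel girder close to the bearing"]
score = bleu1(generated, refs)
```

In practice, full BLEU also averages higher-order n-gram precisions, and studies in this area usually report CIDEr, ROUGE, and SPICE alongside it (all cited below).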
REFERENCES
Al-Malla, M. A., Jafar, A., and Ghneim, N. (2022). “Image captioning model using attention and object features to mimic human image understanding.” Journal of Big Data, 9(1), 1–16.
Anderson, P., Fernando, B., Johnson, M., and Gould, S. (2016). “Spice: Semantic propositional image caption evaluation.” Proc., Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Part V, Springer.
Aneja, J., Deshpande, A., and Schwing, A. G. (2018). “Convolutional image captioning.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Bianchi, E., Abbott, A. L., Tokekar, P., and Hebdon, M. (2021). “COCO-bridge: structural detail data set for bridge inspections.” J. Comput. Civil Eng., 35(3), 04021003.
Cao, R., El-Tawil, S., and Agrawal, A. K. (2020). “Miami pedestrian bridge collapse: Computational forensic analysis.” J. Bridge Eng., 25(1), 04019134.
Chun, P. J., Yamane, T., and Maemura, Y. (2022). “A deep learning‐based image captioning method to automatically generate comprehensive explanations of bridge damage.” Comput.-Aided Civ. Infrastruct. Eng., 37(11), 1387–1401.
Dosovitskiy, A., et al. (2020). “An image is worth 16x16 words: Transformers for image recognition at scale.”
Gan, Z., Li, L., Li, C., Wang, L., Liu, Z., and Gao, J. (2022). “Vision-language pre-training: Basics, recent advances, and future trends.” Found. Trends Comput. Graph. Vis., 14(3-4), 163–352.
Herdade, S., Kappeler, A., Boakye, K., and Soares, J. (2019). “Image captioning: Transforming objects into words.” Adv. Neural Inf. Process Syst., 32.
Hüthwohl, P., Lu, R., and Brilakis, I. (2019). “Multi-classifier for reinforced concrete bridge defects.” Autom. Constr., 105, 102824.
Li, J., Li, D., Xiong, C., and Hoi, S. (2022). “Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation.” Proc., International Conference on Machine Learning, PMLR.
Lin, C.-Y. (2004). “Rouge: A package for automatic evaluation of summaries.” Proc., Text Summarization Branches Out.
Munawar, H. S., Hammad, A. W., Haddad, A., Soares, C. A. P., and Waller, S. T. (2021). “Image-based crack detection methods: A review.” Infrastructures, 6(8), 115.
Narazaki, Y., Hoskere, V., Hoang, T. A., and Spencer, B. F., Jr. (2018). “Automated bridge component recognition using video data.”
Pan, Y., Yao, T., Li, Y., and Mei, T. (2020). “X-linear attention networks for image captioning.” Proc., IEEE/CVF Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). “Bleu: a method for automatic evaluation of machine translation.” Proc., 40th Annual Meeting of the Association for Computational Linguistics.
Salem, H., and Helmy, H. M. (2014). “Numerical investigation of collapse of the Minnesota I-35W bridge.” Eng. Struct., 59, 635–645.
Soskin, E. J. (2022). Challenges Facing DOT in Implementing the Infrastructure Investment and Jobs Act. https://www.oig.dot.gov/sites/default/files/OIG%20Correspondence%20-%20Challenges%20Facing%20DOT%20in%20Implementing%20IIJA.pdf.
Vedantam, R., Lawrence Zitnick, C., and Parikh, D. (2015). “Cider: Consensus-based image description evaluation.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015). “Show and tell: A neural image caption generator.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Wang, Q., and Chan, A. B. (2018). “CNN+CNN: Convolutional decoders for image captioning.”
Wang, Z., Yu, J., Yu, A. W., Dai, Z., Tsvetkov, Y., and Cao, Y. (2021). “Simvlm: Simple visual language model pretraining with weak supervision.”
Wu, Y., et al. (2016). “Google’s neural machine translation system: Bridging the gap between human and machine translation.”
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015). “Show, attend and tell: Neural image caption generation with visual attention.” Proc., International conference on machine learning, PMLR.
Yao, T., Pan, Y., Li, Y., Qiu, Z., and Mei, T. (2017). “Boosting image captioning with attributes.” Proc., IEEE Int. Conf. Comput. Vis.
You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016). “Image captioning with semantic attention.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L. H., Zhou, L., Dai, X., Yuan, L., and Li, Y. (2022). “Regionclip: Region-based language-image pre-training.” Proc., IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
Zhu, J., Zhang, C., Qi, H., and Lu, Z. (2020). “Vision-based defects detection for bridges using transfer learning and convolutional neural networks.” Struct. Infrastruct. Eng., 16(7), 1037–1049.
Published online: Jan 25, 2024