Automated Bridge Inspection Image Interpretation Based on Vision-Language Pre-Training
Publication: Computing in Civil Engineering 2023
ABSTRACT
Bridge inspection images capture a wealth of information and detail about bridge conditions. This study proposes a method for interpreting on-site bridge inspection images by generating human-readable descriptive sentences. The resulting text can be assembled into a bridge inspection report to aid and expedite the inspection process for bridge engineers, and the extracted information can be further exploited to support bridge deterioration prediction and maintenance decision making. This is, however, a challenging task that combines computer vision and natural language processing. First, it requires not only object detection/segmentation in the bridge inspection images but also a grasp of the relationships between the recognized objects. Second, human-readable sentences must be generated from the information extracted from the images. Third, the available bridge image-text pairs that can be used for training are limited and highly noisy. To address these gaps, this paper proposes a deep learning-based model that generates free-form, human-readable descriptive sentences about bridge conditions by leveraging bootstrapping language-image pre-training (BLIP) and its web-scale vision-language pre-training data. This paper discusses the proposed model and its performance results.
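Performance of caption-generation models of this kind is typically reported with n-gram metrics such as BLEU (Papineni et al. 2002), which is cited in the references below. As a minimal illustration only (not the authors' evaluation code, and with made-up example captions), a unigram BLEU score with brevity penalty can be computed as follows:

```python
from collections import Counter
import math

def bleu1(candidate: str, references: list[str]) -> float:
    """Unigram BLEU with brevity penalty, per Papineni et al. (2002)."""
    cand = candidate.lower().split()
    cand_counts = Counter(cand)
    # Clip each candidate unigram count by its maximum count in any reference.
    max_ref = Counter()
    for ref in references:
        for w, c in Counter(ref.lower().split()).items():
            max_ref[w] = max(max_ref[w], c)
    clipped = sum(min(c, max_ref[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand) if cand else 0.0
    # Brevity penalty against the reference length closest to the candidate's.
    ref_len = min((len(r.split()) for r in references),
                  key=lambda l: (abs(l - len(cand)), l))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * precision

# Hypothetical generated caption vs. a hypothetical reference inspection note.
generated = "severe corrosion on the steel girder near the bearing"
refs = ["heavy corrosion observed on the steel girder close to the bearing"]
score = bleu1(generated, refs)
```

In practice, full BLEU also averages higher-order n-gram precisions, and studies in this area usually report CIDEr, ROUGE, and SPICE alongside it (all cited below).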
REFERENCES
Al-Malla, M. A., Jafar, A., and Ghneim, N. (2022). “Image captioning model using attention and object features to mimic human image understanding.” Journal of Big Data, 9(1), 1–16.
Anderson, P., Fernando, B., Johnson, M., and Gould, S. (2016). “Spice: Semantic propositional image caption evaluation.” Proc., Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Part V, Springer.
Aneja, J., Deshpande, A., and Schwing, A. G. (2018). “Convolutional image captioning.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Bianchi, E., Abbott, A. L., Tokekar, P., and Hebdon, M. (2021). “COCO-bridge: structural detail data set for bridge inspections.” J. Comput. Civil Eng., 35(3), 04021003.
Cao, R., El-Tawil, S., and Agrawal, A. K. (2020). “Miami pedestrian bridge collapse: Computational forensic analysis.” J. Bridge Eng., 25(1), 04019134.
Chun, P. J., Yamane, T., and Maemura, Y. (2022). “A deep learning‐based image captioning method to automatically generate comprehensive explanations of bridge damage.” Comput.-Aided Civ. Infrastruct. Eng., 37(11), 1387–1401.
Dosovitskiy, A., et al. (2020). “An image is worth 16x16 words: Transformers for image recognition at scale.”
Gan, Z., Li, L., Li, C., Wang, L., Liu, Z., and Gao, J. (2022). “Vision-language pre-training: Basics, recent advances, and future trends.” Found. Trends Comput. Graph. Vis., 14(3-4), 163–352.
Herdade, S., Kappeler, A., Boakye, K., and Soares, J. (2019). “Image captioning: Transforming objects into words.” Adv. Neural Inf. Process Syst., 32.
Hüthwohl, P., Lu, R., and Brilakis, I. (2019). “Multi-classifier for reinforced concrete bridge defects.” Autom. Constr., 105, 102824.
Li, J., Li, D., Xiong, C., and Hoi, S. (2022). “Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation.” Proc., International Conference on Machine Learning, PMLR.
Lin, C.-Y. (2004). “Rouge: A package for automatic evaluation of summaries.” Proc., Text Summarization Branches Out.
Munawar, H. S., Hammad, A. W., Haddad, A., Soares, C. A. P., and Waller, S. T. (2021). “Image-based crack detection methods: A review.” Infrastructures, 6(8), 115.
Narazaki, Y., Hoskere, V., Hoang, T. A., and Spencer, B. F., Jr. (2018). “Automated bridge component recognition using video data.”
Pan, Y., Yao, T., Li, Y., and Mei, T. (2020). “X-linear attention networks for image captioning.” Proc., IEEE/CVF Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). “Bleu: a method for automatic evaluation of machine translation.” Proc., 40th Annual Meeting of the Association for Computational Linguistics.
Salem, H., and Helmy, H. M. (2014). “Numerical investigation of collapse of the Minnesota I-35W bridge.” Eng. Struct., 59, 635–645.
Soskin, E. J. (2022). Challenges Facing DOT in Implementing the Infrastructure Investment and Jobs Act. https://www.oig.dot.gov/sites/default/files/OIG%20Correspondence%20-%20Challenges%20Facing%20DOT%20in%20Implementing%20IIJA.pdf.
Vedantam, R., Lawrence Zitnick, C., and Parikh, D. (2015). “Cider: Consensus-based image description evaluation.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015). “Show and tell: A neural image caption generator.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Wang, Q., and Chan, A. B. (2018). “CNN+CNN: Convolutional decoders for image captioning.”
Wang, Z., Yu, J., Yu, A. W., Dai, Z., Tsvetkov, Y., and Cao, Y. (2021). “Simvlm: Simple visual language model pretraining with weak supervision.”
Wu, Y., et al. (2016). “Google’s neural machine translation system: Bridging the gap between human and machine translation.”
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015). “Show, attend and tell: Neural image caption generation with visual attention.” Proc., International conference on machine learning, PMLR.
Yao, T., Pan, Y., Li, Y., Qiu, Z., and Mei, T. (2017). “Boosting image captioning with attributes.” Proc., IEEE Int. Conf. Comput. Vis.
You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016). “Image captioning with semantic attention.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L. H., Zhou, L., Dai, X., Yuan, L., and Li, Y. (2022). “Regionclip: Region-based language-image pre-training.” Proc., IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
Zhu, J., Zhang, C., Qi, H., and Lu, Z. (2020). “Vision-based defects detection for bridges using transfer learning and convolutional neural networks.” Struct. Infrastruct. Eng., 16(7), 1037–1049.
Published online: Jan 25, 2024