
Automated Bridge Inspection Image Interpretation Based on Vision-Language Pre-Training

Publication: Computing in Civil Engineering 2023

ABSTRACT

Bridge inspection images capture a wealth of information and detail about bridge conditions. This study proposes a method to interpret on-site bridge inspection images by generating human-readable descriptive sentences. The resulting text can be assembled into a bridge inspection report to aid and expedite the inspection process for bridge engineers, and the extracted information can be further exploited to support bridge deterioration prediction and maintenance decision making. This is, however, a challenging task that combines computer vision and natural language processing. First, it requires not only object detection/segmentation in the bridge inspection images but also a grasp of the relationships between the recognized objects. Second, human-readable sentences need to be generated from the information extracted from the images. Third, the bridge image-text pairs available for training are quite limited and highly noisy. To address these gaps, this paper proposes a deep learning-based model that generates free-form, human-readable descriptive sentences about bridge conditions by leveraging bootstrapping language-image pre-training (BLIP) and its web-scale vision-language pre-training data. This paper discusses the proposed model and its performance results.
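For illustration, the following is a minimal sketch of how a BLIP captioning model can be invoked to describe an inspection photo, using the public HuggingFace transformers API. This is not the authors' implementation: it loads the generic Salesforce/blip-image-captioning-base checkpoint rather than a model fine-tuned on bridge inspection image-text pairs as the abstract describes, and the image file name is hypothetical.

# Minimal BLIP captioning sketch. Assumptions: public checkpoint (not the
# paper's fine-tuned model) and a hypothetical image file name.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

# Load an on-site inspection photo (hypothetical file name).
image = Image.open("bridge_inspection_001.jpg").convert("RGB")

# The processor resizes and normalizes the image; generate() then decodes
# a short free-form description autoregressively.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(output_ids[0], skip_special_tokens=True))

Out of the box, such a checkpoint produces generic captions (e.g., "a concrete bridge over a river"); obtaining condition-specific sentences of the kind targeted here would require fine-tuning on domain image-text pairs, which is the data-scarcity challenge the abstract raises.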


REFERENCES

Al-Malla, M. A., Jafar, A., and Ghneim, N. (2022). “Image captioning model using attention and object features to mimic human image understanding.” Journal of Big Data, 9(1), 1–16.
Anderson, P., Fernando, B., Johnson, M., and Gould, S. (2016). “SPICE: Semantic propositional image caption evaluation.” Proc., Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Part V, Springer.
Aneja, J., Deshpande, A., and Schwing, A. G. (2018). “Convolutional image captioning.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Bianchi, E., Abbott, A. L., Tokekar, P., and Hebdon, M. (2021). “COCO-bridge: structural detail data set for bridge inspections.” J. Comput. Civil Eng., 35(3), 04021003.
Cao, R., El-Tawil, S., and Agrawal, A. K. (2020). “Miami pedestrian bridge collapse: Computational forensic analysis.” J. Bridge Eng., 25(1), 04019134.
Chun, P. J., Yamane, T., and Maemura, Y. (2022). “A deep learning-based image captioning method to automatically generate comprehensive explanations of bridge damage.” Comput.-Aided Civ. Infrastruct. Eng., 37(11), 1387–1401.
Dosovitskiy, A., et al. (2020). “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929.
Gan, Z., Li, L., Li, C., Wang, L., Liu, Z., and Gao, J. (2022). “Vision-language pre-training: Basics, recent advances, and future trends.” Found. Trends Comput. Graph. Vis., 14(3-4), 163–352.
Herdade, S., Kappeler, A., Boakye, K., and Soares, J. (2019). “Image captioning: Transforming objects into words.” Adv. Neural Inf. Process Syst., 32.
Hüthwohl, P., Lu, R., and Brilakis, I. (2019). “Multi-classifier for reinforced concrete bridge defects.” Autom. Constr., 105, 102824.
Li, J., Li, D., Xiong, C., and Hoi, S. (2022). “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation.” Proc., International Conference on Machine Learning, PMLR.
Lin, C.-Y. (2004). “ROUGE: A package for automatic evaluation of summaries.” Text Summarization Branches Out, Association for Computational Linguistics.
Munawar, H. S., Hammad, A. W., Haddad, A., Soares, C. A. P., and Waller, S. T. (2021). “Image-based crack detection methods: A review.” Infrastructures, 6(8), 115.
Narazaki, Y., Hoskere, V., Hoang, T. A., and Spencer, B. F., Jr. (2018). “Automated bridge component recognition using video data.” arXiv preprint.
Pan, Y., Yao, T., Li, Y., and Mei, T. (2020). “X-linear attention networks for image captioning.” Proc., IEEE/CVF Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). “BLEU: A method for automatic evaluation of machine translation.” Proc., 40th Annual Meeting of the Association for Computational Linguistics.
Salem, H., and Helmy, H. M. (2014). “Numerical investigation of collapse of the Minnesota I-35W bridge.” Eng. Struct., 59, 635–645.
Soskin, E. J. (2022). Challenges Facing DOT in Implementing the Infrastructure Investment and Jobs Act. https://www.oig.dot.gov/sites/default/files/OIG%20Correspondence%20-%20Challenges%20Facing%20DOT%20in%20Implementing%20IIJA.pdf.
Vedantam, R., Lawrence Zitnick, C., and Parikh, D. (2015). “CIDEr: Consensus-based image description evaluation.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015). “Show and tell: A neural image caption generator.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Wang, Q., and Chan, A. B. (2018). “CNN+CNN: Convolutional decoders for image captioning.” arXiv preprint.
Wang, Z., Yu, J., Yu, A. W., Dai, Z., Tsvetkov, Y., and Cao, Y. (2021). “SimVLM: Simple visual language model pretraining with weak supervision.” arXiv preprint.
Wu, Y., et al. (2016). “Google’s neural machine translation system: Bridging the gap between human and machine translation.” arXiv preprint arXiv:1609.08144.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015). “Show, attend and tell: Neural image caption generation with visual attention.” Proc., International conference on machine learning, PMLR.
Yao, T., Pan, Y., Li, Y., Qiu, Z., and Mei, T. (2017). “Boosting image captioning with attributes.” Proc., IEEE Int. Conf. Comput. Vis.
You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016). “Image captioning with semantic attention.” Proc., IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L. H., Zhou, L., Dai, X., Yuan, L., and Li, Y. (2022). “Regionclip: Region-based language-image pre-training.” Proc., IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
Zhu, J., Zhang, C., Qi, H., and Lu, Z. (2020). “Vision-based defects detection for bridges using transfer learning and convolutional neural networks.” Struct. Infrastruct. Eng., 16(7), 1037–1049.

Information & Authors

Information

Published In

Computing in Civil Engineering 2023
Pages: 1–8

History

Published online: Jan 25, 2024


Authors

Affiliations

Shengyi Wang, S.M.ASCE
Ph.D. Student, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL. Email: [email protected]

Nora El-Gohary, A.M.ASCE
Associate Professor, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL. Email: [email protected]
