Open access
Technical Papers
Feb 28, 2024

3D Dense Reconstruction for Structural Defect Quantification

Publication: ASCE OPEN: Multidisciplinary Journal of Civil Engineering
Volume 2

Abstract

Recent advancements in vision-based visual inspection enable the identification, localization, and quantification of damage on structures. However, existing damage quantification methods are limited to measuring one- or two-dimensional attributes such as length or area, which is insufficient for certain damage types such as spalling that require depth in addition to in-plane measurements, as outlined in inspection manuals. To address this limitation, we propose utilizing image-based dense 3D reconstruction to perform full 3D quantification for damage assessment in concrete structure inspections. The proposed method is applied to quantify spalling damage in 3D by computing the volumetric loss and maximum depth of the damage in line with bridge inspection manuals. Our approach involves using a convolutional neural network-based interactive segmentation algorithm to accurately segment spalling boundaries from images. Structure-from-motion and multiview stereo algorithms are then applied to generate a detailed 3D point cloud reconstruction of the spalling using multiple images. From this point cloud, a 3D mesh representation of the spalling is created for precise quantification. To validate our proposed technique, we conducted laboratory and field experiments to capture images and interactively segment the damage. The results demonstrate the effectiveness and reliability of our approach for 3D damage quantification in structure inspections.

Introduction

Critical civil infrastructure faces significant challenges due to simultaneous aging and exposure to a variety of natural and anthropogenic stressors, leading to concerns regarding both local damage and global degradation. Therefore, regular evaluations of structures, particularly bridges, at planned intervals are necessary (Brownjohn et al. 1999; Vardanega et al. 2016). Regulatory agencies across North America have thus established inspection frequencies, as documented in various inspection manuals (Province of British Columbia 2018; ONMTO 2000; MIDOT 2022; MADOT 2015). In Canada, for example, the Ontario Structural Inspection Manual (OSIM) mandates biennial inspections within arm's length for bridges, retaining walls, and culverts (ONMTO 2000; Transport Canada 2023). Visual inspection has traditionally served as the primary assessment method for civil infrastructure, influencing crucial repair and rehabilitation decisions. However, these inspection reports are often inaccurate, subjective, and qualitative, potentially compromising the safety and longevity of the infrastructure.
In recent years, there has been a shift toward adopting more intelligent technologies to aid in visual inspections, called vision-based inspections. Different sensors and sensing platforms (e.g., color cameras, infrared cameras, LiDARs mounted on drones) have been used to collect high-quality data from structures. Artificial intelligence (AI) has been utilized to analyze optical data to identify damage and determine its rating. For example, drones equipped with a camera and LiDAR can gather visual data following a predesigned flight path, and then AI-based damage detection and quantification algorithms are applied to these data to identify, localize, and quantify the damage. The damage information is visualized and overlaid in digitized infrastructure models (e.g., digital twins) and shared with inspectors to inform future actions. Various vision-based technologies and workflows to assist structural inspections have been proposed in recent years. Researchers have explored image-based condition indexing for bridge elements: they used red, green, and blue (RGB) image transformations to detect and measure crack length and width as well as spalling area (Adhikari et al. 2013). These measurements are then used to assign condition ratings based on inspection manuals. The You Only Look Once (YOLO) (Redmon et al. 2016) family of networks has also found its way into the civil inspection field and has undergone adaptations to facilitate real-time detection of bridge defects (Teng et al. 2022). To enable monitoring of structures and maintenance of bridges, researchers have developed control laws for navigating an unmanned aerial vehicle (UAV) in unknown environments while keeping the region of interest in the camera's field of view, which helps stabilize the UAV above a planar target (Metni and Hamel 2007). In parallel, other researchers have leveraged surface textures on concrete surfaces to train neural networks to estimate the image scale, which depends on the standoff distance at which the image is taken. The trained model can be utilized to estimate scales for all images captured at a structure with similar textures. This approach provides a means to derive real-world measurements from images that otherwise offer only pixel-level data, with applications in cases involving structural defects on concrete surfaces such as cracks (Park et al. 2021). In addition, a comprehensive survey of sensing technologies in structural health monitoring has been provided by Sony et al. (2019), while the state-of-the-art in vision-based inspection methodologies has been meticulously reviewed by Spencer et al. (2019).
A majority of vision-based inspection methods incorporate computer vision techniques to streamline or automate visual inspection procedures. For example, using convolutional neural network (CNN) architectures, images with defects are classified (Kim et al. 2019; Kim and Cho 2020; Atha and Jahanshahi 2018; Chen and Jahanshahi 2018; Yeum 2016; Xu et al. 2018), localized (Hoskere et al. 2017, 2018), and segmented for subsequent measurements (Zhang et al. 2017; He et al. 2017). Prebuilt 3D digital models have also been used for spatially documenting the extracted damage information (Xiong et al. 2013; Perez-Perez et al. 2016; Armeni et al. 2016; Golparvar-Fard et al. 2011a, b; Lu et al. 2019; Zhao et al. 2021). While most of these methods focus on detection, segmentation, and labeling, little work has been done to extract transferable and quantifiable information about common types of expected damage in concrete structures. Along these lines, the authors' research group developed eXtended Reality-based Inspection and Visualization (XRIV) for labeling damaged regions with the help of user input. A CNN-based feature-backpropagating refinement scheme algorithm is implemented to interactively segment damage areas for area measurement through a virtual interactive interface (Al-Sabbag et al. 2020).
Even though current techniques are widely accessible and well established, they often miss critical information required for damage assessment. For example, according to the OSIM, in addition to defect classification and its area measurement, some damage types such as spalling or scaling require depth measurement to evaluate their severity; spalling damage must be categorized into four different severity classes (light, medium, severe, or very severe) depending on the size of the damage, and the depth of the material loss must be measured for this classification (ONMTO 2000). However, estimating or measuring depth information from images is not a trivial problem. Inherently, images can provide only 2D size information in pixel units, assuming that the damage is located on a structure's surface. For the size measurement, areas in physical units can only be obtained after calculating a unit conversion from pixels to real-world units (e.g., mm, cm). Any perspective distortion in an image should be removed before making in-plane quantitative measurements. However, since depth cannot be measured from a single image, multiple images are necessary to first model the damage in 3D. The global poses of images with respect to a structure are computed using a 3D mapping algorithm such as simultaneous localization and mapping or structure-from-motion (SfM). Precalibrated stereo camera sensors also exploit two-view geometry to estimate disparity and depth. Popular stereo cameras such as the ZED series from Stereolabs or Intel's RealSense series are used as industry standards for performing depth estimation. Nevertheless, these sensors come with certain limitations. Stereo cameras are often not suitable for fine-grained segmentation, especially at a long working distance. For example, ZED cameras offer the best depth accuracy at a working distance of 1 m (Stereolabs 2023). The RealSense series from Intel offers a maximum range of up to 10 m, but the ideal range is within 3 m (Intel 2023). This highlights a gap with respect to the ranges typically required in civil infrastructure inspection scenarios. Other sensors, such as time-of-flight (ToF) infrared sensors, typically do not have sufficient resolution unless the data are collected at close range. For example, Microsoft's Azure Kinect DK, using a 1-MP ToF depth camera, supports a resolution of 320 × 288 at 5.46 m (Microsoft 2023). Researchers (Beckman et al. 2019) have employed the Kinect V2 to gather point cloud data at close ranges, specifically within 2.5 m, limited by reduced resolution and hardware issues including lens distortion and vignetting. Drawbacks of this approach include the restricted range, dependence on hardware tailored for specific tasks, and diminished logistical flexibility due to limitations in physical access imposed by the aforementioned constraints. Current vision-based inspection techniques have indeed streamlined the visual inspection process in terms of damage detection, localization, and documentation; however, without depth measurement, they cannot fully achieve the quantitative aspects of inspection requirements outlined in the inspection manuals.
This paper addresses the gap in making reliable damage quantification in 3D. The damage is reconstructed in 3D using SfM followed by multiview stereo (MVS), resulting in a dense and precise damage reconstruction that yields a detailed 3D mesh from a series of images. We present a novel method of utilizing this 3D mesh reconstruction to obtain the depth and volume of spalling damage. We also delve into point cloud postprocessing techniques for volumetric analysis and provide guidelines for effective parameter tuning. The proposed method is integrated into the previous XRIV workflow, encompassing damage segmentation and quantification, thereby facilitating end-to-end damage inspection. The proposed procedure and technique were tested and validated through both laboratory and field experiments. A Microsoft HoloLens 2 (HL2) was utilized to collect images with their poses relative to the target damage. For the lab testing, a mockup of spalling damage was fabricated using a 3D printer to enable precise performance evaluation of the proposed method. The measurement accuracy was comprehensively evaluated under different data collection settings, including different distances and numbers of images. We also demonstrated the technique on actual spalling damage in an in-service bridge. The key contributions of this work are as follows:
1. Establishing a novel method integrating segmentation, reconstruction, and 3D quantification of damage utilizing CNN-based interactive segmentation, SfM, and MVS vision algorithms.
2. Introducing an end-to-end methodology for accurately measuring the volume and depth of spalling damage through the reconstructed mesh acquired from the 3D point cloud.
3. Offering recommendations for optimizing the various parameters involved in these processes to ensure precise measurements based on rigorous experimentation.

Proposed Approach

The objective of the proposed method is to evaluate the use of dense reconstructions to make reliable, repeatable, and robust damage measurements in 3D using images. The SfM and MVS algorithms are used to reconstruct damage in 3D for quantification, including volumetric loss and maximum depth measurement. Our method can be applied to quantify any damage type that requires 3D measurement for severity classification, but in this study we developed the pipeline targeting spalling damage evaluation. A summary of the method is illustrated in Fig. 1:
Step 1. Image collection: images (Ii) are collected from a region of interest (spalling in this study) where i indicates the image index. Their relative poses (Pi) in a real-world scale are obtained from an image acquisition device (HL2 in this study). The number of images varies depending on the resolution of Ii and the distance from the damage.
Step 2. Sparse 3D point cloud reconstruction using SfM: performing SfM using images (Ii) and their relative pose information (Pi) for initialization generates a sparse 3D point cloud of the scene (X) in a real-world scale: SfM(Ii, Pi) → X, P¯i. In this process, the initial poses (Pi) are refined to P¯i, which is necessary for performing the next step.
Step 3. Dense 3D point cloud reconstruction using MVS: MVS is a process to densify the point cloud using images (Ii) and their refined poses (P¯i). A depth map is generated from a pair of images using the PatchMatch algorithm (Barnes et al. 2009; Shen 2013), and X from Step 2 is used for initializing the depth map. By merging multiple depth maps from pairs of images, a densified point cloud (D) is obtained: MVS(Ii, P¯i, X) → D.
Step 4. Spalling segmentation and point cloud extraction: the boundary of the spalling damage is segmented from one of the images (Isp) that captures the entire view of the spalling damage, together with its pose information, P¯sp. Then, a subset of the dense point cloud (Dsp) that contains the spalling damage is extracted from D using P¯sp. In this paper, the interactive segmentation used in XRIV is implemented to extract an accurate spalling boundary through holographic interaction in HL2.
Step 5. Surface plane estimation: we define spalling damage as the space enclosed within the concave defect and a hypothetical flat plane that would have been present on the undamaged surface. A hypothetical surface plane (Ssp) that was originally present before the spalling occurred is estimated from the neighborhood point clouds around spalling damage (e.g., the point cloud in the exterior of the spalling in Step 5 of Fig. 1).
Step 6. Spalling damage reconstruction and quantification: Dsp is used to make a solid meshed surface (Msp) for geometry reconstruction. Then, using a geometric relationship between Ssp and Msp, volumetric loss and the maximum depth can be computed.
Fig. 1. Overview of the proposed workflow.

Step 1: Image Collection from Damage

Based on the quality of the input images, the point clouds generated from SfM and MVS vary in quality and accuracy. The results depend heavily on image resolution, the viewpoints from which the images are taken, and the number of images. Images used for modeling also need to be taken in sufficiently bright environments so that the visual features can be detected and tracked. Higher-resolution images produce denser point clouds for both SfM and MVS. A sufficient number of high-quality, sharp (Eltner and Sofia 2020), consistent, and overlapping images must be taken from different viewpoints, ensuring that the target object is covered from a wide array of positions (Nyimbili et al. 2016).
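As an illustration of a simple prescreening step consistent with these requirements, the following minimal Python sketch (assuming OpenCV and NumPy are available) flags images that are likely too blurry or too dark for reliable feature detection; the threshold values are illustrative assumptions rather than values used in this study.

```python
import cv2
import numpy as np

def screen_image(path, blur_thresh=100.0, min_brightness=60.0):
    """Flag images that are likely too blurry or too dark for SfM/MVS.

    blur_thresh and min_brightness are illustrative values only and should
    be tuned for the camera and scene at hand.
    """
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # variance of Laplacian (focus measure)
    brightness = float(np.mean(gray))                  # mean intensity as a brightness proxy
    return sharpness >= blur_thresh and brightness >= min_brightness
```

Such a check can be applied to each collected image before reconstruction so that blurred or underexposed views are recaptured rather than propagated into the SfM and MVS stages.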

Step 2: Sparse 3D Point Cloud Reconstruction Using SfM

The photogrammetry technique SfM reconstructs a scene in 3D from a series of multiple images of the same scene taken from different viewpoints while also obtaining the corresponding camera poses (Pi). It has been used extensively for aerial inspections in agriculture, geosciences, and environmental disaster management, among other applications. It is also used in close-range 3D modeling such as cultural heritage preservation (Arapakopoulos et al. 2022; Nikolov and Madsen 2016), reconstruction of complex tumors (Campos et al. 2021), and artifacts in tourist attractions (Kadi and Anouche 2020). Some state-of-the-art, commercially available SfM software packages are Agisoft PhotoScan Pro (Agisoft 2022), Bentley ContextCapture (Bentley Systems 2022), Autodesk ReCap Pro (Autodesk 2022), Pix4D (Pix4D 2022), 3Dflow 3DF Zephyr Pro (3Dflow 2022), and Reality Capture (RealityCapture 2022); these have been studied in detail under various testing conditions (Nikolov and Madsen 2016). In addition, there are open-source packages available to perform SfM, including OpenSfM (Mapillary 2017), COLMAP (Schonberger and Frahm 2016; Schönberger et al. 2016), OpenMVG (Moulon et al. 2016), and Bundler (Snavely 2013).
The underlying principle in SfM is that by capturing the same scene from multiple viewpoints, keypoints (common features across two or more images) are detected and used to estimate the position of such keypoints in 3D space. The process begins with initializing an image container that creates a data set with RGB images (Ii) and initial poses (Pi). Then, visual features are detected in all the images. These visual features or keypoints are usually traditional features used in computer vision such as scale-invariant feature transform (SIFT) (Lowe 2004), oriented FAST and rotated BRIEF (ORB) (Rublee et al. 2011), or accelerated-KAZE (AKAZE) (Alcantarilla et al. 2012, 2013). Once these keypoints are detected and matched in multiple viewpoint images, they can be identified in 3D space to populate a sparse point cloud in space using triangulation. Typically, after this initial estimate, the 3D point cloud as well as the poses are refined using a technique called bundle adjustment, which follows a nonlinear least squares minimization to refine the point cloud (X) and poses (P¯i) as the final output of SfM.
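To make the principle concrete, the following Python sketch (using OpenCV; the intrinsic matrix K is assumed known, and full multi-image bundle adjustment is omitted) shows the two-view core of this process: keypoint detection, ratio-test matching, relative pose estimation, and triangulation of a sparse point set.

```python
import cv2
import numpy as np

def two_view_sparse_points(img1, img2, K):
    """Two-view illustration of the SfM principle: detect keypoints, match them,
    estimate the relative pose, and triangulate sparse 3D points.
    K is the 3x3 camera intrinsic matrix (assumed known here)."""
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY) if img1.ndim == 3 else img1
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY) if img2.ndim == 3 else img2
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(g1, None)
    k2, d2 = sift.detectAndCompute(g2, None)

    # Lowe's ratio test on nearest-neighbor matches
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2) if m.distance < 0.6 * n.distance]
    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])

    # Relative pose from the essential matrix (RANSAC), then triangulation
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, p1.T, p2.T)   # homogeneous 4xN
    return (X_h[:3] / X_h[3]).T                        # N x 3 sparse points (up to scale)
```

In an actual incremental SfM pipeline, additional views are registered against this initial reconstruction, and bundle adjustment jointly refines X and the poses P¯i.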
The reconstructed model and camera poses from SfM are determined only up to scale unless users conduct a coordinate transformation to a real-world coordinate system. For example, when users do not provide initial poses (Pi), the model is defined in an arbitrary coordinate system with an unknown scale. This presents problems in the inspection pipeline since the measurements made on the point clouds cannot be quantitatively evaluated. To tackle this issue, researchers have looked at various methods to obtain point clouds on a real-world scale. A common theme is to obtain a reference measurement using ground control points that can then be related to the physical size or scale. Global positioning system (GPS) data have also been used to perform reconstruction on a global scale in addition to improving the result by data fusion (Lhuillier 2011). In this project, we initialize the poses from the image collection platform. XR headsets (e.g., HL2, Magic Leap 2) run their own simultaneous localization and mapping algorithm, fusing data from a depth sensor, accelerometer, and gyroscope. These headsets compute the pose of each collected image, which includes the location (X, Y, and Z coordinates) in a real-world scale as well as the orientation (in the form of quaternions). Thus, when these poses are provided as initial poses of the images, the reconstructed point cloud is in a real-world scale, which means any measurement from this point cloud is close to true scale.
However, it must be noted that the point cloud resulting from SfM is sparse and needs to be further processed before making quantifications. SfM detects sparse keypoints in images and triangulates them in 3D space. These keypoints, obtained from various visual features, are sparsely and nonuniformly distributed. The resulting point cloud contains only matched and filtered features and is therefore not dense. In our application, the SfM point cloud does not cover the entire spalling damage region, thus requiring densification before meshing. Performing SfM followed by MVS is usually considered the gold standard for obtaining complete and dense reconstructions of scenes.

Step 3: Dense 3D Point Cloud Reconstruction Using MVS

The blanket term 'MVS' encompasses multiple techniques that utilize stereo correspondences between more than two images to estimate the 3D shape of the target captured in the images (Furukawa and Hernández 2013). One notable prerequisite of MVS is that it requires camera parameters to generate stereo correspondences on a set of unordered images. Since the camera parameters are known from SfM, epipolar geometry dictates that a pixel in one image corresponds to a point along a line (the projection of its optic ray) in another image. This constraint turns the 2D matching problem into a 1D matching problem. As a result, MVS can generate dense 3D points in image regions devoid of visual features. A vast literature exists that delves deeper into determining pixel-wise matches; the associated metrics are called photo-consistency measures. The study by Furukawa and Hernández (2013) explores various photo-consistency measures in detail, with further specifics of improvements listed in Furukawa and Ponce (2010), Newcombe et al. (2011), and Scharstein et al. (2001). The MVS generates depth images that are then fused to obtain a dense point cloud. To count a point as an inlier in the fused 3D point cloud, a voting system is used in which a minimum number of images must agree. This further refines the point cloud by removing noise in 3D space.
The MVS complements the SfM pipeline for dense 3D scene reconstruction from images. While fundamentally different, MVS follows SfM in sequence to enhance the process. The SfM generates a sparse point cloud by triangulating matched feature points in 3D space across the image data set. Working on a 2D scale, SfM seeks feature points throughout the image space for matching. The outcome includes the 3D coordinates of these matched features, along with camera parameters. These parameters and images serve as inputs for MVS. Combining SfM with MVS is widely acknowledged to improve point cloud quality for small and medium-sized objects compared with alternative reconstruction methods such as terrestrial laser scanning (Skarlatos and Kiparissi 2012).
Among various implementations of MVS, we summarize the MVS process implemented in an open-source MVS library, OpenMVS (Cernea, unpublished manuscript, 2020). First, stereo image pairs (neighboring views) are identified using their refined poses. Then, a depth map is initialized with the sparse points associated with these pairs so that the depth and surface normal of the pixels in the image pairs are initialized. Then, the PatchMatch (Barnes et al. 2009; Shen 2013) algorithm is used to complete the depth map. PatchMatch computes the similarity between two square patches to find their correspondence. A patch centered around a testing pixel in a reference image is defined and transformed to the other paired image using a homography computed from the given depth and plane normal of the local patch. Then, the similarity between the transformed patch and the patch in the paired image is checked. This process is repeated until the best correspondence is found, and the corresponding depth of the patch becomes the depth of the testing pixel. Once the depth maps are completed, they are refined based on the neighboring views computed earlier. After refinement, the set of depth maps is fused to produce a high-resolution, densified point cloud (D). Compared with SfM, MVS computes the depth of every pixel in the image, producing a denser point cloud. However, accurate pose information of the images must be provided, which can be obtained from SfM with pose initialization in Step 2.
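The central geometric operation in this PatchMatch step is the plane-induced homography that warps a patch from the reference view into a paired view for a hypothesized depth and normal. A minimal NumPy sketch of this mapping is shown below; the sign convention assumes the local plane n·X = d is expressed in the reference camera frame and that R, t map reference camera coordinates to the paired camera. It is illustrative and not an extract from OpenMVS.

```python
import numpy as np

def plane_induced_homography(K_ref, K_src, R, t, n, d):
    """Homography warping reference-view pixels into a paired (source) view,
    assuming the scene locally lies on the plane n . X = d in the reference
    camera frame (n: unit normal, d: distance). R, t map reference camera
    coordinates to source camera coordinates. This is the per-pixel depth/normal
    hypothesis tested in PatchMatch-style MVS."""
    H = K_src @ (R + np.outer(np.ravel(t), n) / d) @ np.linalg.inv(K_ref)
    return H / H[2, 2]

def warp_pixel(H, u, v):
    """Apply the homography to a reference pixel (u, v)."""
    x = H @ np.array([u, v, 1.0])
    return x[0] / x[2], x[1] / x[2]
```

A photo-consistency score (e.g., normalized cross-correlation) between the reference patch and the warped patch is then evaluated, and the depth/normal hypothesis with the best score is retained for that pixel.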

Step 4: Spalling Segmentation and Point Cloud Extraction

In this step, a binary mask that encompasses spalling damage is detected and segmented from an image. The binary mask can be understood as an operator that is applied to the original image. When overlaid on the operand (original image), the pixels and associated 3D points overlaid by black pixels of the mask are filtered out and removed whereas those overlaid by white pixels are segmented and saved. In the past, researchers have used CNN-based object instance class segmentation to detect objects in an image and generate segmentation masks for each instance of the object. For the applications of vision-based visual inspection, CNN-based pixel-wise predictions are also used for segmenting spalls, delamination (McLaughlin et al. 2020) as well as crack detection (Kim and Cho 2020). An alternative to pixel-wise segmentation is the bounding box detection method evolved from YOLO (Redmon et al. 2016), which researchers have used to detect surface cracks and rebars on bridges (Teng et al. 2022) as well as to detect signs of structural health deterioration such as discoloration, cracks, and spalling in cultural heritage sites (Mishra et al. 2022).
In this paper, we adopted a semiautomatic, interactive method from our previous work, XRIV, which is a CNN-based technique that utilizes user-provided seed points as initial conditions to improve the performance of segmentation (Al-Sabbag et al. 2020, 2022). The method is based on a model with a ResNet-34 backbone pretrained on the ImageNet data set and a DeepLabV3+ decoder. XRIV was originally developed to support XR headsets, enabling interaction with users. The technique is implemented in this work to segment or separate the target (spalling damage in this case) from the rest of the background (regions surrounding the spalling). Segmentation using XRIV proceeds as follows: the user first selects a few seed points inside the target (positive seed points) and a few seed points outside the target (negative seed points), as shown in Fig. 13(b). Using these points as initial conditions for the trained segmentation model, positive and negative distance maps are produced that encode the 2D spatial information about the location of the target in the image. These distance maps, along with the image array, are fed to the CNN as the input. Using a modified feature-backpropagating refinement scheme (f-BRS) (Sofiiuk et al. 2020), the network predicts a confidence map with values from 0 to 1 based on the prediction of pixels that encompass the target region in the image array. The confidence map is then translated to a binary mask by thresholding the confidence values. As a semiautomatic method, XRIV gives the user the flexibility to correct automated segmentation by adding, replacing, or removing seed points; for example, if the method provides an erroneous or incomplete mask, the user can modify the seed points to achieve a more representative result. More details of XRIV are presented in Al-Sabbag et al. (2022).
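For illustration, the following Python sketch (using NumPy and SciPy) shows one common way of encoding user clicks as distance maps and binarizing a confidence map; it is a simplified stand-in for the XRIV/f-BRS implementation, whose network internals are not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def seed_distance_map(shape, seeds):
    """Euclidean distance from every pixel to the nearest seed point.
    shape: (H, W); seeds: list of (row, col) user clicks."""
    grid = np.ones(shape, dtype=bool)
    for r, c in seeds:
        grid[r, c] = False                          # zero distance at the seeds
    return distance_transform_edt(grid)

def build_network_input(image, pos_seeds, neg_seeds):
    """Stack the image with positive/negative seed distance maps, the typical
    input encoding for click-based interactive segmentation networks."""
    h, w = image.shape[:2]
    pos = seed_distance_map((h, w), pos_seeds)
    neg = seed_distance_map((h, w), neg_seeds)
    return np.dstack([image.astype(np.float32), pos, neg])   # H x W x (C + 2)

def confidence_to_mask(confidence, threshold=0.5):
    """Binarize the network's confidence map (values in [0, 1]) into a 0/255 mask."""
    return (confidence >= threshold).astype(np.uint8) * 255
```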
Once we obtain the binary mask using XRIV, the 3D points associated with damage are detected as follows. First, the refined pose (P¯sp) is used to project the dense 3D point cloud (D) onto the image Isp for which the mask has been generated. Then, the projected 2D points are checked for whether they lie inside the mask. Valid points are then reprojected to 3D space to obtain the points lying within the target spalling (Dsp). However, Dsp may include nonspalling points that are located behind the spalling but lie in the line of sight from the camera, thus still satisfying the projection; for example, a case where the spalling damage is located on a bridge pier, and the aggregate point cloud D contains parts of the bridge lying behind the pier. These points can be filtered using a simple distance-based filter on Dsp. This distance-based filter relies on the fact that the user knows the standoff distance from the camera to the target region to some degree of accuracy. By keeping only points within that standoff distance (plus some extra distance to allow for the depth of the target), Dsp can be cleaned to include only the points belonging to the target region. The standoff distance can be approximated from the seed point locations used in XRIV because the 3D points corresponding to positive seed points are located on the spalling. Once the points of the spalling are recognized in 3D space, they are used for reconstructing the damaged surface.
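A minimal NumPy sketch of this projection-and-filtering step is given below; the intrinsic matrix K and the world-to-camera pose (R, t) of Isp are assumed to be available from the SfM output, and the variable names are illustrative.

```python
import numpy as np

def extract_spalling_points(D, K, R, t, mask, max_range):
    """Select points of the dense cloud D (N x 3, world frame) that project inside
    the binary spalling mask of image I_sp and lie within max_range of the camera
    (standoff distance plus an allowance for the defect depth).
    K: 3x3 intrinsics; R, t: world-to-camera pose of I_sp; mask: H x W (0/255)."""
    X_cam = (R @ D.T + t.reshape(3, 1)).T           # points in the camera frame
    z = X_cam[:, 2]
    valid = z > 1e-6                                # keep points in front of the camera

    uvw = K @ X_cam[valid].T
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)

    h, w = mask.shape
    in_img = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.zeros(len(D), dtype=bool)
    idx = np.flatnonzero(valid)[in_img]
    keep[idx] = mask[v[in_img], u[in_img]] > 0      # projected inside the damage mask

    keep &= np.linalg.norm(X_cam, axis=1) < max_range   # distance-based filter
    return D[keep]
```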

Step 5: Surface Plane Estimation

The study defines spalling damage as the missing material (e.g., concrete) detached from the surface. The volume of spalling is the space between the reconstructed spalling surface (Dsp) and the undamaged original surface. The maximum depth refers to the longest distance between these surfaces. Accurately estimating the original surface plane is crucial for quantifying spalling. To achieve this, the proposed process samples the region near the spalling and fits a plane (Ssp) through the 3D points in that region. The method assumes that the damaged region was initially flat and planar and extends these properties to the neighboring area. It acknowledges that errors in damage segmentation can result in incorrect plane reconstructions. To mitigate this, a simple yet effective sampling process is devised, which collects points outside the target mask while avoiding its immediate proximity. This approach improves the robustness of the sampling method against inaccurate or inconsistent damage segmentation.
We use binary masks and image morphology techniques to extract the 3D points associated with the planar surface. In particular, we utilize the image dilation technique, denoted by ⊕. Mathematically, the dilation of a binary image A, by a kernel B, is defined as
A ⊕ B = ⋃b∈B Ab
(1)
where ⋃ = union operator; and Ab = translation of A by b in both horizontal and vertical directions. By choosing the size of the kernel B, the binary image A can be dilated, that is, expanded.
In this paper, we use two kernels, B1 and B2: B1 defines the outer offset distance from the mask, while B2 defines how far the mask is dilated. Using this method ensures that sampled points in the neighborhood lie on a plane while also avoiding sampling points in the immediate neighborhood of the damage, which may not be planar due to inaccuracies in the damage segmentation mask as well as the physical nature of infrastructure defects. The size of B1 depends on the confidence of obtaining accurate segmentation masks and should be decided in proportion to the size of the mask with respect to the image resolution. Higher-accuracy masks require less padding to account for errors and hence a smaller B1, while lower-accuracy masks need a larger B1 to avoid the segmentation error. The dependence of B2 is twofold, taking into account both the size of the segmentation mask and the expected quality of the point cloud. On the one hand, the dilated region must encompass enough points to accurately fit a plane, which is influenced by the quality and density of the point cloud. On the other hand, B2 must also adhere to the physical constraints of the object being processed. For instance, in the case of a slender column, B2 cannot extend excessively beyond its boundaries, which would risk including background points.
Let M be the binary mask obtained from the images used for damage segmentation as explained before, M′ be the binary mask containing points outside the damage, and B1 and B2 be the kernels which are essentially all-ones matrices of sizes n × n where n is the kernel size. The following shows how M′ is computed:
M′ = (M ⊕ B2) − (M ⊕ B1)
(2)
Fig. 2(a) is a schematic representation of the mask and operations performed on it to obtain M′, described in Eq. (2). Sample masks, M and M′, are shown in Figs. 2(b and c), which are derived from the lab testing explained in the laboratory experiment section. Once M′ is obtained, projective geometry is used to identify these points in 3D from the 2D mask images using the same methodology of extracting Dsp, explained previously. After obtaining the 3D points which lie on a plane, the plane equation can be found by fitting a surface through the points selected in the exterior neighborhood using RANSAC (Fischler and Bolles 1981), which is a random sampling-based method to estimate the model (in our case, a plane) from the data that contains outliers.
Fig. 2. Original surface plane estimation: (a) schematic diagrams showing operations to obtain M′ described in Eq. (2); (b) sample binary mask for spalling; and (c) neighborhood region where the point clouds are extracted for plane estimation.
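A minimal Python sketch of this step is given below, assuming OpenCV for the dilation of Eq. (2) and Open3D for the RANSAC plane fit; the kernel sizes n1 and n2 correspond to B1 and B2, and the numeric defaults are illustrative (a 0.1-mm threshold assuming coordinates in meters).

```python
import cv2
import numpy as np
import open3d as o3d

def neighborhood_mask(M, n1, n2):
    """Eq. (2): ring of pixels around the damage, skipping an offset band defined
    by kernel B1 and extending out to kernel B2. M is a 0/255 binary damage mask."""
    B1 = np.ones((n1, n1), np.uint8)
    B2 = np.ones((n2, n2), np.uint8)
    return cv2.subtract(cv2.dilate(M, B2), cv2.dilate(M, B1))   # M' = (M dilated by B2) - (M dilated by B1)

def fit_surface_plane(points, dist_thresh=1e-4, iters=10000):
    """RANSAC plane fit through the 3D points sampled from M'."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    plane, inliers = pcd.segment_plane(distance_threshold=dist_thresh,
                                       ransac_n=3,
                                       num_iterations=iters)
    return np.asarray(plane), inliers   # plane = [a, b, c, d] with ax + by + cz + d = 0
```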

Step 6: 3D Spalling Reconstruction and Quantification

In the last step, a 3D geometric mesh model of the spalling damage is created from Dsp, the dense reconstruction segmented from the point cloud using the image mask M and projection techniques, and the volume and maximum depth are computed using this model. In general, the transition from an unstructured 3D point cloud to a mesh model is not a trivial problem and requires preprocessing of the point cloud before generating the mesh, which includes voxel downsampling and surface normal estimation (Zhou et al. 2018).
The preprocessing begins with voxel downsampling, which uses a voxel grid to sample a given unstructured point cloud uniformly. Having a uniformly sampled point cloud is necessary for generating a quality mesh model since this preserves the shape of the original surface. However, we found that the voxel size does not affect the uniformity of the point cloud in the presented method because the MVS process already produces a uniform and dense point cloud from stereo correspondences. Next, point normals are estimated for all points to reconstruct accurate meshes. In this study, we estimate the normals from the placement of neighboring vertices for each vertex in the point cloud. Since damage shapes can vary and do not fit primitive smooth shapes, the algorithm must consider smaller neighborhoods to compute the point normals, accounting for the sharp, uneven, and nonsmooth characteristics commonly found in typical defect appearance.
Once we have uniform and oriented point clouds, triangular meshes are created to transform the points into surface geometry. There are two popular methods for meshing a surface from point clouds: ball pivoting (Bernardini et al. 2000) and Poisson surface reconstruction (Kazhdan et al. 2006). The ball-pivoting algorithm (BPA) can be summarized as follows: three points in a point cloud form a triangular mesh surface if a hypothetical ball touches them (or gets "caught" in them) at the same time. As the name suggests, this concept involves visualizing a sphere with a specific radius, r, as it pivots on each point of a given point cloud. At locations where the 3D ball touches three points, these points form the vertices of a triangle surface in the mesh. When all the points have been accounted for in a local region, the algorithm looks for new seed points to commence the sequential meshing again. On the contrary, at locations where the ball falls through the point cloud, a surface is not reconstructed there, and the mesh becomes incomplete, like a void. This accounts for regions in the point cloud that are not sufficiently dense and consequently may not be suitable candidates for producing a mesh surface. This condition can be bypassed by increasing the r value; by doing so, the ball is more likely to make contact with three points simultaneously. However, this modification comes with a tradeoff: if the ball size is too big for a point cloud of a particular density, some of the points may not be reached by the ball, and the mesh might fail to encode all the details of the point cloud. Meshes representing spalling should be devoid of holes while still embedding the details necessary for accurate quantification; hence, BPA is not suitable for our application.
The second alternative is Poisson surface reconstruction, which produces meshes in which points of the point clouds are the vertices of the triangle mesh without modifications to the points, pi, in the point cloud. The underlying principle of this algorithm relies on reconstructing an implicit function, f, whose value at pi is zero and the gradient at these points is the surface normal vector, ni that was computed previously (Kazhdan et al. 2006). As a result, this method produces watertight meshes having continuous surfaces. This is one of the primary reasons for using the algorithm for the proposed method. Since the algorithm constructs the surface based on an implicit function, it may produce ghost surfaces outside of the bounds of the oriented point cloud to satisfy the boundary conditions of the function. However, this drawback can be resolved by thresholding the mesh density since it would be sparser in surfaces reconstructed outside of the bounding box of the point cloud.
An important parameter to be configured for the Poisson reconstruction algorithm is the maximum depth, d, of the octree used for the reconstruction. An octree is a data structure used to describe space and, in this context, store 3D reconstructed surfaces. An octree of depth d produces a mesh over a grid of dimensions 2^d × 2^d × 2^d; thus, a higher octree depth implies more memory consumption while also indicating a higher resolution of the mesh (Maiti and Chakravarty 2016). There is a tradeoff between the granularity of the mesh and computation time, where both increase exponentially with a higher d value (Maiti and Chakravarty 2016).
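For reference, the following sketch shows how this reconstruction and the density-based trimming described above can be performed with Open3D; coordinates are assumed to be in meters, and applying the density threshold to min-max normalized densities is one possible interpretation of the thresholding used here.

```python
import numpy as np
import open3d as o3d

def spalling_mesh(points, voxel=0.001, knn=8, octree_depth=8, density_thresh=0.2):
    """Poisson surface reconstruction of the spalling point cloud followed by
    density-based trimming of ghost surfaces."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd = pcd.voxel_down_sample(voxel_size=voxel)           # uniform resampling (1 mm)

    # Small neighborhoods preserve the sharp, uneven geometry of defects
    pcd.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamKNN(knn=knn))
    pcd.orient_normals_consistent_tangent_plane(knn)        # consistent orientation for Poisson

    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=octree_depth)

    # Trim ghost surfaces: drop vertices supported by few input points
    d = np.asarray(densities)
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)
    mesh.remove_vertices_by_mask(d < density_thresh)
    return mesh
```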
The volume and the maximum depth of spalling can be computed from the 3D mesh model and the surface plane location. A triangle mesh on the spalling surface is composed of three vertices and edges, as shown in Fig. 3. The volume of the spalling is the summation of the volumes of the triangle prisms entrapped between the damage and the hypothetical flat plane. Fig. 4 shows the diagram of the triangle prisms. Here, Ai, Bi, and Ci indicate the vertices of triangle surface i, and Ai′, Bi′, and Ci′ are the points on the plane obtained by projecting Ai, Bi, and Ci onto the plane, respectively. The volume of a triangle prism is computed by multiplying the area of its base on the plane by the average distance between the paired points, h¯i. The volume of spalling (Vsp) is computed as
Vsp = Σ_{i=1}^{k} ΔAi′Bi′Ci′ × h¯i, where h¯i = (hiA + hiB + hiC)/3
(3)
where ΔAi′Bi′Ci′ = area of the triangle having vertices Ai′, Bi′, and Ci′; i = surface index; and k = total number of surfaces in the mesh. Similarly, the maximum depth (MDsp) can be computed by selecting the maximum distance between a vertex and its corresponding point on the plane as
MDsp = max_{i ≤ k, j ∈ {A, B, C}} {hij}
(4)
Fig. 3. Triangular mesh of spalling using Poisson surface reconstruction.
Fig. 4. Damage quantification from a reconstructed mesh.
As a sanity check, we also evaluate the 99.9th percentile depth (D99.9), which denotes the depth value below which 99.9% of all depths lie. This is done to ensure that the MDsp found from Eq. (4) is not contaminated by an outlier present in the mesh. Mathematically, it is calculated as the nth depth value among all the depths arranged in increasing order, where n is called the ordinal rank. Once n is calculated, D99.9 is obtained by picking the depth value at the nth location in the sorted list.
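A compact NumPy sketch of Eqs. (3) and (4) and of the D99.9 check, operating directly on the mesh vertices, triangles, and the fitted plane [a, b, c, d], is given below for illustration.

```python
import numpy as np

def quantify_spalling(vertices, triangles, plane):
    """Volume [Eq. (3)], maximum depth [Eq. (4)], and 99.9th percentile depth of the
    spalling from the reconstructed mesh and the fitted surface plane.
    vertices: V x 3 array; triangles: T x 3 vertex indices; plane: [a, b, c, d]."""
    n = np.asarray(plane[:3], float)
    norm = np.linalg.norm(n)
    n, d = n / norm, plane[3] / norm

    # Depth of every vertex below the hypothetical undamaged surface
    h = np.abs(vertices @ n + d)
    # Projection of every vertex onto the plane (primed points A', B', C')
    proj = vertices - (vertices @ n + d)[:, None] * n

    A, B, C = proj[triangles[:, 0]], proj[triangles[:, 1]], proj[triangles[:, 2]]
    base_area = 0.5 * np.linalg.norm(np.cross(B - A, C - A), axis=1)   # area of A'B'C'
    h_bar = h[triangles].mean(axis=1)                                   # (hA + hB + hC) / 3

    volume = float(np.sum(base_area * h_bar))     # Eq. (3)
    max_depth = float(h.max())                    # Eq. (4)
    d99_9 = float(np.percentile(h, 99.9))         # sanity-check percentile depth
    return volume, max_depth, d99_9
```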

Experimental Validation

Laboratory Experiment

The proposed technique was evaluated in a set of laboratory experiments to test its ability to accurately measure the volume and maximum depth of spalling damage. To perform accurate performance tests, an artificial spalling damage mockup was fabricated. First, as shown in Fig. 5(b), spalling damage was modeled in 3D CAD software using a close-range depth scan of real-world spalling damage on a bridge abutment shown in Fig. 5(a). It is to be noted that the depth scan was only used as a reference for modeling the spalling in the CAD environment [Fig. 5(b)], resulting in some dissimilarity between Figs. 5(a and b). The ground truth volume and maximum depth were computed analytically in the 3D modeling software. Then, the mockup shown in Fig. 5(c) was created using a 3D printer with a gray-colored spool. Concrete texture spray was used to simulate spalling damage on a concrete surface. Note that the concrete texture is necessary to create unique visual features that are used for feature detection and matching in SfM.
Fig. 5. Fabrication of spalling damage mockup for lab experiment: (a) actual spalling damage; (b) 3D model of the spalling of (a); and (c) 3D printed model with concrete texture based on the scanned model in (b).
In this study, we used HL2, which is a mixed reality (MR) device from Microsoft, shown in Fig. 6(a). HL2 allows users to collect images and videos and provides their pose information. In addition, it can track hand gestures to take photos and make spatial annotations, which is an important function for XRIV implementation, illustrated in Fig. 6(b). The HL2 features an 8-MP camera capable of capturing still images at a 4K resolution of 3,904 × 2,196 pixels. In the laboratory experiments, 50 images (Ii) and their poses (Pi) were collected using HL2, with all images pointed directly toward the damage. Captured images and corresponding poses can be downloaded from HL2. The images were used at their full resolution without resizing to record the details necessary for accurate 3D reconstruction. The images differed in their viewpoints in terms of horizontal and vertical displacement as well as rotation. Images were captured while moving in a semicircular arc around the damage. The experiment was repeated by taking images at rough standoff distances of 1, 2, 3, 4, and 5 m. A sample image set at 1 m is presented in Fig. 7.
Fig. 6. Image collection: (a) HL2; and (b) capturing images with a hand gesture.
Fig. 7. Images collected from HL2 at 1 m.
An open-source SfM package, OpenMVG (Moulon et al. 2016), is used for the sparse 3D reconstruction of the scene. OpenMVG was selected due to its user-friendly customization options for various reconstruction parameters and the ease with which data parsers can be implemented to facilitate seamless processing. This allowed the authors to write data parsers to process the poses from HL2 to be passed to OpenMVG for refinement. SIFT features and the ULTRA preset were used for extracting denser visual features. Once 2D SIFT features are extracted from all the images, they are matched with a nearest neighbor distance ratio of 0.6, as opposed to the default value of 0.8, which yields fewer false positive matches. The initial pose of Ii is set to Pi from HL2. We used an incremental SfM process that incrementally adds new views to the initial pair and keeps updating the poses of the images using triangulation of matched feature points, followed by bundle adjustment. As a result, we obtained a sparse point cloud (X) and refined camera poses (P¯i).
OpenMVS is used to reconstruct the dense 3D point cloud. OpenMVS initializes depth maps by considering neighboring views for every image in the data set. The software package allows users to customize how many neighboring views are used to estimate the initial depth map. In this study, four neighboring views were used. These depth maps are then completed to fill in the gaps using the PatchMatch algorithm. Subsequently, during the fusion step, an important parameter to take into account is the minimum number of images required to consider an estimate as an inlier. In this particular experiment, three images were used, as this enabled the generation of maps with sufficient density for Poisson surface reconstruction. The fusion of the completed depth maps produces a dense, colorized 3D point cloud. In the lab experiment, we did not perform damage segmentation using XRIV because its performance and applicability were demonstrated in previous research through lab and field experiments (Al-Sabbag et al. 2020, 2022). Instead, we manually annotated accurate boundaries of the spalling for one image (per standoff distance) containing a clear view of the damage with the least perspective distortion. This allowed us to evaluate the proposed methodology without being influenced by the performance of XRIV. It is to be noted that the markers visible in the top right corner of the spalling in some images were not used in this experiment.
Next, a surface plane was estimated using the points from the outer neighborhood of the damage. In this experiment, we determined the sizes of the kernels B1 and B2 depending on the size of the spalling mask in an image, represented in pixels. The size of B1 was chosen to be 0.3 times the height or width of the bounding box enclosing the spalling mask (taking the smaller value). The size of B2 was chosen to be four times B1. These parameters were chosen to sample adequate points outside the spalling to obtain the plane while also avoiding points on the spalling boundary due to segmentation errors. Clearly, these values will vary depending on the size of the mask, which depends on the viewpoint and the standoff distance, as well as on the confidence in the accuracy of the binary mask detection. Once M′ is obtained using Eq. (2), the dense 3D points lying on the plane are obtained. RANSAC in Open3D (Zhou et al. 2018) is applied to the 3D points to compute the best-fit plane. A distance threshold of 0.1 mm was used to fit a plane through the sampled 3D points with 10^4 iterations.
Poisson 3D reconstruction in Open3D performs the surface reconstruction for the spalling. We used a voxel size of 1 mm to downsample the point cloud, which reduces the number of vertices in the mesh while helping preserve the original shape of the spalling. To improve the representation of the irregularities typically observed in infrastructure defects, we reduced the nearest neighbor search parameter used to estimate surface normals from the default value of 30 to 8, because a higher number of neighbors leads to too much smoothing of the surface profile. As the Poisson reconstruction is defined by an implicit function (outlined in earlier sections), the method produces a surface that extends beyond the spalling boundary as a result of the continuous nature of the function. This extended surface (ghost surface) needs to be filtered out before computing the volumetric measurements. We implement mesh density-based filtering wherein the vertices of the triangular mesh that have a low density, that is, are sparsely distributed, are filtered out. In this experiment, based on the density of the source dense point cloud, the density threshold was set to 0.2, which ensured that the additional surfaces introduced by Poisson reconstruction are trimmed before the measurements are made on the mesh.
Fig. 8(a) shows the mesh produced from Poisson reconstruction. The surface extends beyond the spalling point cloud since the reconstruction is performed using the implicit function. However, this ghost surface needs to be trimmed to obtain an accurate representation of the spalling. In Fig. 8(b), the same original mesh is shown with a color map encoding the mesh density. Lighter colors indicate higher-density regions, and darker colors indicate sparse regions in the mesh. With this mesh density map, the ghost surface is filtered out, which means the less dense triangle surfaces, seen in darker colors in Fig. 8(b), are trimmed. As a result, the final surface model in Fig. 8(c) is obtained after processing through the density-based filter. The octree depth was chosen as 8, the default value in Open3D, since higher values led to unnecessary artifacts in the mesh with increased computation time and lower values had reduced detail.
Fig. 8. Density-based filtering for ghost surfaces: (a) original mesh from Poisson reconstruction with less dense ghost surfaces; (b) density map of the original surface; and (c) mesh from Poisson reconstruction after filtering out the less dense parts of the original mesh.
To demonstrate the importance of parameter tuning and underscore the negative consequences associated with the selection of inappropriate parameters, we present in Fig. 9 a comparative analysis between meshes generated using default parameters and meshes created following systematic parameter tuning. In the first row, Figs. 9(a–c), we demonstrate the detrimental effects of using excessively low (4) or high (default value of 30) values for the number of nearest neighbors used to estimate point normals, which affects the surface profile. Excessively low values result in numerous artifacts, while excessively high values yield an overly smooth profile [shown in Figs. 9(a and c), respectively]. The artifacts are encircled in the figure. In contrast, the tuned parameter that uses eight nearest neighbors for normal estimation significantly reduces artifacts, as depicted in Fig. 9(b). Moving to the second row, Figs. 9(d–f), white mesh triangles are superimposed on the profile to illustrate the impact of octree depth. Reducing the octree depth to 4 [Fig. 9(d)] from the default value of 8 [Fig. 9(e)] leads to a decreased level of detail. Conversely, increasing the depth enhances the level of detail [Fig. 9(f)], but this also results in a longer computation time. For example, transitioning from an octree depth of 8 to 10 led to a 31-fold increase in computation time when executed on a local machine. In the last row, Figs. 9(g and h), we illustrate the trimming of ghost surfaces based on density-based filtering. Fig. 9(g) shows the mesh without the density-based filter, while Fig. 9(h) displays the mesh after trimming, following the filtering technique illustrated in Fig. 8.
Fig. 9. Significance of proper parameter tuning: (a–c) show the effect of the number of nearest neighbors for normal estimation; (d–f) present the effect of octree depth; and (g and h) show the effect of density-based filtering.
Finally, the accuracies of spalling volume and maximum depth estimation are evaluated using a percent error (PE) defined as
PEvol = 100 × (Vtrue − Vmeas)/Vtrue
(5)
PEMD = 100 × (MDtrue − MDmeas)/MDtrue
(6)
where Vtrue and Vmeas = ground truth (from the CAD model) and measured (from our method) values of volume, respectively; and MDtrue and MDmeas = ground truth and measured values of the maximum depth of damage. PEvol and PEMD are computed at each test distance. In this study, we performed the plane estimation 10 times and calculated the average volume and maximum depth values. As the plane estimation is based on the RANSAC algorithm, the estimated plane differs with each estimation attempt, and these differences are more prominent when there are more errors in the surface points. Figs. 10(a and b) show the level of variation of the point cloud with respect to the estimated surface, reconstructed from the data collected at standoff distances of 3 and 5 m, respectively. As the standoff distance increases, small errors in the image poses magnify into larger errors in the point cloud estimation.
Fig. 10. Deviation of the points from the estimated plane. The neighborhood point clouds in (a and b) were utilized for plane estimation, and the corresponding models were reconstructed from images captured at distances of 3 and 5 m, respectively.
Fig. 11 shows the deviation of all neighborhood points from the estimated plane, quantified as a normalized histogram. The vertical axis shows the fraction of points on a log scale, and the horizontal axis denotes the deviation of the points from the plane in millimeters on a linear scale. The bin width was chosen as 0.5 mm to represent the data visually. Ideally, since the points are collected from the flat region, all points should lie near 0. However, owing to reconstruction errors, the points deviate from the estimated surface. Comparing the distributions of the point locations at 3 and 5 m shows that the longer the data collection distance, the larger the deviation of the points from the plane. The PE values are therefore averaged over 10 different RANSAC iterations, and their standard deviations are also computed. These are shown as PE¯vol and PE¯MD in Table 1.
Fig. 11. Normalized histogram of point deviation from the estimated plane.
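The histogram in Fig. 11 can be produced from the fitted plane and the neighborhood points with a few lines of Python, for example as sketched below (Matplotlib assumed; point coordinates assumed to be in meters).

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_plane_deviation(points, plane, bin_width_mm=0.5):
    """Normalized histogram (log-scale fractions) of the signed deviation of the
    neighborhood points from the fitted plane [a, b, c, d], as in Fig. 11."""
    n = np.asarray(plane[:3], float)
    norm = np.linalg.norm(n)
    dev_mm = (points @ (n / norm) + plane[3] / norm) * 1000.0   # deviation in mm

    bins = np.arange(dev_mm.min(), dev_mm.max() + bin_width_mm, bin_width_mm)
    weights = np.full(len(dev_mm), 1.0 / len(dev_mm))           # fractions, not counts
    plt.hist(dev_mm, bins=bins, weights=weights)
    plt.yscale("log")
    plt.xlabel("Deviation from estimated plane (mm)")
    plt.ylabel("Fraction of points")
    plt.show()
```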
Table 1. Results for PE¯vol, PE¯MD, and PE¯D99.9 at varying standoff distances
Standoff distance (m) | Average PEvol (%) (PE¯vol) | Average PEMD (%) (PE¯MD) | Average PED99.9 (%) (PE¯D99.9)
1 | −5.922 ± 0.195 | 6.045 ± 0.041 | 7.629 ± 0.073
2 | −0.792 ± 0.092 | 6.008 ± 0.029 | 7.108 ± 0.036
3 | −1.128 ± 0.138 | 5.143 ± 0.044 | 6.938 ± 0.035
4 | 4.620 ± 0.711 | 7.337 ± 0.216 | 8.977 ± 0.176
5 | 10.050 ± 1.347 | 12.863 ± 0.509 | 12.740 ± 1.317
As a means of conducting a sanity check and investigating whether the measured maximum depth (MDmeas) can be attributed to an outlier introduced during the reconstruction process, we also compute the 99.9th percentile depth, denoted as D99.9. It represents the value below which 99.9% of all other depths in the data set lie, meaning that only 0.1% of depths surpass this value. Using D99.9, conclusions can be drawn about the presence of outliers. Specifically, if MDmeas ≫ D99.9, it indicates that the depth data set may contain outliers that distort the distribution, suggesting that MDmeas might be associated with an outlier. In the absence of outliers, MDmeas should exhibit a similar or slightly higher value compared with D99.9. By extension, the percentage error of D99.9 from MDtrue, represented as PED99.9, should be comparable with PEMD while being systematically higher. We evaluate PED99.9 as per Eq. (7) and use it to validate MDmeas as mentioned later in this section. This value is also averaged across the RANSAC iterations and presented as PE¯D99.9 in Table 1:
PED99.9 = 100 × (MDtrue − D99.9)/MDtrue
(7)
In Table 1, for both PE¯vol and PE¯MD, the standard deviation among the results becomes more prominent at farther distances. This can be attributed to the fact that the quality of the damage model generated through SfM and MVS is negatively impacted by the standoff distance: the longer distance between the camera and the damage being imaged leads to a lower effective image resolution of the damaged regions, resulting in a reduction in the number of points generated. As a result, the quality of the damage model may be degraded. In addition, another source of error is the plane estimation. Furthermore, in Table 1, at close distances the signed PE¯vol is negative, indicating an overestimation of the measured volume. Owing to the nature of Poisson reconstruction, which produces a continuous watertight surface mesh, the algorithm creates additional ghost surfaces and irregularities in those regions of the mesh that are left unfiltered by the density-based filtering, which leads to overcounting of the measured volume. These results can be seen at close range at the 1- and 2-m distances, where the method reported negative PE¯vol due to the extra surfaces reconstructed. As the distance between the camera and the spalling decreases, the coverage area decreases, even though the effective image resolution is higher. To mitigate this, when capturing images of complex spalling shapes at close range, users are advised to collect a sufficient number of images from various viewpoints to ensure comprehensive coverage of all areas of the spalling. At the farther distances, there is a generally increasing trend in PE¯vol, PE¯MD, and PE¯D99.9, since physical regions of interest are captured at a decreasing resolution the farther they are from the data-capturing device.
Considering that SfM and MVS results are typically affected by the exposure of the images, this effect was studied by augmenting the brightness levels by ±20% in our data set. The results for PE¯vol and PE¯MD are presented in Table 2. The deliberate reduction of image brightness deprived the image space of visual keypoints that are instrumental for reconstruction. The visually degraded images were less conducive to generating precise reconstructions, resulting in increased magnitudes of PE¯vol as well as PE¯MD. This effect was especially pronounced at close range, where visual occlusion, coupled with reduced visibility, contributed to notably elevated percentage error magnitudes. Conversely, elevating image brightness improved exposure up to a certain threshold. Beyond this point, however, the artificial augmentation could not introduce new keypoints that were not originally present in the images, as certain objects or features in the physical scene were inherently insufficiently bright to be captured. Consequently, only an incremental improvement in the results was observed, particularly for PE¯MD. It is essential to acknowledge that PE¯vol is subject to the influence of numerous factors, and establishing a straightforward one-to-one comparison may not be scientifically robust.
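One simple way to realize this augmentation is to scale the pixel intensities and clip to the valid 8-bit range, as in the minimal sketch below (NumPy and OpenCV assumed; the file name is hypothetical, and the exact augmentation pipeline used for Table 2 is not specified beyond the ±20% levels).

```python
import cv2
import numpy as np

def adjust_brightness(image, factor):
    """Scale brightness by 'factor' (0.8 for -20%, 1.2 for +20%), clipping to the 8-bit range."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Example: generate the +/-20% variants of a collected image (hypothetical file name)
img = cv2.imread("spalling_view_01.jpg")
darker = adjust_brightness(img, 0.8)
brighter = adjust_brightness(img, 1.2)
```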
Table 2. Results for PE¯vol and PE¯MD at varying standoff distances with brightness augmentation
Standoff distance (m) | PE¯vol, −20% brightness | PE¯vol, +20% brightness | PE¯MD, −20% brightness | PE¯MD, +20% brightness
1 | 12.330 ± 0.828 | 1.508 ± 0.66 | 6.945 ± 0.348 | 4.310 ± 0.220
2 | −2.849 ± 0.363 | −2.672 ± 0.308 | 4.131 ± 0.099 | 4.907 ± 0.140
3 | −13.347 ± 0.521 | −11.815 ± 0.341 | 1.935 ± 0.144 | 3.212 ± 0.0914
4 | −5.855 ± 1.619 | −7.826 ± 1.415 | 11.588 ± 0.473 | 7.802 ± 0.469
5 | −20.802 ± 4.783 | −12.128 ± 4.361 | 4.268 ± 1.324 | 7.512 ± 1.598
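The paper does not state how the ±20% brightness variants were generated; a simple OpenCV-based sketch of one way to produce such an augmentation is shown below (file names are hypothetical):

```python
import cv2

def adjust_brightness(image_bgr, factor):
    """Scale image brightness by `factor` (e.g., 0.8 for -20%, 1.2 for +20%),
    saturating to the valid 8-bit range."""
    return cv2.convertScaleAbs(image_bgr, alpha=factor, beta=0)

# Generate the two augmented copies for one input view
img = cv2.imread("spalling_view_01.jpg")      # hypothetical file name
darker = adjust_brightness(img, 0.8)          # -20% brightness
brighter = adjust_brightness(img, 1.2)        # +20% brightness
cv2.imwrite("spalling_view_01_dark.jpg", darker)
cv2.imwrite("spalling_view_01_bright.jpg", brighter)
```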
The results of the sanity check confirmed that MDmeas did not originate from an outlier. This assessment was conducted by comparing PE¯D99.9 with PE¯MD, as presented in Table 1. It is evident from the table that PE¯D99.9 consistently exceeds PE¯MD, which shows that MDmeas is closer to the true values than D99.9 and is therefore derived from a properly reconstructed portion of the mesh rather than from an outlier. Moreover, the mean and standard deviation of PE¯D99.9 exhibit patterns similar to those of PE¯MD, further confirming that MDmeas is not influenced by outliers in the 3D reconstruction process. It must be noted that the results shown here are specific to the data-capturing device (HL2) used in this particular set of experiments; the reported errors can be further reduced with higher-resolution RGB cameras.

Field Experiment

Our method was deployed to measure actual spalling damage on a real bridge, shown in Fig. 12(a). The spalling was located on the bridge soffit at a height of about 3.4 m above a sidewalk. Typically, such defects are measured per OSIM within arm's-length reach during a regular inspection and assigned a severity class according to Table 3, where satisfying either the in-plane criterion or the depth criterion is sufficient to determine the severity level. In this experiment, we demonstrate how the proposed method streamlines the inspection outlined in the OSIM manual. Fig. 12(b) shows the user equipped with the HL2 collecting the data required for dense reconstruction. We also collected data with a Microsoft Azure Kinect DK, equipped with a 1 MP ToF depth camera, at close range (~0.5 m), as shown in Fig. 12(c), to serve as ground truth. Since this is the closest standoff distance prescribed by Microsoft, the point cloud collected at this distance is used as the ground truth to evaluate the accuracy of our size and depth measurements. In this study, we use the ground truth point cloud to assess the in-plane size (maximum length in any in-plane direction) and depth, as defined by OSIM. These measurements can be obtained using straightforward point cloud operations, without parameter-dependent postprocessing. Quantifying the defect volume, however, requires meshing, a process highly influenced by the chosen parameters, as previously discussed. Therefore, the accuracy evaluation against the ground truth point cloud considers only the in-plane size and depth metrics and excludes the volume measurement.
Fig. 12. Onsite experiments: (a) site location; (b) user collecting data to perform dense reconstruction; and (c) user collecting ground truth data.
Fig. 13 summarizes the experiment results. A user wearing the HL2 captured multiple images (27) of the defect from different viewpoints by looking at it directly and moving underneath the soffit; sample images are shown in Fig. 13(a). Images were captured using the HL2 gesture control function. Once the images were collected, the user performed damage segmentation using XRIV-based interactive segmentation: seed points inside (light) and outside (dark) the damage region in Fig. 13(b) were selected by gesture control, and the segmentation algorithm detected the boundary of the spalling in Fig. 13(c). Fig. 13(d) displays the sparse SfM point cloud reconstructed using OpenMVG. Outputs from SfM were used with OpenMVS to reconstruct a dense point cloud, shown in Fig. 13(e). Finally, Fig. 13(f) shows the reconstructed mesh surface of the damage, along with the points (outside the mesh) used to estimate the location of the hypothetical flat plane subsequently used for quantification.
Fig. 13. Steps involved in deploying our method onsite: (a) sample images collected onsite; (b) seed points for XRIV selected using gesture control on HL2; (c) segmentation mask obtained from XRIV; (d) sparse point cloud obtained from SfM; (e) dense reconstruction from MVS; and (f) mesh from Poisson reconstruction and sampled points used for plane surrounding the mesh.
Most parameters used for the method are the same as those used for the laboratory experiments, but some had to be tuned for this particular damage given its smaller physical size and shallower depth compared with the mockup damage. The adjusted parameters included the number of nearest neighbors for normal computation, which was increased to 40 to account for the smoother damage, and the number of RANSAC iterations for plane fitting, which was increased to 10^5. Since the damage is physically smaller, the level of detail expected when creating a 3D surface has to be lower than that used for the much larger and deeper artificial damage.
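As an illustration of these two adjustments, the sketch below assumes Open3D (cited in the references) is used for the point cloud processing; the input file name and the plane-fit distance tolerance are hypothetical:

```python
import open3d as o3d

# Hedged sketch of the two adjusted parameters for the field defect.
pcd = o3d.io.read_point_cloud("field_spalling_dense.ply")  # hypothetical file

# Normal estimation with 40 nearest neighbors (increased for the smoother, shallower defect)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamKNN(knn=40))

# RANSAC plane fit with 1e5 iterations to estimate the undamaged reference surface
plane_model, inlier_idx = pcd.segment_plane(
    distance_threshold=0.002,   # meters; illustrative tolerance, not from the paper
    ransac_n=3,
    num_iterations=100_000)
a, b, c, d = plane_model        # fitted plane: ax + by + cz + d = 0
```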
From the ground truth point cloud collected using the Azure Kinect, the in-plane size of the spalling was found to be 268 mm, and the depth of the damage was found to be 10.5 mm using simple point cloud transformations. Note that measuring volume requires surface reconstruction from the point cloud, which inherently incurs a loss of fine-grained detail and potential deviation from the ground truth; this divergence from the physical volume is an expected consequence. Using the point cloud processing techniques discussed in this paper, the defect volume from the Azure Kinect data was found to be 39.81 cm3. Using our image-based method, the measured defect volume is 24.94 cm3 (−37% deviation from the Azure Kinect volume) and the defect depth is 7.7 mm (−27% deviation from ground truth). Since the defect is reconstructed without losing real-world scale, the in-plane size can also be measured; it is 260 mm (−3% deviation from ground truth). The error in our depth measurement can potentially be attributed to the very shallow surface and inadequate lighting conditions, which could be mitigated with a static light source. Based on these measurements, the defect is classified as medium severity per OSIM (Table 3). Thus, spalling damage can be classified using the proposed method, which can potentially replace traditional manual measurement techniques for on-site quantification.
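For reference, the simple point cloud operations used to obtain the in-plane size and depth can be sketched as follows, assuming the defect points and a fitted reference plane (unit normal plus a point on the plane) are already available; the implementation details are illustrative rather than the exact code used in this work:

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist

def inplane_size_and_depth(points, plane_normal, plane_point):
    """OSIM-style defect measurements from a defect point cloud:
    depth  = largest distance of any defect point from the fitted reference plane,
    in-plane size = largest extent in any direction parallel to that plane."""
    n = plane_normal / np.linalg.norm(plane_normal)

    # Depth: maximum (absolute) distance from the reference plane
    signed = (points - plane_point) @ n
    depth = np.abs(signed).max()

    # In-plane size: express points in a 2D basis spanning the plane, then take
    # the largest pairwise distance over the convex hull vertices
    u = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-8:            # normal happened to align with the x-axis
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    coords_2d = np.column_stack(((points - plane_point) @ u,
                                 (points - plane_point) @ v))
    hull_pts = coords_2d[ConvexHull(coords_2d).vertices]
    inplane_size = pdist(hull_pts).max()
    return inplane_size, depth
```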
Table 3. OSIM severity classes for spalling type of damage
Severity level | Physical condition of spall damage (in-plane measurements) | Physical condition of spall damage (depth measurements)
Light | <150 mm | <25 mm
Medium | between 150 and 300 mm | between 25 and 50 mm
Severe | between 300 and 600 mm | between 50 and 100 mm
Very severe | >600 mm | >100 mm
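As an illustration, the Table 3 thresholds can be applied programmatically. The helper below assumes the governing class is the more severe of the two criteria, consistent with the rule that either measurement suffices; boundary values are treated here as belonging to the higher class, which may differ from the exact OSIM convention:

```python
def osim_spalling_severity(inplane_mm, depth_mm):
    """Assign an OSIM severity class for spalling using the Table 3 thresholds."""
    def level(value, upper_bounds):
        # upper_bounds are the upper limits of Light, Medium, and Severe
        for rank, upper in enumerate(upper_bounds):
            if value < upper:
                return rank
        return 3  # Very severe

    inplane_rank = level(inplane_mm, (150.0, 300.0, 600.0))
    depth_rank = level(depth_mm, (25.0, 50.0, 100.0))
    labels = ("Light", "Medium", "Severe", "Very severe")
    # Either criterion is sufficient, so the more severe of the two governs
    return labels[max(inplane_rank, depth_rank)]

# Field experiment values: 260 mm in-plane, 7.7 mm depth -> "Medium"
print(osim_spalling_severity(260.0, 7.7))
```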

Conclusion

This study evaluates an image-based dense reconstruction method for full three-dimensional quantification of spalling damage. The proposed method measures the volumetric loss, in-plane dimensions, and depth of the damage by performing in-scale 3D reconstruction of the spalling from multiple images collected at different viewpoints. The novelty of the method lies in the comprehensive evaluation of the techniques and parameters required to obtain reconstructions accurate enough for high-fidelity measurements. The method first uses the XRIV segmentation technique to isolate the target region (spalling) from the background (scene), followed by sparse and dense point cloud reconstruction using SfM and MVS, respectively. These reconstructions are performed at real-world scale by using the embedded pose information from the image-capturing device; in this paper, the Microsoft HL2 is used to access the pose information. The point cloud reconstruction is followed by Poisson meshing to reconstruct the surfaces. Various mesh parameters are discussed, and a guideline is provided for their fine-tuning, since the meshing depends on the quality of the reproduced point cloud. A novel method of calculating the volumetric loss and maximum depth of the damage from the mesh surface is presented. The method is evaluated on a set of laboratory experiments with ground truth data to assess its suitability, and an in-depth analysis is performed to estimate appropriate parameters based on standoff distance. The paper also examines the influence of image brightness on the measured volume and depth. Finally, a field deployment workflow is outlined and experimentally demonstrated on spalling damage present on the soffit of a bridge. The methodology presented in this article can enhance the visual inspection process for civil engineering assets and potentially replace certain manual components of measuring the physical dimensions of defects.

Data Availability Statement

Some or all data, models, or code that support the findings of this study (specifically, the point cloud processing code) are available from the corresponding author upon reasonable request.

Acknowledgments

We acknowledge the support from Rogers Communications and the Ontario Centers of Excellence via the Voucher for Innovation and Productivity Program (OCE34028) and the Natural Sciences and Engineering Research Council of Canada (NSERC) (RGPIN-2020-03979).

References

3D flow. 2022. Accessed November 2, 2022. https://www.3dflow.net/.
Adhikari, R. S., O. Moselhi, and A. Bagchi. 2013. “A study of image-based element condition index for bridge inspection.” In Proc., 30th Int. Symp. on Automation and Robotics in Construction and Mining (ISARC 2013): Building the Future in Automation and Robotics, edited by F. Hassani, O. Moselhi, and C. Haas, 345–356. Edinburgh, UK: International Association for Automation and Robotics in Construction (IAARC).
Agisoft. 2022. Accessed November 2, 2022. https://www.agisoft.com/.
Alcantarilla, P., J. Nuevo, and A. Bartoli. 2013. “Fast explicit diffusion for accelerated features in nonlinear scale spaces.” In Proc., British Machine Vision Conf., 13.1–13.11. Bristol, UK: BMVA Press.
Alcantarilla, P. F., A. Bartoli, and A. J. Davison. 2012. “KAZE features.” In Computer vision—European conf. on computer vision 2012, edited by A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, 214–227. Berlin, Germany: Springer. Lecture Notes in Computer Science.
Al-Sabbag, Z. A., J. P. Connelly, C. M. Yeum, and S. Narasimhan. 2020. “Real-Time quantitative visual inspection using extended reality.” J. Comput. Vision Imaging Syst. 6 (1): 1–3. https://doi.org/10.15353/jcvis.v6i1.3557.
Al-Sabbag, Z. A., C. M. Yeum, and S. Narasimhan. 2022. “Interactive defect quantification through extended reality.” Adv. Eng. Inf. 51 (January): 101473. https://doi.org/10.1016/j.aei.2021.101473.
Arapakopoulos, A., et al. 2022. “3D reconstruction & modeling of the traditional Greek trechadiri: ‘Aghia Varvara.’” Heritage 5 (2): 1295–1309. https://doi.org/10.3390/heritage5020067.
Armeni, I., O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese. 2016. “3D semantic parsing of large-scale indoor spaces.” In Proc., Institute of Electrical and Electronics Engineers Conf. on Computer Vision and Pattern Recognition, 1534–1543. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Atha, D. J., and M. R. Jahanshahi. 2018. “Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection.” Struct. Health Monit. 17 (5): 1110–1128. https://doi.org/10.1177/1475921717737051.
Autodesk. 2022. Accessed November 2, 2022. https://www.autodesk.ca/en/products/recap/overview.
Barnes, C., E. Shechtman, A. Finkelstein, and D. B. Goldman. 2009. “Patchmatch: A randomized correspondence algorithm for structural image editing.” In Proc., ACM SIGGRAPH 2009 Papers. New York: Association for Computing Machinery.
Beckman, G. H., D. Polyzois, and Y.-J. Cha. 2019. “Deep learning-based automatic volumetric damage quantification using depth camera.” Autom. Constr. 99 (March): 114–124. https://doi.org/10.1016/j.autcon.2018.12.006.
Bentley Systems. 2022. Accessed June 14, 2022. https://www.bentley.com/software/contextcapture-viewer/.
Bernardini, F., J. Mittleman, and H. Rushmeier. 2000. “The ball-pivoting algorithm for surface reconstruction.” IEEE Trans. Visual Comput. Graphics 5 (November). Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Brownjohn, J. M. W., J. Lee, and B. Cheong. 1999. “Dynamic performance of a curved cable-stayed bridge.” Eng. Struct. 21 (11): 1015–1027. https://doi.org/10.1016/S0141-0296(98)00046-7.
Campos, T. J. F. L., E. V. de Francisco, and M. F. H. Rocha. 2021. “Assessment of the complexity of renal tumors by Nephrometry (R.E.N.A.L. Score) with CT and MRI images versus 3D reconstruction model images.” Int. Braz. J. Urol. 47 (July): 896–901. https://doi.org/10.1590/s1677-5538.ibju.2020.0930.
Chen, F.-C., and M. R. Jahanshahi. 2018. “NB-CNN: Deep learning-based crack detection using convolutional neural network and naïve Bayes data fusion.” IEEE Trans. Ind. Electron. 65 (5): 4392–4400. https://doi.org/10.1109/TIE.2017.2764844.
Eltner, A., and G. Sofia. 2020. “Structure from motion photogrammetric technique.” Dev. Earth Surf. Processes 23: 1–24. https://doi.org/10.1016/B978-0-444-64177-9.00001-1.
Fischler, M. A., and R. C. Bolles. 1981. “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography.” Commun. ACM 24 (6): 381–395. https://doi.org/10.1145/358669.358692.
Furukawa, Y., and C. Hernández. 2013. “Multi-view stereo: A tutorial.” Found. Trends Comp. Graphics Vision 9 (1–2): 1–148. https://doi.org/10.1561/0600000052.
Furukawa, Y., and J. Ponce. 2010. “Accurate, dense, and robust multiview stereopsis.” IEEE Trans. Pattern Anal. Mach. Intell. 32 (8): 1362–1376. https://doi.org/10.1109/TPAMI.2009.161.
Golparvar-Fard, M., J. Bohn, J. Teizer, S. Savarese, and F. Peña-Mora. 2011a. “Evaluation of image-based modeling and laser scanning accuracy for emerging automated performance monitoring techniques.” Autom. Constr. 20 (8): 1143–1155. https://doi.org/10.1016/j.autcon.2011.04.016.
Golparvar-Fard, M., F. Peña-Mora, and S. Savarese. 2011b. “Monitoring changes of 3D building elements from unordered photo collections.” In Proc., Institute of Electrical and Electronics Engineers Int. Conf. on Computer Vision Workshops, 249–256. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
He, K., G. Gkioxari, P. Dollar, and R. Girshick. 2017. “Mask R-CNN.” IEEE Int. Conf. on Computer Vision (ICCV), 2980–2988. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Hoskere, V., N. Yasutaka, T. Hoang, and B. Spencer Jr. 2017. “Vision-Based structural inspection using multiscale deep convolutional neural networks.” In Proc., 3rd Huixian Int. Forum on Earthquake Engineering for Young Researchers. Urbana, IL: University of Illinois Board of Trustees.
Hoskere, V., Y. Narazaki, T. A. Hoang, and Jr B. F. Spencer. 2018. “Towards automated postearthquake inspections with deep learning-based condition-aware models.” In 7th World Conf. on Structural Control and Monitoring. Harbin, China: Harbin Institute of Technology (HIT).
Kadi, H., and K. Anouche. 2020. “Knowledge-based parametric modeling for heritage interpretation and 3D reconstruction.” Digital Appl. Archaeol. Cult. Heritage 19: e00160. https://doi.org/10.1016/j.daach.2020.e00160.
Kazhdan, M., M. Bolitho, and H. Hoppe. 2006. “Poisson surface reconstruction.” In Proc., 4th Eurographics Symp. on Geometry Processing, 61–70. Goslar, Germany: Eurographics Association.
Kim, B., and S. Cho. 2020. “Automated multiple concrete damage detection using instance segmentation deep learning model.” Appl. Sci. 10 (22): 8008. https://doi.org/10.3390/app10228008.
Kim, H., E. Ahn, M. Shin, and S.-H. Sim. 2019. “Crack and noncrack classification from concrete surface images using machine learning.” Struct. Health Monit. 18 (3): 725–738. https://doi.org/10.1177/1475921718768747.
Lhuillier, M. 2011. “Fusion of GPS and structure-from-motion using constrained bundle adjustments.” In Proc., Computer Vision and Pattern Recognition 2011, 3025–3032. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Lowe, D. G. 2004. “Distinctive image features from scale-invariant keypoints.” Int. J. Comput. Vision 60 (2): 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94.
Lu, R., I. Brilakis, and C. R. Middleton. 2019. “Detection of structural components in point clouds of existing RC bridges.” Comput.-Aided Civ. Infrastruct. Eng. 34 (3): 191–212. https://doi.org/10.1111/mice.12407.
Maiti, A., and D. Chakravarty. 2016. “Performance analysis of different surface reconstruction algorithms for 3D reconstruction of outdoor objects from their digital images.” SpringerPlus 5 (1): 932. https://doi.org/10.1186/s40064-016-2425-9.
Mapillary. 2017. “Mapillary/OpenSfM.” GitHub. Accessed January 25, 2023. https://github.com/mapillary/OpenSfM.
McLaughlin, E., N. Charron, and S. Narasimhan. 2020. “Automated defect quantification in concrete bridges using robotics and deep learning.” J. Comput. Civil Eng. 34 (5): 04020029. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000915.
Metni, N., and T. Hamel. 2007. “A UAV for bridge inspection: Visual servoing control law with orientation limits.” Autom. Constr. 17 (1): 3–10. https://doi.org/10.1016/j.autcon.2006.12.010.
Mishra, M., T. Barman, and G. V. Ramana. 2022. “Artificial intelligence-based visual inspection system for structural health monitoring of cultural heritage.” J. Civ. Struct. Health Monit. 14: 103–120. https://doi.org/10.1007/s13349-022-00643-8.
Moulon, P., P. Monasse, R. Perrot, and R. Marlet. 2016. “OpenMVG: Open multiple view geometry.” In Proc., 1st Workshop on Reproducible Research in Pattern Recognition, 60–74. Cham, Switzerland: Springer.
Newcombe, R. A., S. J. Lovegrove, and A. J. Davison. 2011. “DTAM: Dense tracking and mapping in real-time.” In Proc., Int. Conf. on Computer Vision, 2320–2327. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Nikolov, I., and C. Madsen. 2016. “Benchmarking close-range structure from motion 3D reconstruction software under varying capturing conditions.” In Digital heritage. progress in cultural heritage: Documentation, preservation, and protection, edited by M. Ioannides, E. Fink, A. Moropoulou, M. Hagedorn-Saupe, A. Fresa, G. Liestøl, V. Rajcic, and P. Grussenmeyer, 15–26. Cham, Switzerland: Springer. Lecture Notes in Computer Science.
Nyimbili, P., H. Demirel, D. Seker, and T. Erden. 2016. “Structure from Motion (SfM) – approaches and applications.” In Proc., Int. Scientific Conference on Applied Sciences. Amsterdam, The Netherlands: Atlantis Press.
ONMTO (Ministry of Transportation). 2000. Ontario structure inspection manual. Toronto: ONMTO.
Park, J. A., C. M. Yeum, and T. D. Hrynyk. 2021. “Learning-based image scale estimation using surface textures for quantitative visual inspection of regions-of-interest.” Comput.-Aided Civ. Infrastruct. Eng. 36 (2): 227–241. https://doi.org/10.1111/mice.12613.
Perez-Perez, Y., M. Golparvar-Fard, and K. El-Rayes. 2016. “Semantic and geometric labeling for enhanced 3D point cloud segmentation.” In Construction Research Congress 2016, 2452–2552. Hoboken, NJ: John Wiley & Sons, Inc.
Pix4D. 2022. Accessed November 2, 2022. https://www.pix4d.com/.
RealityCapture. 2022. Accessed November 2, 2022. https://www.capturingreality.com/.
Redmon, J., S. Divvala, R. Girshick, and A. Farhadi. 2016. “You only look once: Unified, real-time object detection.” In Proc., Institute of Electrical and Electronics Engineers Conf. on Computer Vision and Pattern Recognition, 779–788. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Rublee, E., V. Rabaud, K. Konolige, and G. Bradski. 2011. “ORB: An efficient alternative to SIFT or SURF.” In Proc., Int. Conf. on Computer Vision, 2564–2571. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Scharstein, D., R. Szeliski, and R. Zabih. 2001. “A taxonomy and evaluation of dense Two-frame stereo correspondence algorithms.” In Proc., Institute of Electrical and Electronics Engineers Workshop on Stereo and Multi-Baseline Vision, 131–140. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Schonberger, J. L., and J.-M. Frahm. 2016. “Structure-from-Motion revisited.” In Proc., Institute of Electrical and Electronics Engineers Conf. on Computer Vision and Pattern Recognition, 4104–4113. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Schönberger, J. L., E. Zheng, J.-M. Frahm, and M. Pollefeys. 2016. “Pixelwise view selection for unstructured multi-view stereo.” In Vol. 9907 of Proc., Computer Vision–European Conf. on Computer Vision, edited by B. Leibe, J. Matas, N. Sebe, and M. Welling, 501–518. Cham, Switzerland: Springer. Lecture Notes in Computer Science.
Shen, S. 2013. “Accurate multiple view 3D reconstruction using patch-based stereo for large-scale scenes.” IEEE Trans. Image Process. 22 (5): 1901–1914. https://doi.org/10.1109/TIP.2013.2237921.
Skarlatos, D., and S. Kiparissi. 2012. “Comparison of laser scanning, photogrammetry and SFM-MVS pipeline applied in structures and artificial surfaces.” ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. I-3 (July): 299–304. https://doi.org/10.5194/isprsannals-I-3-299-2012.
Snavely, N. 2013. “Snavely/Bundler_sfm.” Accessed January 25, 2023. https://github.com/snavely/bundler_sfm.
Sofiiuk, K., I. Petrov, O. Barinova, and A. Konushin. 2020. “F-BRS: Rethinking backpropagating refinement for interactive segmentation.” In Proc., Institute of Electrical and Electronics Engineers/CVF Conf. on Computer Vision and Pattern Recognition, 8623–8632. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE).
Sony, S., S. Laventure, and A. Sadhu. 2019. “A literature review of next-generation smart sensing technology in structural health monitoring.” Struct. Control Health Monit. 26 (3): e2321. https://doi.org/10.1002/stc.2321.
Spencer, B. F., V. Hoskere, and Y. Narazaki. 2019. “Advances in computer vision-based civil infrastructure inspection and monitoring.” Engineering 5 (2): 199–222. https://doi.org/10.1016/j.eng.2018.11.030.
Teng, S., Z. Liu, and X. Li. 2022. “Improved YOLOv3-based bridge surface defect detection by combining high- and low-resolution feature images.” Buildings 12 (8): 1225. https://doi.org/10.3390/buildings12081225.
Vardanega, P. J., G. T. Webb, P. R. A. Fidler, and C. R. Middleton. 2016. “Bridge monitoring.” In Innovative bridge design handbook, edited by A. Pipinato, 759–775. Oxford, UK: Butterworth-Heinemann.
Xiong, X., A. Adan, B. Akinci, and D. Huber. 2013. “Automatic creation of semantically rich 3D building models from laser scanner data.” Autom. Constr. 31: 325–337. https://doi.org/10.1016/j.autcon.2012.10.006.
Xu, Y., S. Li, D. Zhang, Y. Jin, F. Zhang, N. Li, and H. Li. 2018. “Identification framework for cracks on a steel structure surface by a restricted boltzmann machines algorithm based on consumer-grade camera images.” Struct. Control Health Monit. 25 (2): e2075. https://doi.org/10.1002/stc.2075.
Yeum, C. 2016. “Computer vision-based structural assessment exploiting large volumes of images.” Open Access Dissertations. Accessed February 5, 2023. https://docs.lib.purdue.edu/open_access_dissertations/1036.
Zhang, A., K. C. P. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen. 2017. “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network.” Comput.-Aided Civ. Infrastruct. Eng. 32 (10): 805–819. https://doi.org/10.1111/mice.12297.
Zhao, S., F. Kang, J. Li, and C. Ma. 2021. “Structural health monitoring and inspection of dams based on UAV photogrammetry with image 3D reconstruction.” Autom. Constr. 130: 103832. https://doi.org/10.1016/j.autcon.2021.103832.
Zhou, Q.-Y., J. Park, and V. Koltun. 2018. “Open3D: A modern library for 3D data processing.” Preprint, submitted January 30, 2018. https://doi.org/10.48550/arXiv.1801.09847.

Information & Authors

Information

Published In

ASCE OPEN: Multidisciplinary Journal of Civil Engineering
Volume 2, December 2024

History

Received: Aug 3, 2023
Accepted: Dec 6, 2023
Published online: Feb 28, 2024
Discussion open until: Jul 28, 2024
Published in print: Dec 31, 2024

Authors

Affiliations

Ph.D. Student, Dept. of Civil and Environmental Engineering, Univ. of Waterloo, 200 Univ. Ave. West, Waterloo, ON, Canada N2L 3G1. ORCID: https://orcid.org/0000-0002-4515-878X.
Zaid Abbas Al-Sabbag
Ph.D. Student, Dept. of Civil and Environmental Engineering, Univ. of Waterloo, 200 Univ. Ave. West, Waterloo, ON, Canada N2L 3G1.
Assistant Professor, Dept. of Civil and Environmental Engineering, Univ. of Waterloo, 200 Univ. Ave. West, Waterloo, ON, Canada N2L 3G1 (corresponding author). ORCID: https://orcid.org/0000-0002-7793-1079. Email: [email protected]
Sriram Narasimhan, Ph.D., P.Eng., F.ASCE
Professor, UCLA Samueli School of Engineering, Civil and Environmental Engineering, 7400 Boelter Hall (4731H), Los Angeles, CA 90095.
