TECHNICAL PAPERS
Sep 1, 2005

Automated Procedure to Assess Civil Infrastructure Data Quality: Method and Validation

Publication: Journal of Infrastructure Systems
Volume 11, Issue 3

Abstract

Monitoring data are collected to measure the condition, environment, usage, and performance of civil infrastructure. High-quality monitoring data are necessary for decision-support systems, design analysis, and research. However, little work has been done on generic, automated procedures for assessing and cleansing data quality. We have developed an automated, two-level data quality assessment procedure to address this deficiency. In the first level, several different data quality assessment methods are combined in a voting scheme to identify concentrations of anomalies in aggregate data. In the second level, differences between anomalous and normal data are identified at the level of individual data points; combined with domain knowledge, these differences can be used to distinguish types of errors, such as missing data and calibration errors. In our case studies, the results of the assessment procedure allowed us to cleanse the data effectively. We have also developed a test bench to explore the sensitivity of the assessment algorithms used in our approach. The test bench introduces a known error into a clean, artificial data set and then evaluates how well each assessment method identifies the error. The test bench results show that our approach effectively identifies anomalies, even those with small error magnitudes.
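
The abstract does not name the methods that vote in the first level, the voting rule, or the test bench parameters. The following Python sketch illustrates the general idea under illustrative assumptions that are not taken from the paper:

- Three textbook outlier detectors vote on daily means: a classical z-score, a median-absolute-deviation modified z-score, and interquartile-range fences.
- A day is flagged when at least two of the detectors agree.
- The second level compares individual readings on flagged days with readings on normal days.
- The test bench injects a multiplicative gain error, standing in for a calibration error, into Gaussian synthetic data.

All function names, thresholds, and data parameters are hypothetical.

```python
import random
import statistics

# Hypothetical first-level detectors (assumptions, not the paper's methods).
# Each takes aggregate values (here: one mean per day) and returns the set of
# indices it considers anomalous.


def zscore_flags(values, threshold=2.5):
    """Classical z-score; a large anomaly inflates the SD and can mask itself."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {i for i, v in enumerate(values) if abs(v - mean) / sd > threshold}


def mad_flags(values, threshold=3.5):
    """Modified z-score based on the median absolute deviation (robust)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return {i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold}


def iqr_flags(values, k=1.5):
    """Tukey fences: flag values more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = k * (q3 - q1)
    return {i for i, v in enumerate(values) if v < q1 - spread or v > q3 + spread}


DETECTORS = {"zscore": zscore_flags, "mad": mad_flags, "iqr": iqr_flags}


def level1_vote(aggregates, min_votes=2):
    """Flag an aggregate when at least `min_votes` detectors agree."""
    per_method = {name: detect(aggregates) for name, detect in DETECTORS.items()}
    votes = {}
    for flags in per_method.values():
        for i in flags:
            votes[i] = votes.get(i, 0) + 1
    flagged = {i for i, n in votes.items() if n >= min_votes}
    return flagged, per_method


def level2_ratio(days, flagged):
    """Compare individual readings on flagged days with those on normal days.

    A median ratio well away from 1.0 is read here as a hint of a multiplicative
    (calibration-like) error; this rule is an illustrative assumption.
    """
    normal = [x for i, day in enumerate(days) if i not in flagged for x in day]
    anomalous = [x for i in sorted(flagged) for x in days[i]]
    return statistics.median(anomalous) / statistics.median(normal)


def run_test_bench(n_days=60, per_day=400, bad_days=range(20, 25), gain=1.06, seed=1):
    """Inject a known gain error into clean synthetic data and score detection."""
    rng = random.Random(seed)
    # Clean, artificial data: Gaussian readings with mean 50 and SD 5.
    days = [[rng.gauss(50.0, 5.0) for _ in range(per_day)] for _ in range(n_days)]
    truth = set(bad_days)
    for i in truth:
        days[i] = [x * gain for x in days[i]]  # the known, injected error
    daily_means = [statistics.fmean(day) for day in days]
    flagged, per_method = level1_vote(daily_means)
    # Per-method recall shows how well each detector alone finds the error.
    recall = {name: len(f & truth) / len(truth) for name, f in per_method.items()}
    return flagged, truth, recall, level2_ratio(days, flagged)


if __name__ == "__main__":
    flagged, truth, recall, ratio = run_test_bench()
    print("flagged days:", sorted(flagged))
    print("per-method recall:", recall)
    print(f"median ratio, flagged/normal days: {ratio:.3f}")
```

Running the module prints the flagged days, the per-method recall, and the median ratio for one seeded trial. The 2-of-3 voting rule is the design choice worth noting.

Consider the short series `[10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 30.0]`. The classical z-score alone misses the spike at index 7 (z ≈ 2.47), because the spike inflates the standard deviation. The two robust detectors both flag the spike, so the vote still does.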


Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. NSFCMS-9987871 and partially supported by the Illinois Department of Transportation through the Metropolitan Transportation Support Initiative (METSI) at the University of Illinois at Chicago. The writers would also like to thank Margaret H. Chalkline and the Minnesota Department of Transportation for the opportunity to study their weigh-in-motion data.

Information & Authors

Information

Published In

Journal of Infrastructure Systems
Volume 11, Issue 3, September 2005
Pages: 180–189

History

Received: Jul 10, 2002
Accepted: Nov 29, 2004
Published online: Sep 1, 2005
Published in print: Sep 2005

Authors

Affiliations

Rebecca Bari Buchheit
Ab Initio, Lexington, MA.
James H. Garrett Jr., M.ASCE
Professor, Dept. of Civil and Environmental Engineering, Carnegie Mellon Univ., Pittsburgh, PA 15213-3890.
Sue McNeil, M.ASCE
Director, Urban Transportation Center, Univ. of Illinois-Chicago, Chicago, IL 60607.
Research Assistant, Dept. of Civil and Environmental Engineering, Carnegie Mellon Univ., Pittsburgh, PA 15213-3890.
