TECHNICAL PAPERS

Mar 1, 2006

Knowledge Discovery in a Facility Condition Assessment Database Using Text Clustering

Authors: H. S. Ng, A. Toukourou, and L. Soibelman

Publication: Journal of Infrastructure Systems

Volume 12, Issue 1

https://doi.org/10.1061/(ASCE)1076-0342(2006)12:1(50)

Abstract

Knowledge discovery in databases (KDD) has been applied in many different areas of study, including DNA sequence analysis, pattern discovery, document classification, image recognition, and speech recognition. This paper presents the application of KDD to the analysis of a facility condition assessment (FCA) database. The FCA database contains information on facilities located at three campuses within a statewide university system. The case study uses cluster analysis for text mining. Cluster analysis groups objects so that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. In this analysis, the objects being clustered are deficiency descriptions from the university’s FCA database. Deficiency descriptions were gathered from 15 housing facilities and 15 academic facilities located at the 3 campuses. The results show that some clusters of facility deficiencies are unique to a particular type of facility, and they reveal the influence of location on the deficiencies of academic facilities. The paper begins with background on clustering approaches in KDD. Next, a case study based on a higher education FCA database is presented. Finally, the paper explores other potential areas of application for the described clustering approach.
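
The abstract does not specify the preprocessing, term weighting, or clustering algorithm used in the case study, so the following is only a minimal sketch of one common text-clustering pipeline, not the authors' method: TF-IDF weighting with idf = ln(N / df), followed by k-means with cosine similarity (spherical k-means). The deficiency descriptions, facility types, stopword list, seed indices, and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy inputs for illustration only; not taken from the FCA database.
DESCRIPTIONS = [
    "roof membrane leaking at parapet",
    "roof leaking near drain",
    "carpet worn and stained in corridor",
    "carpet stained in dormitory room",
    "fire alarm panel obsolete",
    "fire alarm devices obsolete",
]
FACILITY_TYPES = ["academic", "academic", "housing", "housing", "academic", "housing"]

# Assumed minimal stopword list; no stemming is applied in this sketch.
STOPWORDS = {"a", "an", "and", "at", "in", "near", "of", "the"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length (zero vectors unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector per document, with idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        normalize({t: c * math.log(n / df[t]) for t, c in Counter(toks).items()})
        for toks in tokenized
    ]


def cosine(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum member vectors and renormalize (the spherical k-means centroid)."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(total)


def kmeans_cosine(vectors, k, seeds, max_iter=20):
    """Spherical k-means: assign each vector to its most similar centroid, then update."""
    centroids = [vectors[i] for i in seeds]  # assumed hand-picked seeds for determinism
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = centroid(members)
    return labels


def cluster_by_type(labels, types):
    """Count descriptions per (cluster, facility type) pair."""
    return Counter(zip(labels, types))


if __name__ == "__main__":
    labels = kmeans_cosine(tfidf_vectors(DESCRIPTIONS), k=3, seeds=(0, 2, 4))
    print(labels)  # [0, 0, 1, 1, 2, 2]
    print(cluster_by_type(labels, FACILITY_TYPES))
```

With these toy inputs the sketch separates roofing, interior-finish, and fire-alarm wording into three clusters, and the (cluster, facility type) counts are the kind of cross-tabulation in which a cluster occurring for only one facility type would stand out. Real deficiency text would likely call for stemming, a fuller stopword list, and a principled choice of k and initialization rather than hand-picked seeds.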

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. NSF0093841 (CAREER).

Published In

Journal of Infrastructure Systems

Volume 12 • Issue 1 • March 2006

Pages: 50–59

Copyright

© 2006 ASCE.

History

Received: Nov 4, 2002

Accepted: Jan 12, 2005

Published online: Mar 1, 2006

Published in print: Mar 2006

Authors

H. S. Ng

PhD Candidate, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana–Champaign, 3142 Newmark CE Lab, Urbana, IL 61801.

A. Toukourou

Graduate Student, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana–Champaign, 3142 Newmark CE Lab, Urbana, IL 61801.

L. Soibelman

Associate Professor, Dept. of Civil and Environmental Engineering, Carnegie Mellon Univ., Porter Hall 118N, Pittsburgh, PA 15213.
