Knowledge Discovery in a Facility Condition Assessment Database Using Text Clustering
Publication: Journal of Infrastructure Systems
Volume 12, Issue 1
Abstract
Knowledge discovery in databases (KDD) has been applied in many different areas of study, including DNA sequence analysis, pattern discovery, document classification, image recognition, and speech recognition. This paper presents the application of KDD to the analysis of a facility condition assessment (FCA) database. The FCA database contains information on facilities located at three campuses within a statewide university system. The case study uses cluster analysis for text mining. Cluster analysis groups objects so that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. In this analysis, the objects being clustered are deficiency descriptions from the university’s FCA database, gathered from 15 housing facilities and 15 academic facilities located at the three campuses. The results show that some clusters of facility deficiencies are unique to a particular type of facility, and that location influences the deficiencies of academic facilities. The paper begins with background on clustering approaches in KDD. Next, a case study based on a higher education FCA database is presented. Finally, the paper concludes by exploring other potential areas of application for the described clustering approach.
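The abstract describes grouping free-text deficiency descriptions so that similar descriptions share a cluster. As a minimal sketch, assuming a bag-of-words vector-space representation with TF-IDF weighting, cosine similarity, and a simple k-means loop, the idea can be illustrated in standard-library Python. These choices are made here for illustration and are not necessarily the paper's preprocessing or clustering algorithm; the function names (`tfidf_vectors`, `kmeans`, and so on), the smoothed IDF formula, the farthest-first seeding rule, and the sample descriptions are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (term -> weight) to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    # One unit-length TF-IDF vector per description, using a smoothed IDF
    # (an assumption) so that no term weight is exactly zero.
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in Counter(tokens).items()})
        for tokens in token_lists
    ]


def cosine(a, b):
    # For unit-length vectors the dot product equals cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    # Deterministic farthest-first seeding: start from document 0, then add the
    # document least similar to all seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each description joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = normalize(total)
    return labels


# Hypothetical deficiency descriptions, invented for illustration only.
docs = [
    "Roof membrane leaking at drain",
    "Roof flashing leaking",
    "Worn carpet in corridor",
    "Carpet worn and stained",
    "Cracked window glazing",
    "Window glazing cracked and sealant failed",
]
print(kmeans(tfidf_vectors(docs), k=3))  # → [0, 0, 1, 1, 2, 2]
```

Descriptions that share vocabulary (roof leaks, worn carpet, cracked glazing) end up in the same cluster, while a shared function word such as "and" contributes only a small cross-cluster similarity. A practical pipeline would usually add stemming and stop-word removal, and `k` must not exceed the number of descriptions.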
Acknowledgment
This material is based upon work supported by the National Science Foundation under Grant No. NSF0093841 (CAREER).
Copyright
© 2006 ASCE.
History
Received: Nov 4, 2002
Accepted: Jan 12, 2005
Published online: Mar 1, 2006
Published in print: Mar 2006