Quantitative Research: Preparation of Incongruous Economic Data Sets for Archival Data Analysis
Publication: Journal of Construction Engineering and Management
Volume 136, Issue 1
Abstract
In the field of construction engineering and management, archival data sets are not always as correct and consistent as would be desirable. Data from different sources under study, e.g., companies, may differ in format or content, and even data within a single source may be incongruous and require substantial preparation. This makes examining theories and extracting trends from historic data more difficult than working with carefully controlled experimental studies or newly collected data. The purpose of this paper is not to review the regression models that the writers developed during their research, but to focus on the data preparation that had to be applied before those analyses. The objective is to outline various techniques that can be applied to archival data related to construction engineering and management, giving researchers a set of best practices for data preparation that can assist them in gleaning truths from such data.
Acknowledgments
The first writer thanks Joseph D. Lombardo of Learning Seed and Justin P. Molineaux of Computech for their advice on creating effective pseudocode.
Copyright
© 2010 ASCE.
History
Received: Jul 24, 2008
Accepted: Apr 3, 2009
Published online: Apr 30, 2009
Published in print: Jan 2010