Technical Papers
Jun 6, 2023

Cost-Weighted TF-IDF: A Novel Approach for Measuring Highway Project Similarity Based on Pay Items’ Cost Composition and Term Frequency

Publication: Journal of Construction Engineering and Management
Volume 149, Issue 8

Abstract

State highway agencies (SHAs) often need to cluster or bundle projects according to their scope similarity for various construction management tasks, including historical data-driven time and cost estimating and project bundling. Conventionally, SHAs categorize similar projects into work types based on subjective judgment about the similarity between major pay items. A few quantitative methods for determining project similarity are found in the literature, but most rely on a single source of information, either the cost contribution of pay items or the keywords of pay item descriptions. This paper presents the first attempt to integrate multiple information sources for project similarity measurement. This research proposes a novel cost-weighted term frequency-inverse document frequency (CW-TF-IDF) method that incorporates the cost information of pay items into the traditional TF-IDF word embedding method to measure project similarity. The effectiveness of the proposed method in supporting project clustering and bundling was tested using historical bid data collected from an SHA. The findings showed that the CW-TF-IDF method significantly improves project clustering performance compared to the most recent state-of-the-art method. The CW-TF-IDF method also performed well in project bundling, yielding a cosine similarity of over 0.9 for most of the bundled projects in the testing data. The proposed method is expected to help SHAs accurately identify similar projects and eventually improve their project management effectiveness.
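
The abstract does not state the exact CW-TF-IDF formula, so the following is only a minimal sketch of the general idea under stated assumptions: each term's frequency is replaced by the cost share of the pay items whose descriptions contain it, the inverse document frequency uses a common smoothed form, and projects are compared by cosine similarity. The function names, the whitespace tokenization, and the three sample projects are hypothetical and are not taken from the paper or its data.

```python
import math
from collections import defaultdict


def cost_weighted_tf(pay_items):
    """Map each term to the summed cost share of the pay items containing it.

    pay_items: list of (description, cost) tuples for one project.
    Assumption: cost share stands in for the raw term count of plain TF-IDF.
    """
    total_cost = sum(cost for _, cost in pay_items)
    weights = defaultdict(float)
    for description, cost in pay_items:
        share = cost / total_cost
        for term in description.lower().split():
            weights[term] += share
    return dict(weights)


def cw_tfidf_vectors(projects):
    """Return one sparse {term: weight} vector per project."""
    tfs = [cost_weighted_tf(items) for items in projects]
    n_projects = len(projects)
    doc_freq = defaultdict(int)
    for tf in tfs:
        for term in tf:
            doc_freq[term] += 1
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1 (an assumed, commonly used form).
    idf = {t: math.log((1 + n_projects) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical projects: two paving jobs and one bridge job.
projects = [
    [("hma surface course", 800_000), ("traffic control", 50_000)],
    [("hma base course", 600_000), ("traffic control", 40_000)],
    [("structural concrete bridge", 900_000), ("traffic control", 30_000)],
]
vectors = cw_tfidf_vectors(projects)
sim_paving_pair = cosine_similarity(vectors[0], vectors[1])
sim_paving_bridge = cosine_similarity(vectors[0], vectors[2])
print(f"paving vs paving: {sim_paving_pair:.3f}")
print(f"paving vs bridge: {sim_paving_bridge:.3f}")
```

In this toy setup, "traffic control" appears in every project but accounts for a small share of each project's cost, so it contributes little to the similarity score. The two paving projects score higher against each other than against the bridge project because their high-cost pay items share terms.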

Data Availability Statement

The data used during the study were provided by a third party. Direct requests for these materials may be made to the provider as indicated in the Acknowledgments. Some models or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to acknowledge that the Iowa Department of Transportation provided the bid tabulation data for this study.

Published In

Journal of Construction Engineering and Management
Volume 149, Issue 8, August 2023

History

Received: Aug 13, 2022
Accepted: Apr 3, 2023
Published online: Jun 6, 2023
Published in print: Aug 1, 2023
Discussion open until: Nov 6, 2023

Authors

Affiliations

Ph.D. Student, Glenn Dept. of Civil Engineering, Clemson Univ., Clemson, SC 29634. ORCID: https://orcid.org/0000-0001-6361-2985.
Muhammad Ali Moriyani, S.M.ASCE
Ph.D. Student, Dept. of Civil, Construction, and Environmental Engineering, North Dakota State Univ., Fargo, ND 58102.
Assistant Professor, Dept. of Civil, Construction, and Environmental Engineering, North Dakota State Univ., Fargo, ND 58102 (corresponding author). ORCID: https://orcid.org/0000-0002-2582-2671.
Assistant Professor, Glenn Dept. of Civil Engineering, Clemson Univ., Clemson, SC 29634. ORCID: https://orcid.org/0000-0002-8606-9214.
