Mining and Visualizing Cost and Schedule Risks from News Articles with NLP and Network Analysis
Publication: Construction Research Congress 2022
ABSTRACT
Cost overruns and schedule delays in US transit projects have been a growing concern for years. However, limited data availability and small sample sizes have restricted quantitative analysis of the risks that lead to overruns. New data sources and collection methods are needed to complement traditional surveys and case studies. News articles report on the issues and risk events that lead to overruns as projects progress, but they have not yet been explored as a data source in the construction domain. The difficulty lies in compiling and analyzing the data. To fill this gap, this paper tested combinations of different natural language processing (NLP) and machine learning methods to automatically identify risk narratives in news articles. The identified risk sentences are classified into 5 categories and 26 subcategories through content analysis. The risks are then ranked and analyzed using a co-occurrence network. The research demonstrates the feasibility of integrating NLP and network analysis to explore publicly available textual documents and explain project performance issues. The approach serves as a baseline for future studies that develop more intelligent models to examine a wide range of media data and other textual reports in the construction domain.
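The abstract does not specify how the co-occurrence network is built or how risks are ranked. As a minimal sketch, assume that two risk subcategories co-occur when both are tagged in the same news article, and that a subcategory's rank is its weighted degree (the sum of its co-occurrence counts). Under those assumptions, the network can be built with the Python standard library alone. The subcategory labels and article sets below are hypothetical placeholders, not data or categories from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each article is represented by the set of risk
# subcategories assigned to its risk sentences (labels are illustrative only).
articles = [
    {"design changes", "scope creep", "utility relocation"},
    {"design changes", "utility relocation"},
    {"funding shortfall", "scope creep"},
    {"design changes", "funding shortfall", "scope creep"},
]


def build_cooccurrence(docs):
    """Count how often each pair of subcategories appears in the same article."""
    edges = Counter()
    for labels in docs:
        # sorted() gives each undirected pair a canonical (a, b) order
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
    return edges


def rank_by_strength(edges):
    """Rank nodes by weighted degree (sum of the weights of incident edges)."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    # Descending strength; ties broken alphabetically for a stable order
    return sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))


edges = build_cooccurrence(articles)
ranking = rank_by_strength(edges)

if __name__ == "__main__":
    for name, score in ranking:
        print(f"{name}: {score}")
```

Weighted degree is only one possible ranking; raw sentence frequency or centrality measures such as betweenness would be equally reasonable choices, and a graph library such as NetworkX would be the natural next step for larger networks or visualization.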
Published online: Mar 7, 2022