Technical Papers
Aug 23, 2021

Clustering-Based Approach for Building Code Computability Analysis

Publication: Journal of Computing in Civil Engineering
Volume 35, Issue 6

Abstract

One common limitation of all automated code compliance-checking methods and tools is their inability to deal with all types of building-code requirements. More research is needed to better identify the different types of requirements, in terms of their syntactic and semantic structures and complexities, to gain more insight into the capabilities and limitations of existing methods and tools (i.e., which requirements they can and cannot automatically process, represent, or check). To address this need, this paper proposes a new set of syntactic and semantic features, together with complexity and computability metrics, for code computability analysis. A clustering-based approach was used to identify the different types of code sentences, in terms of their computability, using the proposed features and metrics. The approach was implemented and tested on a corpus of 6,608 sentences from the International Building Code and its amendments. The sentence clusters and identified sentence types were evaluated using intrinsic and extrinsic evaluation methods. The evaluation results indicated good clustering performance, perfect alignment between the human- and computer-identified types, and good agreement in the assignment of sentences to the types.
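The abstract does not name the clustering algorithm, the feature definitions, or the specific evaluation measures. As a rough illustration only, the sketch below assumes each sentence has already been turned into a numeric feature vector (the two-feature toy vectors are invented), uses k-means with farthest-first seeding as a stand-in for the clustering step, the mean silhouette coefficient as one possible intrinsic measure, and Cohen's kappa as one possible measure of human/computer agreement for the extrinsic check. None of these choices, function names, or values is taken from the paper.

```python
import math
from collections import Counter

# Toy sketch only: features, clustering algorithm, and evaluation
# measures are illustrative assumptions, not the paper's method.


def farthest_first_seeds(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented two-feature vectors standing in for per-sentence features
# (hypothetically, a complexity score and a computability score).
sentences = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = kmeans(sentences, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(round(mean_silhouette(sentences, labels), 2))

# Invented sentence-type assignments from a human coder and from the clusters.
human = ["A", "A", "B", "B", "C", "C"]
computer = ["A", "A", "B", "C", "C", "C"]
print(round(cohens_kappa(human, computer), 2))  # 0.75
```

Farthest-first seeding keeps this toy run deterministic; k-means++ seeding instead samples each new seed with probability proportional to its squared distance from the nearest seed already chosen. A silhouette close to 1 indicates compact, well-separated clusters, and a kappa of 1 indicates perfect chance-corrected agreement.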

Data Availability Statement

Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request (building-code sentence data and computational code for the clustering-based computability analysis).

Acknowledgments

The authors would like to thank the National Science Foundation (NSF). This material is based on work supported by the NSF under Grant No. 1827733. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Published In

Journal of Computing in Civil Engineering
Volume 35, Issue 6, November 2021

History

Received: Jul 21, 2020
Accepted: Dec 21, 2020
Published online: Aug 23, 2021
Published in print: Nov 1, 2021
Discussion open until: Jan 23, 2022

Authors

Affiliations

Ruichuan Zhang, S.M.ASCE
Graduate Student, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana–Champaign, 205 N. Mathews Ave., Urbana, IL 61801.
Nora El-Gohary, A.M.ASCE
Associate Professor, Dept. of Civil and Environmental Engineering, Univ. of Illinois at Urbana–Champaign, 205 N. Mathews Ave., Urbana, IL 61801 (corresponding author).

Cited by

  • Dynamically Identifying and Evaluating Key Barriers to Promoting Prefabricated Buildings: Text Mining Approach, Journal of Construction Engineering and Management, 10.1061/JCEMD4.COENG-13285, 149, 9, (2023).
  • Capabilities of rule representations for automated compliance checking in healthcare buildings, Automation in Construction, 10.1016/j.autcon.2022.104688, 146, (104688), (2023).
  • Neural Semantic Parsing of Building Regulations for Compliance Checking, IOP Conference Series: Earth and Environmental Science, 10.1088/1755-1315/1101/9/092022, 1101, 9, (092022), (2022).
  • Hierarchical Representation and Deep Learning–Based Method for Automatically Transforming Textual Building Codes into Semantic Computable Requirements, Journal of Computing in Civil Engineering, 10.1061/(ASCE)CP.1943-5487.0001014, 36, 5, (2022).
