Technical Papers
Nov 8, 2016

Analysis of the Influences of Sampling Bias and Class Imbalance on Performances of Probabilistic Liquefaction Models

Publication: International Journal of Geomechanics
Volume 17, Issue 6

Abstract

Sampling bias and class imbalance are important sources of model uncertainty that significantly affect the predicted probabilities of classification models. This study analyzed the influences of sampling bias and class imbalance on the performance of four common methods used in 10 models for seismic liquefaction, namely Bayesian network (BN), artificial neural network (ANN), logistic regression (LR), and support vector machine (SVM), using controlled experiments based on standard penetration test (SPT) data from 350 case histories. The data are divided into two data sets with class distributions of 150:150 and 200:100, which are separately stratified and sampled to obtain 11 different class distributions (10:90, 20:80, 25:75, 33:67, 40:60, 50:50, 60:40, 67:33, 75:25, 80:20, and 90:10). The predictive performance of the models is quantified using statistical model validation metrics such as overall accuracy, area under the receiver operating characteristic curve, precision, recall, and F-score. The experiments show that the best distribution of liquefaction samples for training is not a fixed point but a range. The authors suggest that the best ratio of liquefaction to nonliquefaction samples is from 1 to 1.5 for the BN method, from 0.67 to 1 for the ANN method, approximately 0.5 for the LR model, and from 0.5 to 1 for the SVM method. Furthermore, oversampling was used to try to improve the predictive capability of the four models for two samples (10:90 and 90:10) with severe class imbalance and sampling bias. Compared with the original samples, the oversampled samples considerably improved predictive performance for the LR model and the SVM polynomial (SVM-Pol) model, but not for the BN maximum likelihood estimation (BN-MLE) model or the ANN radial basis function (ANN-RBF) model. In addition, in fields where the real class distribution of the population is unknown, when a training sample contains severe class imbalance or sampling bias, the authors recommend that researchers choose an oversampled sample that has the same class distribution as the population of the collected data to ensure optimal performance.
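
The resampling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which oversampling technique was used, so plain random duplication of minority-class cases is assumed here, and the function names (stratified_sample, random_oversample, precision_recall_f1), the sample size of 100, and the placeholder labels are all hypothetical. Labels use 1 for liquefaction and 0 for nonliquefaction.

```python
import random
from collections import Counter


def stratified_sample(labels, pct_pos, size, rng):
    """Draw `size` case indices, without replacement, with `pct_pos` percent positives."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(size * pct_pos / 100)
    return rng.sample(pos, n_pos) + rng.sample(neg, size - n_pos)


def random_oversample(indices, labels, target_frac_pos, rng):
    """Duplicate randomly chosen cases of the under-represented class until
    positives make up about `target_frac_pos` (strictly between 0 and 1)."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    f = target_frac_pos
    want_pos = round(f * len(neg) / (1 - f))
    if want_pos >= len(pos):
        # Positives are under-represented: add duplicated positives.
        extra = rng.choices(pos, k=want_pos - len(pos))
    else:
        # Negatives are under-represented: add duplicated negatives.
        want_neg = round((1 - f) * len(pos) / f)
        extra = rng.choices(neg, k=max(0, want_neg - len(neg)))
    return list(indices) + extra


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-score for the positive (liquefaction) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


rng = random.Random(0)
labels = [1] * 150 + [0] * 150  # placeholder labels only, not the SPT data
sample = stratified_sample(labels, 10, 100, rng)       # a 10:90 training sample
balanced = random_oversample(sample, labels, 0.5, rng)  # oversampled to 50:50
print(Counter(labels[i] for i in sample))
print(Counter(labels[i] for i in balanced))
```

With these placeholder labels, the 10:90 sample holds 10 liquefaction and 90 nonliquefaction cases, and oversampling to a positive fraction of 0.5 adds 80 duplicated liquefaction cases. Following the recommendation in the abstract, target_frac_pos would be set to the class distribution of the collected data rather than fixed at 0.5.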

Acknowledgments

The work presented in this paper is part of research sponsored by the National Science Council of the People’s Republic of China under Grant 2011CB013605-2.

History

Received: Dec 3, 2015
Accepted: Aug 22, 2016
Published online: Nov 8, 2016
Discussion open until: Apr 8, 2017
Published in print: Jun 1, 2017

Authors

Affiliations

Ji-Lei Hu
Ph.D. Student, Institute of Geotechnical Engineering, Dalian Univ. of Technology, Dalian 116024, China.
Xiao-Wei Tang
Assistant Professor, Institute of Geotechnical Engineering and State Key Laboratory of Coastal and Offshore Engineering, Dalian Univ. of Technology, Dalian 116024, China.
Jiang-Nan Qiu
Professor, School of Management Science and Engineering, Dalian Univ. of Technology, Dalian 116024, China (corresponding author).
