TECHNICAL PAPERS
Dec 15, 2011

Variable Selection Using the Gamma Test Forward and Backward Selections

Publication: Journal of Hydrologic Engineering
Volume 17, Issue 1

Abstract

Variable selection is the process of reducing data dimensionality by eliminating irrelevant or redundant variables. It is an essential step in any application involving a large number of input variables and remains a challenging task for model developers. Enumerative variable selection is usually impractical when the number of candidate variables is large; hence, efficient and effective variable selection tools are needed for real-world problems. This paper describes a novel application of the gamma test with both forward and backward selection methods. These methods are tested on a flood regionalization problem. The gamma test is found to provide useful input variable combinations, and its best result outperforms the conventional forward cross-validation method. However, the gamma test results still carry undesirable uncertainties, reflected in the discrepancy between the backward and forward selection results as well as in the uncertainty of the Gamma statistic itself. Further exploration is needed to improve or falsify the gamma test.
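
To make the procedure described above concrete, the following is a minimal, illustrative sketch (not the authors' code) of gamma-test-based forward selection in Python, assuming NumPy and scikit-learn; the function names, the neighbour count p, and the stopping rule are illustrative assumptions rather than details taken from the paper.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def gamma_statistic(X, y, p=10):
        # Gamma test: for k = 1..p, delta(k) is the mean squared distance to each
        # point's k-th nearest neighbour in input space, and gamma(k) is half the
        # mean squared difference of the corresponding outputs. The intercept of
        # the least-squares line gamma(k) vs. delta(k) estimates the output
        # noise variance (the Gamma statistic).
        nn = NearestNeighbors(n_neighbors=p + 1).fit(X)
        dist, idx = nn.kneighbors(X)                      # column 0 is the point itself
        delta = (dist[:, 1:] ** 2).mean(axis=0)
        gamma = 0.5 * ((y[idx[:, 1:]] - y[:, None]) ** 2).mean(axis=0)
        slope, intercept = np.polyfit(delta, gamma, 1)    # fit gamma = A * delta + Gamma
        return intercept                                  # the Gamma statistic

    def forward_selection(X, y, p=10):
        # Greedy forward selection: repeatedly add the candidate variable whose
        # inclusion yields the smallest Gamma statistic; stop once Gamma no
        # longer decreases (an assumed stopping rule for this sketch).
        remaining, selected, best = list(range(X.shape[1])), [], np.inf
        while remaining:
            scores = {j: gamma_statistic(X[:, selected + [j]], y, p) for j in remaining}
            j_best = min(scores, key=scores.get)
            if scores[j_best] >= best:
                break
            best = scores[j_best]
            selected.append(j_best)
            remaining.remove(j_best)
        return selected, best

Backward selection proceeds analogously, starting from the full variable set and removing, at each step, the variable whose removal gives the lowest Gamma statistic.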

Acknowledgments

The first writer would like to thank the University of Malaya and the Government of Malaysia for the scholarship offered for this study. The writers are grateful to the anonymous reviewers for their comments, which helped to improve the manuscript.

Information & Authors

Published In

Journal of Hydrologic Engineering
Volume 17, Issue 1, January 2012
Pages: 182–190

History

Received: Sep 28, 2010
Accepted: Mar 29, 2011
Published online: Dec 15, 2011
Published in print: Jan 1, 2012

Authors

Affiliations

W. Z. W. Jaafar
Ph.D. Candidate, Dept. of Civil Engineering, Univ. of Bristol, Queen’s Building, Univ. Walk, Bristol BS8 1TR, UK (corresponding author). E-mail: [email protected]
D. Han
Reader in Civil Engineering, Dept. of Civil Engineering, Univ. of Bristol, Queen’s Building, Univ. Walk, Bristol BS8 1TR, UK.
