TECHNICAL NOTES
May 21, 2010

Robust Multivariate Outlier Detection Methods for Environmental Data

Publication: Journal of Environmental Engineering
Volume 136, Issue 11

Abstract

Outliers are an inevitable concern whenever one analyzes a large data set, and they need to be identified and dealt with. Today’s water quality data are often collected on different scales, encompass several sites, monitor several correlated parameters, involve a multitude of individuals from several agencies, and span several years. As such, the ability to identify outliers, which may affect the results of the analysis, is crucial. This note presents several statistical techniques that have been developed to deal with this problem, with particular emphasis on robust multivariate methods. These techniques are capable of isolating outliers while overcoming the effects of masking, which can hinder the effectiveness of common outlier detection techniques such as Mahalanobis distances (MD). This note uses a comprehensive national metadata set on lake water quality as a case study to analyze the effectiveness of three robust outlier detection techniques, namely, the minimum covariance determinant (MCD), the minimum volume ellipsoid (MVE), and M-estimators. The note compares the results generated from these three techniques to assess how severely each method labels observations as outliers. The results demonstrate the limitations of using MD to analyze multidimensional water quality data. The analysis also highlights the differences between the three robust multivariate methods: the MVE method was found to be the most severe in labeling observations as outliers, while the MCD was the most lenient. Of the three robust multivariate outlier detection methods analyzed, the M-estimator proved to be the most flexible because it allowed many borderline outlier observations to be downweighted rather than censored.
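The masking effect described above can be illustrated with a small sketch. It is not the note's analysis (the authors cite R and the rrcov package); it is a hypothetical two-variable toy example in plain Python. The data values, the subset size h = 10, the 97.5% chi-square cutoff, and the exhaustive subset search are all assumptions chosen for illustration, and the MCD step is the raw estimator without any consistency correction or reweighting.

```python
from itertools import combinations
from math import log


def mean_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 divisor) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    """Determinant of a symmetric 2x2 covariance stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def squared_md(point, center, cov):
    """Squared Mahalanobis distance of one 2-D point."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def brute_force_mcd(points, h):
    """Indices of the h-subset whose covariance determinant is smallest.

    Exhaustive search: only feasible for tiny n. Practical MCD software
    uses approximate search algorithms instead.
    """
    def subset_det(idx):
        return det(mean_cov([points[i] for i in idx])[1])
    return min(combinations(range(len(points)), h), key=subset_det)


# Hypothetical data: ten strongly correlated points plus two planted
# points that sit inside each marginal range but break the correlation.
clean = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
         (6, 5.9), (7, 7.2), (8, 7.8), (9, 9.1), (10, 9.9)]
planted = [(9.0, 1.0), (10.0, 1.5)]
data = clean + planted

# 0.975 quantile of the chi-square distribution with 2 degrees of freedom
# (closed form for 2 df: -2 * ln(1 - q)).
cutoff = -2.0 * log(1.0 - 0.975)

# Classical MD: mean and covariance are estimated from all points,
# so the planted points inflate the very covariance that judges them.
center_c, cov_c = mean_cov(data)
classical_d2 = [squared_md(p, center_c, cov_c) for p in data]
classical_flags = [i for i, d in enumerate(classical_d2) if d > cutoff]

# Robust distances: mean and covariance come from the MCD subset only.
subset = brute_force_mcd(data, h=10)
center_r, cov_r = mean_cov([data[i] for i in subset])
robust_d2 = [squared_md(p, center_r, cov_r) for p in data]
robust_flags = [i for i, d in enumerate(robust_d2) if d > cutoff]

print("classical flags:", classical_flags)
print("robust flags:   ", robust_flags)
```

On this toy data the classical distances flag nothing, because the two planted points stretch the covariance estimate enough to hide themselves, while the distances based on the MCD subset flag exactly those two points.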

Acknowledgments

I. Alameddine was partially supported by a scholarship from Quantitative Environmental Analysis, LLC. M. A. Kenney was also partially supported by the STC program of the National Science Foundation via the National Center for Earth-Surface Dynamics under Grant No. NSFEAR-0120914.

Published In

Journal of Environmental Engineering
Volume 136, Issue 11, November 2010
Pages: 1299–1304

History

Received: Apr 25, 2009
Accepted: May 19, 2010
Published online: May 21, 2010
Published in print: Nov 2010

Authors

Affiliations

Ibrahim Alameddine, S.M.ASCE
Ph.D. Student, Nicholas School of the Environment, Duke Univ., Durham, NC 27708 (corresponding author).
Melissa A. Kenney
Assistant Research Scientist, Dept. of Geography and Environmental Engineering, Johns Hopkins Univ., Baltimore, MD 21218.
Russell J. Gosnell
Professor, Dept. of Mathematics and Computer Science, North Carolina Central Univ., Durham, NC 27707.
Kenneth H. Reckhow
Professor, Nicholas School of the Environment, Duke Univ., Durham, NC 27708.
