Identifying Outliers in Correlated Water Quality Data
Publication: Journal of Environmental Engineering
Volume 131, Issue 4
Abstract
Evaluating water quality data for outliers is a good quality control/quality assessment procedure whether the data are used for monitoring or for modeling. Water quality data are often correlated, e.g., carbonaceous biochemical oxygen demand (CBOD) has some correlation with . Univariate methods for identifying outliers do not consider the correlation between variables; they may flag too many data points as outliers or miss observations that have extreme ratios between variables, e.g., a raw wastewater sample with relatively low CBOD but high . Testing for outliers using multivariate methods such as the Mahalanobis distance, Jackknife distance, -values, or Hadi’s automatically incorporates the correlation or covariance between variables and is fundamentally more correct. Such multivariate methods can better identify potential outliers and avoid eliminating valid data.
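The contrast between univariate and multivariate screening can be illustrated with a minimal sketch. It assumes two correlated variables, CBOD and an unnamed second constituent, and uses synthetic values that are not taken from the article. The function names, the |z| > 2 univariate rule, and the chi-square-based cutoff (0.975 quantile, two variables) are illustrative assumptions, not the writers' procedure. Only the classical Mahalanobis distance is shown.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row from the sample mean, scaled by the sample covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance matrix (ddof=1)
    # Solve cov @ s = diff^T rather than forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

def abs_z_scores(X):
    """Column-wise absolute z-scores, the usual univariate screen."""
    X = np.asarray(X, dtype=float)
    return np.abs(X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Synthetic, hypothetical data: 21 samples in which the second variable
# tracks CBOD closely (about 0.2 * CBOD, with a small alternating offset).
cbod = np.arange(100.0, 301.0, 10.0)
other = 0.2 * cbod + 2.0 * (-1.0) ** np.arange(cbod.size)

# One extra sample with low CBOD but a high second variable. Each value is
# inside the observed range of its own column, but the ratio is unusual.
data = np.column_stack([np.append(cbod, 120.0), np.append(other, 55.0)])

d = mahalanobis_distances(data)
z = abs_z_scores(data)

# Assumed cutoff: square root of the 0.975 chi-square quantile. For two
# variables that quantile has the closed form -2 * ln(1 - 0.975).
cutoff = np.sqrt(-2.0 * np.log(0.025))

flagged = np.flatnonzero(d > cutoff).tolist()
univariate_flagged = np.flatnonzero((z > 2.0).any(axis=1)).tolist()

print("Flagged by Mahalanobis distance:", flagged)
print("Flagged by |z| > 2 on either variable:", univariate_flagged)
```

In this synthetic example the added sample (row index 21) exceeds the multivariate cutoff even though neither of its values is extreme when taken alone, so the univariate screen does not flag it. Because the suspect sample itself inflates the mean and covariance estimates, a real analysis might prefer one of the more robust alternatives named in the abstract.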
Acknowledgments
The writers acknowledge the support provided by the Water Environment Research Foundation and the National Park Service.
References
Anderson, D. R., Sweeney, D. J., and Williams, T. W. (1993). Statistics for business and economics, West, Minneapolis/St. Paul.
Barnett, V., and Lewis, T. (1994). Outliers in statistical data, 3rd Ed., Wiley, New York.
Hadi, A. S. (1992). “Identifying multiple outliers in multivariate data.” J. R. Stat. Soc. Ser. B. Methodol., 54(3), 761–771.
Hadi, A. S. (1994). “A modification of a method for the detection of outliers in multivariate samples.” J. R. Stat. Soc. Ser. B. Methodol., 56(2), 393–396.
Hadi, A. S., and Son, M. S. (1998). “Detection of unusual observations in regression and multivariate data.” Handbook of applied economic statistics, A. Ullah and D. E. Giles, eds., Marcel Dekker, New York, 441–463.
Jobson, J. D. (1992). Applied multivariate data analysis, Springer-Verlag, New York.
Johnson, R. A., and Wichern, D. W. (1998). Applied multivariate statistical analysis, Prentice-Hall, Englewood Cliffs, N.J.
Pope, K. S., and Tabachnik, B. G. (1993). “Therapists anger, hate, fear, and sexual feelings—National survey of therapist responses, client characteristics, critical events, formal complaints, and training.” Prof. Psychol.—Res. Practice, 24(2), 142–152.
Rencher, A. C. (1998). Multivariate statistical inference, Wiley, New York.
Rocke, M. R., and Woodruff, D. L. (1996). “Identification of outliers in multivariate data.” J. Am. Stat. Assoc., 91(435), 1047–1061.
SAS Institute, Inc. (2000). JMP® user’s guide, Cary, N.C.
Tamhane, A. C., and Dunlop, D. D. (2000). Statistics and data analysis, Prentice-Hall, Upper Saddle River, N.J.
United States Environmental Protection Agency (USEPA). (2000). “Guidance for data quality assessment: Practical methods for data analysis, EPA QA/G-9, QA00 update.” EPA/600/R-96/084, USEPA Office of Environmental Information, Washington, D.C. ⟨http://www.epa.gov/quality1/qs-docs/g9-final.pdf⟩ (Sept. 1, 2001).
Copyright
© 2005 ASCE.
History
Received: Sep 10, 2002
Accepted: Jun 29, 2004
Published online: Apr 1, 2005
Published in print: Apr 2005