A Bayesian approach to outlier detection and residual analysis
Published Dec 1, 1988 · K. Chaloner, R. Brant
Biometrika
206 Citations · 17 Influential Citations
Abstract
SUMMARY An approach to detecting outliers in a linear model is developed. An outlier is defined to be an observation with a large random error, generated by the linear model under consideration. Outliers are detected by examining the posterior distribution of the random errors. An augmented residual plot is also suggested as a graphical aid in finding outliers.
We propose a precise definition of an outlier in a linear model which appears to lead to simple ways of exploring data for the possibility of outliers. The definition is such that, if the parameters of the model are known, then it is also known which observations are outliers. Alternatively, if the parameters are unknown, the posterior distribution can be used to calculate the posterior probability that any observation is an outlier.
In a linear model with normally distributed random errors, $\epsilon_i$, with mean zero and variance $\sigma^2$, we declare the $i$th observation to be an outlier if $|\epsilon_i| > k\sigma$ for some choice of $k$. The value of $k$ can be chosen so that the prior probability of an outlier is small and thus outliers are observations which are more extreme than is usually expected. Realizations of normally distributed errors of more than about three standard deviations from the mean are certainly surprising, and worth further investigation. Such outlying observations can occur under the assumed model, however, and this should be taken into account when deciding what to do with outliers and in choosing $k$. Note that $\epsilon_i$ is the actual realization of the random error, not the usual estimated residual $\hat{\epsilon}_i$.
The problem of outliers is studied and thoroughly reviewed by Barnett & Lewis (1984), Hawkins (1980), Beckman & Cook (1983) and Pettit & Smith (1985). The usual Bayesian approach to outlier detection uses the definition given by Freeman (1980). Freeman defines an outlier to be 'any observation that has not been generated by the mechanism that generated the majority of observations in the data set'.
Freeman's definition therefore requires that a model for the generation of outliers be specified and is implemented by, for example, Box & Tiao (1968), Guttman, Dutter & Freeman (1978) and Abraham & Box (1978). Our method differs in that we define outliers as arising from the model under consideration rather than arising from a separate, expanded, model. Our approach is similar to that described by Zellner & Moulton (1985) and is an extension of the philosophy
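The outlier rule described in the abstract, $|\epsilon_i| > k\sigma$, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a simple linear regression with one predictor and the standard noninformative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$, which the abstract does not specify. Under that assumption, $\epsilon_i \mid \sigma, y$ is normal with mean equal to the least-squares residual and variance $\sigma^2 h_{ii}$ (with $h_{ii}$ the leverage), and $\sigma^2 \mid y$ is distributed as the residual sum of squares divided by a $\chi^2_{n-p}$ variate. The function names, the data, and the Monte Carlo averaging over $\sigma$ are illustrative choices.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prior_outlier_prob(k):
    """Prior probability that |eps_i| > k*sigma under normal errors."""
    return 2.0 * (1.0 - normal_cdf(k))


def posterior_outlier_probs(x, y, k=3.0, draws=4000, seed=0):
    """Monte Carlo estimate of P(|eps_i| > k*sigma | y) for y = a + b*x + eps.

    Assumption (not from the abstract): prior proportional to 1/sigma^2.
    """
    n, p = len(x), 2  # p = number of regression coefficients (intercept, slope)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat-matrix diagonal
    rss = sum(r * r for r in resid)

    rng = random.Random(seed)
    probs = [0.0] * n
    for _ in range(draws):
        # sigma^2 | y ~ RSS / chi2_{n-p}; chi2_nu is Gamma(shape=nu/2, scale=2)
        sigma = math.sqrt(rss / rng.gammavariate((n - p) / 2.0, 2.0))
        for i in range(n):
            # eps_i | sigma, y ~ Normal(resid_i, sigma^2 * h_ii)
            sd = sigma * math.sqrt(lever[i])
            upper = 1.0 - normal_cdf((k * sigma - resid[i]) / sd)
            lower = normal_cdf((-k * sigma - resid[i]) / sd)
            probs[i] += (upper + lower) / draws
    return probs


# Made-up data: y is roughly 1 + 2x, with the observation at index 5 shifted upward.
x = [float(i) for i in range(10)]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 20.0, 12.9, 15.1, 16.8, 19.2]
probs = posterior_outlier_probs(x, y, k=3.0)

print(round(prior_outlier_prob(3.0), 4))  # 0.0027
print(max(range(len(x)), key=probs.__getitem__))  # index with the largest posterior probability
```

Comparing each posterior probability with the prior value `prior_outlier_prob(k)` gives a rough sense of how much the data have raised the plausibility that a given observation is an outlier. The choice `k=3.0` simply mirrors the "three standard deviations" remark in the abstract.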