Escolar Documentos
Profissional Documentos
Cultura Documentos
A factorial, computational experiment was conducted to compare the spatial interpolation accuracy
of ordinary and universal kriging and two types of inverse squared-distance weighting. The
experiment considered, in addition to these four interpolation methods, the effects of four data and
sampling characteristics: surface type, sampling pattern, noise level, and strength of small-scale
spatial correlation. Interpolation accuracy was measured by the natural logarithm of the mean
squared interpolation error. Main effects of all five factors, all two-factor interactions, and several
three-factor interactions were highly statistically significant. Among numerous findings, the most
striking was that the two kriging methods were substantially superior to the inverse distance
weighting methods over all levels of surface type, sampling pattern, noise, and correlation.
INTRODUCTION
An important statistical problem in the geosciences is the estimation of a two-
or three-dimensional manifold across a study region from observations taken at
known locations in the region. This problem, known as spatial interpolation,
has been addressed using methods that include ordinary and universal krig-
ing, inverse distance weighting, interpolating polynomials, splines, and power
and Fourier series fitting (for reviews of these and other methods, see Franke,
375
1982; Lam, 1983; Cressie, 1993). The variety of available interpolation meth-
ods has led to questions about which is most appropriate in different contexts
and has stimulated several comparative studies of relative accuracy. Unfortu-
nately, published studies are not unequivocal as to which interpolation method
is most accurate. In some studies (Creutin and Obled, 1982; Tabios and Salas,
1985; Rouhani, 1986; Grimm and Lynch, 1991; Laslett and McBratney, 1990;
Weber and Englund, 1994; Laslett, 1994; Phillips and others, 1997), a kriging
procedure performed best, while in others (Laslett and others, 1987; Weber and
Englund, 1992; Gallichand and Marcotte, 1993; Brus and others, 1996; Declercq,
1996) splines or inverse distance weighting were as good or better. Most of
these comparisons were based on analyses of actual geostatistical data, such as
groundwater levels (Rouhani, 1986), acid rain deposition (Grimm and Lynch,
1991), soil pH (Laslett and others, 1987; Laslett and McBratney, 1990), and
land elevation and elevation variances (Weber and Englund, 1992, 1994; Wood
and Fisher, 1993). Using real data rather than synthetic data has several advan-
tages; for example, it precludes one method from having an unfair advantage
merely because the data used for the comparison are generated under the same
model on which the method is based. On the other hand, only with synthetic
data can the effect of certain data characteristics (such as the level of noise or
the strength of spatial correlation) on interpolation accuracy be systematically
evaluated.
The purpose of this paper is to evaluate and compare the accuracy of several
interpolation methods based on an analysis of synthetic data from a computa-
tional experiment. The experiment was designed so that the effects of interpola-
tion method and certain data features and sampling procedures, both individually
and in interaction with one another, could be tested for statistical significance.
The interpolation methods we studied, ordinary kriging, universal kriging, and
two types of inverse squared-distance weighting, are widely used methods that
were given prominence in the relatively recent comparative studies of Weber
and Englund (1992, 1994) and Englund, Weber, and Leviant (1992). The present
study can, in fact, be considered a synthetic-data complement and supplement
to these two previous studies.
Our computational experiment focused on four basic factors in addition to
interpolation method: surface type, sampling pattern, noise (magnitude of error
variance), and strength of spatial correlation among error terms. The three sur-
faces we used, a simple tilted plane, a strong peak based on a sine function, and
an undulating, increasing surface which is the sum of sine and cosine functions,
were selected to encompass a simplified subset of the range of surfaces encoun-
tered in environmental analyses. The surfaces share, at an abstract level, many
characteristics with real environmental surfaces. As Weber and Englund (1994,
p. 590) argued, physical phenomena such as hydraulic head, surface water tem-
perature, and geologic structural surfaces often vary smoothly and continuously.
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
The three surfaces were sampled with four commonly used patterns so that the
interaction of surface type with sampling pattern could be evaluated.
In general, the surfaces from which environmental samples are taken are
expected to exhibit local variation (noise); in addition, physical processes operate
which often lead to small-scale spatial correlation. To simulate this in our data,
we systematically controlled the level of noise in, and the strength of spatial
correlation among, the error terms, thus permitting an examination of the effects
of noise and spatial correlation on interpolation accuracy.
The systematic design of our experiment, with sampled surfaces interpo-
lated by ordinary and universal kriging and by two types of inverse squared-dis-
tance weighting, permits comparative evaluation of the four methods. By con-
trolling values of the parameters discussed above, we are able to reach conclu-
sions about the conditions under which kriging (either ordinary or universal)
or inverse squared-distance weighting performs better, as well as the conditions
under which particular sampling patterns may be preferred.
EXPERIMENTAL PROCEDURE
Many characteristics of sampling and data could conceivably affect the per-
formance of interpolation methods. As noted in the previous section, we consid-
ered four basic factors in this investigation in addition to interpolation method:
surface type, sampling pattern, magnitude of error variance, and strength of
small-scale spatial correlation. In this section, we describe how the levels of
these factors were selected and how our computational experiments were con-
ducted.
The experimental region for our investigation was taken to be the unit
square {(x, y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Three mathematical surfaces, hence-
forth called the plane, the sombrero, and Morrison’s surface, were defined on
this region (Fig. 1). Each of these surfaces represents a particular kind of drift,
or large-scale spatial variation. The drift functions that define the plane and the
sombrero are:
Plane:g(x, y) c 1.05 − x
if x ? 0.5 or y ? 0.5
c 1, if x c y c 0.5
The drift function that defines Morrison’s surface, given in Morrison (1971) but
not duplicated here, is rather complicated; it is a linear combination of 48 sine
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
Figure 1. Mathematical surfaces used for the experiment: A, plane; B, sombrero; C, Morrison’s
surface.
C(r; j 2 , a) c j 2 e − ar
where r is the (Euclidean) distance between points, j 2 > 0 is the error variance,
and a is the correlation strength parameter. Two values of j 2 , 0.0036 and 0.0576,
and two values of a, 4 and 10, were considered. These were chosen so as to be
reasonable in relation to the total variation of each surface and to the area of
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
Figure 2. Examples of the four types of sampling patterns used for the experiment: A, hexagonal
grid; B, inhibited; C, random; D, clustered.
the experimental region. When j 2 c 0.0036, roughly 95% of the errors have
magnitudes less than 2 . (0.0036)1/ 2 c 0.12 (6% of the range of the data) and
fewer than 0.1% of the errors exceed 10% of the range in magnitude; thus, this
choice of j 2 introduces a relatively small amount of noise into the observations.
When j 2 c 0.0576, however, much more noise is introduced; only 38% of the
errors have magnitudes less than 6% of the range, roughly 40% of the errors
exceed 10% of the range, and about 4% of the errors exceed 50% of the range.
When a c 4, errors at points separated by a distance of 0.25 units are correlated
by the amount 0.37; that is, moderate correlation exists at a distance of one-
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
fourth the length of the experimental region’s side. When a c 10, however, the
correlation is much weaker; the correlation between points 0.25 units apart is
only 0.08.
The experiment was carried out as follows. For each combination of surface
type, sampling pattern, j 2 , and a, three realizations of the sampling pattern were
obtained and then, conditional on those data locations, errors with the autoco-
variance structure specified by j 2 and a were generated at the 100 (or 99) data
locations and 400 evaluation gridpoints simultaneously. In this way, a total of
144 (3 surface types × 4 pattern types × 3 realizations of each pattern type × 2
values of j 2 × 2 values of a) sets of 500 (or 499) errors referenced by spatial
location were generated. Next, each set of errors was added to the correspond-
ing mathematical surface at the 500 (or 499) locations, yielding a synthetic data
set of 100 (or 99) observations and a “true surface” of 400 values at evaluation
points. Note that we define the true surface as the mathematical surface plus auto-
correlated error; since our synthetic data arose from a perturbed surface, it was
most appropriate for the surface to which interpolated values were compared,
to have the same degree of perturbation. Each set of 100 (or 99) observations
was processed by one of four interpolation algorithms: ordinary kriging (OK),
universal kriging (UK) using (as is common practice) a full second-order polyno-
mial drift function, and inverse squared-distance weighting using the nearest six
observations (ISDW-6) or the nearest 12 observations (ISDW-12). The results
from these interpolations then were used to estimate the value of the true surface
at each of the 400 evaluation points. An estimation error at each evaluation point
and for each interpolation method was obtained by subtracting the value of the
true surface at the point from the value estimated by the interpolation method.
The four interpolation methods were carried out using suitable functions in
S+SpatialStats: krige and predict.krige for OK and UK, and find.neighbor and
functions we wrote ourselves for ISDW-6 and ISDW-12 (for details on these and
related functions, see Kaluzny and others, 1997). Implementation of the two krig-
ing procedures was preceded by the selection of a suitable semivariogram model,
and the estimation of that model, specific to each dataset. This was accomplished,
as is common practice, by fitting each of several semivariogram models (lin-
ear, Gaussian, spherical, and exponential) to the empirical semivariogram by
weighted least squares (Cressie, 1993, sect. 2.6.2) and selecting the model with
the smallest weighted residual sum of squares. The S+SpatialStatsfunctions
variogram and vario.fit were useful in this regard.
To measure interpolation accuracy, the natural logarithm of the mean
squared error [log(MSE)], the root-mean-squared error, and the mean absolute
error were computed for each dataset. In addition, the correlation between the
estimated values and true values at evaluation points was computed for each
dataset. In what follows, we present statistical results for log (MSE) only, as
this was the only measure which residual analysis showed to be nearly normally
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
distributed. Although the other performance measures had distributions that were
not as well approximated by normality, the results for these measures were qual-
itatively similar to results for log (MSE).
The complete experiment can be regarded as a randomized factorial exper-
iment with five factors: interpolation method, surface type, sampling pattern,
noise level (j 2 ), and correlation strength (a). Each cell of the five-way layout
contained three replicates (calculated from the synthetic datasets corresponding
to the three realizations of each sampling pattern). An experimental measure-
ment was obtained by: (1) comparing the interpolated surface to the true surface
for a dataset at the 400 evaluation points, in the manner described previously,
and (2) reducing the 400 interpolation errors to a single numerical performance
measure [log(MSE)]. The design yielded 576 such measurements.
Accordingly, our analysis of the experiment is based on the model
where log(MSE)ijklmn is log (MSE) for the ith interpolation method and for the
nth dataset at levels j, k, l, m of the other four factors; f is a specified function
(discussed further below); m i is the effect of interpolation method i (i c 1, 2, 3, 4
for OK, UK, ISDW-6, or ISDW-12); t j is the effect of surface type j ( j c 1, 2, 3
for plane, sombrero, or Morrison’s surface); pk is the effect of sampling pattern
k (k c 1, 2, 3, 4 for hexagonal, inhibited, random, or clustered); n l is the effect
of the lth level of noise (l c 1 or 2 for j 2 c 0.0036 or 0.0576); am is the effect
of the mth level of correlation (m c 1 or 2 for a c 4 or 10); e ijklmn is the residual
for the nth data set at levels i, j, k, l, m of the five factors; these residuals are
assumed to be independently and identically distributed normal random variables
with mean zero and positive variance.
The main-effects version of this model takes f to be a simple additive func-
tion of the five factor effects; that is, f (m i , t j , pk , n l , am ) c m i + t j + pk + n l
+ am . We found the most useful version, however, to be the model with main
effects and two- and three-factor interactions.
RESULTS
Table 1 gives the analysis-of-variance table that corresponds to the pre-
scribed model for log (MSE) with main effects and two- and three-factor inter-
action effects. The model described a large portion (96.7%) of the total varia-
tion, and a standard Shapiro-Wilks test could not reject the null hypothesis of
normally-distributed residuals (P c 0.18). All main effects and two-factor inter-
actions were highly significant (P < 0.0001). Also highly significant were three
of the ten three-factor interactions. Further analysis, based on a model which
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
Table 1. ANOVA for the Experiment, Based on a Model for Log (MSE) with Main Effects and
Two- and Three-Factor Interactionsa
Source df MS F Pr > F
Figure 3. Interaction plots for the significant two-factor interactions: A, Method × Surface; B,
Method × Pattern; C, Method × Noise; D, Method × Correlation; E, Surface × Pattern; F, Sur-
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
Figure 4. Sample one-factor means of log (MSE). Horizontal bars represent means and values
obtained by adding and subtracting twice the standard error from the mean.
interactions are much smaller than the main effects. Consequently, apart from
the two exceptions noted, the significance of the two-factor interactions does not
materially affect conclusions drawn from examining main effects only.
Figure 4 displays the one-factor means of log (MSE), averaged over all
levels of the other factors, and their standard errors. The information in this
figure can be summarized as follows:
1. The kriging methods performed much (20–30%) better than the two
inverse squared distance weighting procedures. Furthermore, OK out-
performed UK slightly and ISDW-6 outperformed ISDW-12 slightly.
2. The performance of each interpolation method was significantly better
for the plane than for the sombrero, which in turn yielded better perfor-
mance than Morrison’s surface.
3. The performance of each interpolation method deteriorated significantly
as the sampling pattern changed from hexagonal, to inhibited, to random,
and finally to clustered.
4. The performance of each interpolation method was significantly better
when j 2 c 0.0036 than when j 2 c 0.0576. That is, interpolation accuracy
improved when the data were not as noisy.
5. The performance of each interpolation method was significantly better
when a c 4 than when a c 10. That is, interpolation accuracy improved
when the data were more highly spatially correlated.
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
DISCUSSION
Several important points emerge from the results of the previous section.
Because the primary objective of this investigation was to compare interpolation
methods, we first consider the effect of interpolation method on accuracy [as
measured by log (MSE)]. The most striking result along these lines was that the
two kriging methods consistently and substantially outperformed the two inverse
distance weighting methods over all levels of the other factors. This agrees with
the findings of several authors who have made comparisons among interpolation
methods, but contradicts the findings of several other such authors (refer to the
Introduction or to Cressie, 1993, section 5.9.3, for a list of relevant literature). We
make no claim that our study completely lays this issue to rest; however, we do
find the consistent superior performance of kriging over various types of surfaces
and sampling patterns and various levels of noise and spatial correlation in our
study rather more compelling than findings from less comprehensive studies. Of
course, kriging has several other advantages to inverse distance weighting that
are not performance-related. These include the capability of adapting easily to
the estimation of spatial averages (block kriging) and the provision of a measure
of variability of interpolated values (the kriging variance).
A second point pertaining to the comparison of interpolation methods is
that although the kriging methods were consistently superior to inverse squared-
distance weighting over all levels of the other factors, the extent of superiority
was affected by the other factors. More specifically, the disparity in performance
between the two types of methods tended to be greatest at those levels of the
other factors at which all four methods performed best. For example, in com-
paring interpolation methods over the three surfaces, the relative superiority of
kriging was greatest for the plane and smallest for Morrison’s surface: these are
the surfaces for which all four methods do best and worst, respectively. Simi-
larly, the relative superiority of kriging was greatest when the sampling pattern
was most regular, noise was low, and spatial correlation was high.
Thirdly, consider comparisons of the two kriging methods with each other
and the two inverse distance weighting methods with each other. It was expected
that UK would outperform OK for the plane because the noiseless version of this
surface can be fit exactly by the second-order polynomial function employed by
UK (in fact a first-order polynomial would suffice). And indeed this was so,
although the difference in performance was not statistically significant. Nor was
the difference significant for Morrison’s surface. For the sombrero, however,
OK performed significantly better than UK. This last result is interesting, for
it indicates that for some types of surfaces it is better to completely ignore the
modeling of trend (as OK does) than to model trend inappropriately (as, appar-
ently, UK does for the sombrero). This issue requires further investigation; see
also the relevant work of Journel and Rossi (1989).
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)
formance was best for the plane, next best for the sombrero, and worst for Mor-
rison’s surface. That the plane yielded the best performance is not surprising in
light of this surface’s extreme smoothness, but it is difficult to characterize the
sombrero or Morrison’s surface as smoother than the other and hence difficult
to explain why performance was better for the former than for the latter. Addi-
tional investigation is required to understand how surface type affects interpo-
lation performance at both the overall level [as measured, for example, by log
(MSE)] and the micro level. In particular, more work is needed to explicitly
identify and parameterize those surface characteristics (e.g., peaks, troughs, or
portions of the surface with large gradients) that are responsible for the great-
est discrepancies in performance. At finer scales, the grid evaluation approach
used in this paper could serve as the basis for visualizing and evaluating local
performance characteristics.
ACKNOWLEDGMENTS
This research was partially supported by a grant from the Center for Global
and Regional Environmental Research at The University of Iowa. Michael Hodg-
son kindly provided assistance in locating the equations used to generate Morri-
son’s surface. The computing assistance of Yanming Jiang is gratefully acknowl-
edged. We thank two anonymous referees for comments which substantially
improved the paper.
REFERENCES
Brus, D. J., de Gruijter, J. J., Marsman, B. A., Visschers, R., Bregt, A. K., and Breeuwsma, A., 1996,
The performance of spatial interpolation methods and choropleth maps to estimate properties
at points: A soil survey case study: Environmetrics, v. 7, no. 1, p. 1–16.
Cressie, N., 1993, Statistics for spatial data: John Wiley & Sons, New York, 900 p.
Creutin, J. D., and Obled, C., 1982, Objective analyses and mapping techniques for rainfall fields:
An objective comparison: Water Resources Res., v. 18, no. 2, p. 413–431.
Declercq, F. A. N., 1996, Interpolation methods for scattered sample data: Accuracy, spatial patterns,
processing time: Cartography and Geographic Information Systems, v. 23, no. 3, p. 128–144.
Diggle, P. J., 1983, Statistical analysis of spatial point patterns: Academic Press, London, 148 p.
Englund, E. J., Weber, D. D., and Leviant, N., 1992, The effects of sampling design parameters on
block selection: Math. Geology, v. 24, no. 3, p. 329–343.
Franke, R., 1982, Scattered data interpolation: tests of some methods: Mathematics of Computation,
v. 38, no. 157, p. 181–200.
Gallichand, J., and Marcotte, D., 1993, Mapping clay content for subsurface drainage in the Nile
delta: Geoderma, v. 58, nos. 3/ 4, p. 165–179.
Grimm, J. W., and Lynch, J. A., 1991, Statistical analysis of errors in estimating wet deposition
using five surface estimation algorithms: Atmospheric Environment, v. 25a, no. 2, p. 317–327.
Haining, R., 1990, Spatial data analysis in the social and environmental sciences: Cambridge Univ.
Press, Cambridge, 409 p.
Hodgson, M. E., 1992, Sensitivity of spatial interpolation models to parameter variation: ACSM
9812 Mathematical Geology PLENUM Dec′ 98 − Jan′ 99 981203ap 6/29/1999 [632] (PR1)