Burrough
Keywords: geographic information systems, geostatistics, statistical methods, spatial analysis, environmental modeling, map algebra, fuzzy sets
1352-8505 © 2001 Kluwer Academic Publishers
In brief, GIS are sets of computer tools for the storage, retrieval, analysis and display of spatial data. GIS may also be required to supply data to numerical models of environmental processes (e.g., air quality, water quality and quantity, plant-soil-environment responses, etc.) and to display the results of these models as cartographically acceptable screen or hard copy images. By convention, GIS analyses are almost exclusively deterministic and data are assumed to be exact. With the exception of specialists (e.g., Heuvelink and Lemmens, 2000), the GIS community has shown little regard for issues of uncertainty and spatio-temporal variability other than geometric precision. This is not because of computational problems, but because market forces have determined that many GIS applications need not address these issues.
Figure 1. Left: Soil profile classes at sample sites (dot is unit Cr, small circle with dot is unit Ct and large circle with dot is unit Ia). Right: Soil thickness at sample sites (dot is 0-40 cm, small flag is 40-80 cm, and large flag is > 80 cm).
three soil types have significantly different means (Table 2) so there is little point in simplifying the soil map. As another example of straightforward statistical analysis using a linked statistics package, Fig. 2 presents the results of carrying out a multivariate discriminant analysis on all the 20 attributes of the soil collected at each of the 126 sample sites. This shows that although the centroids of the three soil types clearly differ in multivariate space, there is considerable overlap.
science to name but a few), this approach is not always sensible and it is better to consider the variation of the attribute in terms of a continuous, but noisy surface. This surface is often constructed by interpolation from sets of point data. Though there are many methods for interpolation (see Burrough and McDonnell, 1998), most of these treat the data as if they can be modeled by a smooth, differentiable surface and no attention is paid to the uncertainty of the results. The methods of geostatistics (Matheron, 1965; Journel, 1996; Goovaerts, 1997) use the stochastic theory of spatial correlation both for interpolation and for apportioning uncertainty. Although still unfamiliar to many GIS users, in terms of technical development, the
Figure 2. Plot of discriminant functions for all 126 soil observations compared with map classes.
methods of geostatistics are of similar age to GIS, but have different roots. Whereas GIS was seen as a way to automate the creation of exact, deterministic models of the world in a dominantly cartographic context, geostatistics is about making predictions under conditions of uncertainty and limited information. The path of geostatistics from its founders Krige and Matheron in the 1960s and 1970s to present day exponents such as Journel, Goovaerts and others emphasizes the role of chance in spatial prediction. Where GIS ignores statistical variation, geostatistics uses the understanding of statistical variation as an important source of information for improving predictions of an attribute at unsampled points, given a limited set of measurements. Geostatistics are therefore a very useful ``add on'' or extension to the GIS toolkit for spatial analysis. A central aspect of geostatistics is the use of spatial autocovariance structures, often represented by the (semi)variogram, or its cousin the autocovariogram, which distinguish different kinds of spatial variation. The semivariance indicates the degree of similarity of values of a regionalized variable $Z$ over a given sample spacing or lag, $h$. Semivariograms (Fig. 3) are graphs of the semivariance $\gamma(h)$ against sample spacing or lag, $h$; they are defined as:

$$\gamma(h) = \frac{1}{2}\,\mathrm{Var}\{Z(x_i) - Z(x_i + h)\}$$

and estimated from sampled data by:

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{z(x_i) - z(x_i + h)\}^2$$

where $n$ is the number of pairs of samples at lag $h$, and $z(x_i)$, $z(x_i + h)$ are measurements separated by a distance $h$. In practice, $\hat{\gamma}(h)$ is estimated from sets of point samples which can be extracted from the GIS database. Because experimentally derived semivariances do not always follow a smooth increase with sample spacing, a theoretical variogram model is fitted to the data (Burrough and McDonnell, 1998; Deutsch and Journel, 1998; Goovaerts, 1997). The interpolation weights for predicting the value of attribute $\hat{z}$ at unsampled locations $x$ are derived with the help of this fitted model and the method is known as ordinary point kriging (OPK) after its first exponent. Predictions can also be computed for units of land (blocks) larger than those sampled, thereby smoothing out local variations; this is known as block kriging. Much practical geostatistics is concerned with the estimation and fitting of variograms to experimental data (Pannatier, 1996) followed by interpolation or conditional simulation of gridded surfaces (Pebesma and Wesseling, 1998). Besides interpolation, kriging provides information on interpolation errors. Knowledge of the spatial correlation structures may also be used to generate sets of equiprobable realizations (simulations) of the attribute $z$ that can be of great value for studying error propagation through spatial models that may be linked to the GIS. For many users of GIS, kriging is no more than an alternative method of interpolation (see Burrough and McDonnell, 1998 for references). Indeed, many statisticians and geographers use other methods for statistical spatial analysis (cf. Bailey and Gatrell, 1995; Cressie, 1991). The general lack of appreciation of geostatistics by the GIS community during the seminal years from the mid-1970s to the mid-1990s was due to many factors, including the publication of Matheron's original treatise in French (Matheron, 1965),
Figure 3. Example of a semivariogram fitted to experimental data. The numbers indicate the numbers of pairs of points used at each lag.
which is therefore inaccessible to most native English speakers. Until the mid-1990s, the high prices charged for geostatistics software packages and their almost exclusive use by mining corporations made it difficult to teach geostatistics in many universities. Of course, a contributing factor to the lack of interest in geostatistics by the GIS practitioner is its grounding in mathematical statistics, which clearly baffles those of us who have little feeling for the statistical treatment of sampling, variance analysis and correlation and regression.
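For readers who find the estimator easier to follow as a procedure, the experimental semivariogram can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used by any of the packages cited here: the function name, the `lag_width` and `n_lags` parameters and the rule that each lag class extends half a lag width either side of its nominal lag are all assumptions made for the example. The pair counts it returns correspond to the numbers plotted beside each point in a graph such as Fig. 3.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Estimate gamma_hat(h) for lags h = lag_width, 2 * lag_width, ...

    Returns a list of (lag, semivariance, n_pairs) tuples; the semivariance
    is None for a lag class that contains no pairs.
    """
    sums = [0.0] * n_lags    # sum of squared differences per lag class
    counts = [0] * n_lags    # number of pairs n per lag class
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        dist = math.dist(p, q)
        # Assumed binning rule: class k is centred on (k + 1) * lag_width
        # and extends half a lag width to either side.
        k = int(math.floor(dist / lag_width + 0.5)) - 1
        if 0 <= k < n_lags:
            sums[k] += (zp - zq) ** 2
            counts[k] += 1
    result = []
    for k in range(n_lags):
        # gamma_hat(h) = (1 / 2n) * sum of squared differences
        gamma = sums[k] / (2.0 * counts[k]) if counts[k] else None
        result.append(((k + 1) * lag_width, gamma, counts[k]))
    return result
```

A theoretical model (spherical, exponential, and so on) would then be fitted to the returned semivariances, usually weighting each lag by its number of pairs.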
provide the means to register the locations of samples directly (via GPS or other methods), or to convert local coordinates to standard coordinates. The use of standard coordinates ensures that data collected at different times can be properly combined and overlaid on conventional maps. The use of standard coordinate systems is particularly important when international databases are created from different sources, as occurs in Europe, for example.
Exploratory spatial data analysis. As already noted, ESDA is a useful toolkit for examining data prior to analysis. For geostatisticians, the presence and location of spatial outliers, or other irregularities in the data may have important consequences for the fitting of variograms, or for determining whether data should be transformed to logarithms. GIS often provide search engines that can be linked to statistical packages to determine whether any given data set contains anomalies or unexpected structure. The underlying reasons for such anomalies may sometimes be easily seen when these data are displayed on a map together with other information. Not all users of ESDA in GIS use conventional geostatistics, however, and other measures of spatial autocorrelation such as Moran's I statistic are often used (Pereira et al., 1998).
Spatial context and the use of external information. Increasingly, the suite of geostatistical methods currently available allows the user to incorporate external information that can be used to modify, and possibly improve, the predictions or simulations required. Geostatisticians term the external information ``secondary'', because they believe that the ``hard data'' measured at the sample locations are most important. But GIS practitioners might prefer to call ``primary data'' that which separates a landscape into its main components (different soils, rock types, or land cover classes), regarding the sampled data as merely filling in the details that were not apparent at the smaller map scale.
In any case, GIS makes it possible to incorporate data from other aspects of the environment with the geostatistical study of autocorrelation structures, so that differentiated knowledge of different patterns of variation can be used to best effect. For example, in the c. 5 × 2 km study area used in Principles of Geographical Information Systems (Burrough and McDonnell, 1998) the distribution of heavy metals (zinc) in the topsoils of the river alluvium was clearly influenced by flooding regime, which in turn is affected by factors such as distance from the river and the relative elevation of the floodplain. Fig. 4 shows how the extra information may be used in several ways. Stratified kriging involves dividing the original set of 155 soil samples into classes based on flooding frequency (a simple ``point-in-polygon'' search in GIS) to yield three strata. Variograms were estimated for each stratum and the strata were interpolated separately to yield a single map (Fig. 4b). In a second approach, a multiple regression model was computed from the triplets of zinc level, elevation and distance to river measured at all data points (Fig. 4c). A third approach, known as ``Universal kriging'', directly incorporates the trend in the estimation of the interpolation weights (Fig. 4d), and Fig. 4e illustrates how both stratification and trends may be combined. The results clearly show the differences in the patterns obtained with and without the ancillary data. The single, or combined incorporation of external information through stratification and strata-specific trends yielded maps with good levels of prediction and a spatial resolution that was better than could have been obtained from ordinary point kriging alone. Other examples are given in Goovaerts (1997, 1999).
Display and visualization: 2D, 3D, plus time. Who is the recipient of a geostatistical interpolation? If a geostatistician, or statistician, then simple maps and tables of numbers
Figure 4. Results of interpolating the ln(Zinc) levels of topsoils (0-10 cm) in a frequently flooded part of the Maas floodplain, Limburg, NL. a: ordinary point kriging, b: OPK within different flooding strata, c: using a regression model based on elevation and distance from the river, d: universal kriging with a single trend, e: universal kriging with stratification and different trends for each stratum.
may suffice, but environmental managers need to see how the results relate to other aspects of the terrain. Today it is easy to import the results of a kriging interpolation into a GIS and display the results in conjunction with a scanned topographic map, or display them in 3D over a digital elevation model (DEM) of the landscape from which the samples were taken (Fig. 5). Such presentation invites visual interpretation, the re-evaluation of results and the discovery of more information, and therefore is an essential part of the spatial analysis process.
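The ordinary point kriging step that produces surfaces such as Fig. 4a, and that is then imported into the GIS for display, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used for the zinc study: it assumes a spherical variogram model whose nugget, partial sill and range are supplied by the user (any values passed in are hypothetical, not those fitted to the Maas data), it uses all data points rather than a local search neighbourhood, and the helper names `spherical`, `solve` and `ordinary_kriging` are invented for the example. The function returns the prediction, the kriging (estimation) variance and the weights.

```python
import math

def spherical(h, nugget, sill, a):
    """Spherical variogram model; `sill` is the partial sill, `a` the range."""
    if h == 0.0:
        return 0.0
    if h >= a:
        return nugget + sill
    r = h / a
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def solve(a_mat, b_vec):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b_vec)
    m = [row[:] + [b] for row, b in zip(a_mat, b_vec)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, variogram):
    """Predict at `target`; returns (prediction, kriging variance, weights)."""
    n = len(values)
    # Semivariances between data points, bordered by the unbiasedness
    # constraint that forces the weights to sum to one.
    a_mat = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
             for i in range(n)]
    a_mat.append([1.0] * n + [0.0])
    # Semivariances between each data point and the target location.
    b_vec = [variogram(math.dist(p, target)) for p in coords] + [1.0]
    sol = solve(a_mat, b_vec)
    weights, mu = sol[:n], sol[n]   # mu is the Lagrange multiplier
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b_vec[:n])) + mu
    return prediction, variance, weights
```

Calling `ordinary_kriging` for every cell of a grid yields two rasters, one of predictions and one of kriging variances, both of which can be stored and overlaid in the GIS like any other layer.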
Figure 5. 3-Dimensional display of interpolation results obtained from stratified kriging on a digital elevation model with shading and transparency floated above a scanned topographic map. Dark gray zones indicate heavy metal concentrations.
data surrounding an unsampled location, and the stronger the autocorrelation structure, the lower the estimation variance.
Error propagation in spatial models. When data from interpolated surfaces are used as inputs to numerical models, the error surfaces associated with kriging interpolation may be used to understand the propagation of errors through spatial models. Heuvelink (1998) gives both theory and examples of using Taylor series expansion on interpolated data to compute error propagation through cartographic models; see also Burrough and McDonnell (1998). An increasingly popular alternative to the Taylor expansion method is to use methods of conditional simulation (Pebesma and Wesseling, 1998) to provide sets of multiple realizations of data surfaces for inputs to numerical models like the 3D groundwater model ``MODFLOW'', so that error propagation and model sensitivity can be followed using Monte Carlo methods (e.g., Bierkens, 1994; Gómez-Hernández and Journel, 1992). Monte Carlo techniques using conditional simulation may also be useful for comparing data collected at different times and locations within the same area. Recent work on the redistribution of 137Cs fallout from the Chernobyl nuclear disaster in 1986 has shown that the normal decay of radiocaesium levels and uptake rates in cow's milk can be temporally reversed if the cows are grazing on recently flooded, poorly drained peat soils (Burrough and McDonnell, 1998; Burrough et al., 1999a). The data for these studies consisted of radionuclide determinations made on bulked soil samples taken in 1988 and 1993. Unfortunately, the samples were collected at different sites in the two years, so it was difficult to use the raw data to test the hypothesis that the flood events had really enhanced radiocaesium levels near the rivers.
However, by computing the variograms for the data sets from both years and using these to compute sets of conditional simulations of the normalized differences of radiocaesium in the topsoil between the two sampling times and at all sampled sites, it was possible to establish a clear relation between the incidence of flooding and flood-induced enhancement of radiocaesium which could enter the food chain (Burrough et al., 1999a). Fig. 6 shows clearly that although there seem to be systematic differences between the two years (mean values for 1993 exceed those for 1988 by 0.5-1.0 standard errors), sites within 1.5 km of a flooding river are not only more variable, but many have higher levels of radiocaesium.
Data reduction and spatial generalization. In some applications there may be too much data, which may need to be reduced to manageable proportions or common coordinates. An example is the need to compare the yields of different crops over several years on the same plot when yields have been recorded using data loggers and GPS. For example, Burrough and Swindell (1997) report the collection of annual yield data for three successive crops on a 5 ha field at the experimental farm of the Royal College of Agriculture, Cirencester, UK. Data were collected on wheat, barley and oilseed rape in successive years by a combine harvester fitted with a data logger whose location was pinpointed by locally referenced GPS. The spatial resolution of the sample was approximately 4 m (the width of the harvester) × 2.5 m (along the cut), and each survey yielded some 2000 samples or more. Because of locational noise in the GPS and errors in the amount of crop cut each 2.5 m by the harvester, it was not possible to relate the yields of the three crops directly to location in the field nor to investigate links between crop yields and soil conditions.
To generalize and smooth the data, for each year an isotropic variogram was computed: the data were then interpolated to a common grid of 2.5 m resolution using block kriging with
Figure 6. Plots of conditional simulations for the 1988-1993 normalized differences of 137Cs at data points, with distance to rivers that flood.
units of 25 × 25 m. Each annual map was normalized to give a map showing relative yield; these three maps were then combined to give a three year, normalized average. Comparison of the normalized average yield map with a computer enhanced, scanned aerial image of the site (Fig. 7) demonstrates clear relations between site conditions and normalized crop yields that otherwise were not apparent.
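The normalization and averaging of the annual maps is simple map algebra once the three block-kriged grids share a common geometry. The sketch below is one possible reading of that step, not the procedure reported by Burrough and Swindell (1997): the text does not state how the maps were normalized, so a z-score standardization (subtract the map mean, divide by the map standard deviation) is assumed here, and the function names are invented for the example. Grids are plain lists of rows.

```python
from statistics import mean, pstdev

def standardize(grid):
    """Return the grid as z-scores (assumed normalization: (v - mean) / sd)."""
    cells = [v for row in grid for v in row]
    m, s = mean(cells), pstdev(cells)
    if s == 0.0:
        raise ValueError("grid has no variation; z-scores are undefined")
    return [[(v - m) / s for v in row] for row in grid]

def average_maps(grids):
    """Cell-by-cell mean of several grids that share the same shape."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [[mean(g[r][c] for g in grids) for c in range(n_cols)]
            for r in range(n_rows)]
```

Standardizing first puts crops with very different absolute yields, such as wheat and oilseed rape, on a common relative scale, so that the average highlights parts of the field that perform consistently well or badly regardless of crop.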
Figure 7. Comparison between the aerial photo image of field A (left) and the average, standardized crop yields as interpolated using block kriging (right).
Geostatistics and remote sensing. The application of geostatistical methods in the analysis of remotely sensed images is a topic in itself. Here I refer the reader to the recent issue of Photogrammetric Engineering and Remote Sensing (January, 1999) for a recent compilation of research. Remote sensing applications of geostatistics have less to do with interpolation from sparse data (the images are complete unless masked by cloud cover, in which case interpolation could be used to fill in the gaps) than with the description and analysis of gridded, stochastic surfaces and the simulation of multiscale data sets.
Figure 8. a: Single realization of a drainage network derived from a smooth DEM; b: average image computed from 100 realizations derived from the initial DEM plus 10 cm root mean square (RMS) error.
better idea of surface water drainage may be obtained by considering the average properties of a suite of possible drainage nets that are obtained when surface roughness is added to the DEM. The roughness can easily be modeled by a small Gaussian noise which is added to each cell (a standard deviation equal to 0.1% of the maximum relief difference in the area is enough as a first approximation); the result yields one possible realization of the net. Repeating the procedure 100-1000 times with different random values for roughness creates an average probability density map of the cumulative contributing area (Fig. 8b) which appears to be more realistic than the single deterministic solution. Note that one cannot compute Fig. 8b by passing a moving window smoothing function over Fig. 8a. The effects of small errors on the derived flow paths may be effectively demonstrated by displaying the whole set as a movie, when the amplitudes and locations of the swings of drainage paths resulting from the minor errors will become very apparent. Though this example uses spatially uncorrelated noise for each realization of the DEM surface, one could of course examine the effects of spatially correlated noise on the model by first creating a set of conditional simulations based on a known or assumed variogram. Repeating the analysis for multiple realizations and displaying these using dynamic visualization enhances understanding of the results.
Adding stochasticity to make a deterministic process model work properly. In certain situations it appears to be necessary to add roughness to a surface so that a well-known deterministic process can be modeled effectively, and this is illustrated using the example of the creation of an alluvial fan. If a hillside is modeled as a smooth inclined plane, then the topology consists merely of a set of parallel lines that run from top to bottom, much like the way rain falling on the windscreen of a stationary car runs off in parallel streams.
These streams can be ``forced'' to merge if the initial surface is roughened (e.g., Liverpool and Edwards, 1995). In the case of the alluvial fan, each ``event'' by which material falls down the slope and is added to the fan modifies the surface roughness in a way that is very difficult to predict, but which must not be ignored. So the initial roughness is modified by feedback from the sedimentation process so that for each cycle there is a new surface for the flow and deposition. If the deposits are sufficiently large, the surface topology changes with each cycle. The need for initial roughness which is modified but maintained during the development of the delta is a nice example of how a better understanding of the physical process may arise by linking geostatistics with interactive dynamic modeling. Ongoing research in Utrecht and elsewhere is beginning to demonstrate the value of conditional simulation in dynamic, as well as static models of landscape change (see Karssenberg et al., in press).
geostatistics, but as a complementary suite of methods for operating in uncertain conditions. The main uses of fuzzy subsets in GIS are for the selection and retrieval of data under conditions of uncertainty (e.g., Burrough and McDonnell, 1998; Canters, 1997), and in creating multivariate classes that overlap (fuzzy k-means) (Burrough et al., 1999b). Data retrieval using fuzzy subsets has been demonstrated to be less error prone than conventional Boolean SQL methods (Heuvelink and Burrough, 1993). Fuzzy memberships can be interpolated using kriging (de Gruijter et al., 1997; Burrough and McDonnell, 1998) and the application of fuzzy k-means to derivatives of digital elevation models provides convincing and objective methods for classifying terrain (Burrough et al., 2000, 2001). Fuzzy subsets can also be used to address issues of the crispness of spatial boundaries (e.g., Lagacherie et al., 1996) or the intervisibility across 3D surfaces (Fisher, 1995). Fuzzy subsets may also be used to define sensible ways to select point data for kriging.
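To make the idea of overlapping classes concrete, the membership step of fuzzy k-means can be sketched as below. This shows only how memberships are computed for one observation once class centroids are known, using the standard fuzzy k-means membership formula with Euclidean distance; the iterative re-estimation of the centroids is omitted, and the function name and the default fuzziness exponent `q = 2` are assumptions made for the example rather than values taken from the studies cited.

```python
import math

def fuzzy_memberships(point, centroids, q=2.0):
    """Membership of `point` in each class; the values sum to one.

    mu_c = (d_c ** 2) ** (-1 / (q - 1)) / sum over all classes of the same term,
    where d_c is the distance to centroid c and q > 1 is the fuzziness exponent.
    """
    if q <= 1.0:
        raise ValueError("the fuzziness exponent q must be greater than 1")
    d2 = [math.dist(point, c) ** 2 for c in centroids]
    # A point lying exactly on a centroid belongs fully to that class.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    inv = [d ** (-1.0 / (q - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]
```

A conventional Boolean classification corresponds to assigning each observation to the class with the largest membership and discarding the rest; keeping the full membership vector is what allows the classes to overlap and the memberships themselves to be interpolated by kriging.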
5. Conclusions
This review has demonstrated that GIS, statistics and geostatistics have much to give to each other, particularly when GIS are used for environmental analysis. Geostatistics benefit from having standard methods of geographical registration, data storage, retrieval and display, while GIS benefits by being able to incorporate proven methods for testing hypotheses and for handling and understanding errors in data and illustrating their effects on the outcomes of models used for environmental management. In some situations, geostatistics may be supplemented by non-probabilistic methods of handling uncertainty such as provided by fuzzy subsets.
References
Bailey, T.C. and Gatrell, A.C. (1995) Interactive Spatial Data Analysis, Longman, Harlow, 413 pp.
Bierkens, M.F.P. (1994) Complex Confining Layers: A Stochastic Analysis of Hydraulic Properties at Various Scales, Royal Dutch Geographical Association (KNAW)/Faculty of Geographical Sciences, University of Utrecht, Utrecht, NL.
Burrough, P.A. (1996) Opportunities and limitations of GIS-based modeling of solute transport at the regional scale. In: Application of GIS to the Modeling of Non-Point Source Pollutants in the Vadose Zone, SSSA Special Publication 48, Soil Science Society of America, Madison, 19-37.
Burrough, P.A. and Frank, A. (1996) (eds), Geographic Objects with Indeterminate Boundaries, GISDATA Series 2, Taylor and Francis, London.
Burrough, P.A., van Gaans, P.F.M., and MacMillan, R.A. (2000) High-resolution landform classification using fuzzy k-means. Journal of Fuzzy Sets and Systems, 113, 37-52.
Burrough, P.A., van Gaans, P.F.M., Wilson, J., and Hansen, A.J. (2001) Fuzzy k-means classification of topo-climatic data as an aid to forest mapping in the Greater Yellowstone Area, USA. Landscape Ecology, 16, 523-46.
Burrough, P.A. and McDonnell, R.A. (1998) Principles of Geographical Information Systems, Oxford, Oxford University Press, 330 pp.
Burrough, P.A. and Swindell J. (1997) Optimal mapping of site-specific multivariate soil properties. In Precision Agriculture: Spatial and Temporal Variability of Environmental Quality, J. Lake, G. Bock, and J. Goode (eds), Proc: CIBA Foundation Symposium 210, John Wiley and Sons, Chichester, pp. 208-20.
Burrough, P.A., van der Perk, M., Howard, B., Prister, B., Sansone, U., and Voitsekhovitch, O.V. (1999a) Environmental mobility of Radiocaesium in the Pripyat Catchment, Ukraine/Belarus. Water, Air and Soil Pollution, 110, 35-55.
Canters, F. (1997) Evaluating the uncertainty of area estimates derived from fuzzy land-cover classification. Photogrammetric Engineering and Remote Sensing, 63, 403-14.
Coppock, J.T. and Rhind, D.W. (1991) The history of GIS. In: Geographical Information Systems, Vol. 1, Principle, D.J. Maguire, M.F. Goodchild, and D.W. Rhind (eds), Longman Scientific and Technical, New York, pp. 21-43.
Cressie, N. (1991) Statistics for Spatial Data, Wiley, New York, 900 pp.
De Gruijter, J.J., de Walvoort, D., and van Gaans, P. (1997) Continuous soil maps: a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma, 77, 169-95.
Deutsch, C. and Journel, A.G. (1998) GSLIB Geostatistical Handbook, 2nd edition, Oxford.
Fisher, P.F. (1995) An exploration of probable viewsheds in landscape planning. Environment and Planning B: Planning and Design, 22, 527-46.
Gómez-Hernández, J.J. and Journel, A.G. (1992) Joint sequential simulation of multigaussian fields. In: A. Soares (ed), Proc. Fourth Geostatistics Congress, Troia, Portugal. Quantitative Geology and Geostatistics, (5), 85-94, Dordrecht, Kluwer Academic Publishers.
Goovaerts, P. (1997) Geostatistics for Natural Resources Evaluation, Oxford University Press, 483 pp.
Goovaerts, P. (1999) Using elevation to aid the geostatistical mapping of rainfall erosivity. CATENA, 34, 227-42.
Heuvelink, G.B.M. (1998) Error Propagation in Environmental Modeling, Taylor and Francis, London, 127 pp.
Heuvelink, G.B.M. and Burrough, P.A. (1993) Error propagation in cartographic modeling using Boolean logic and continuous classification. Int. J. Geographical Information Systems, 7, 231-46.
Heuvelink, G.B.M. and Lemmens, T. (2000) (eds), Accuracy 2000. Proceedings of the 4th International Meeting on Accuracy in Spatial Data, Amsterdam, July, Delft University Press, Delft.
Karssenberg, D.J., Torqvist, T., and Bridges, J. (2001) Conditioning a process-based model of sedimentary architecture to well data. Journal of Sedimentary Research, 71(6).
Lagacherie, P., Andrieux, P., and Bouzigues, R. (1996) Fuzziness and uncertainty of soil boundaries: from reality to coding in GIS. In: P.A. Burrough and A.U. Frank (eds), Geographical Objects with Indeterminate Boundaries, Taylor and Francis, London, pp. 275-86.
Liverpool, T. and Edwards, S. (1995) Modeling meandering rivers. Physical Review Letters, 75, 3016.
Matheron, G. (1965) La Theorie des Variables Regionalisee et ses Applications, Masson, Paris.
Mitasova, H. and Hofierka, J. (1993) Interpolation by regularized spline with tension: Application to terrain modeling and surface geometry analysis. Mathematical Geology, 25, 657-69.
Pannatier, Y. (1996) Variowin. Software for spatial data analysis in 2D. Statistics and Computing, Springer Verlag, Berlin, 91 pp.
Pebesma, E. and Wesseling, C.G. (1998) GSTAT: A program for geostatistical modeling, prediction and simulation. Computers and Geosciences, 24, 17-31.
Pereira, J.M.C., Carreiras, J.M.B., and Perestrello de Vasconcelos, M.J. (1998) Exploratory data analysis of the spatial distribution of wildfires in Portugal 1980-1989. Geographical Systems, 5, 355-90.
Takeyama, M. and Couclelis, H.M. (1997) Map dynamics: integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11, 73-92.
Tukey, J.W. (1977) Exploratory data analysis, Addison-Wesley, Reading, Massachusetts.
Van Deursen, W.P.A. and Wesseling, C.G. (1995) PCRaster, Department of Physical Geography, Utrecht University.
Wesseling, C.G., Karssenberg, D., Burrough, P.A., and van Deursen, W.P.A. (1996) Integrating dynamic environmental models in GIS: The development of a dynamic modeling language. Transactions in GIS, 1, 40-8.
Wise, S., Haining, R., and Ma, J. (2001) Providing spatial statistical data analysis functionality for the GIS user. The SAGE project. International Journal of Geographical Information Science, 15, 239-254.
Biographical sketch
Peter A. Burrough has been Professor of Physical Geography and Geographical Information Systems, Faculty of Geographical Sciences, University of Utrecht, since 1984. Dr. Burrough is also the Director of the Utrecht center for Environment and Landscape Dynamics (UCEL). He is Chairman of the Interfaculty center for Hydrology, Utrecht (ICHU). He is a member of the advisory committee on Earth Sciences, Physical Geography and Geology for the Dutch National Science Foundation NWO, and a member of the Scientific Board for the ``Fonds voor Wetenschappelijk Onderzoek'' (FWO) for Vlaanderen, Belgium.