
A Procedure for Making Optimal Selection of Input Variables for Multivariate Environmental Classifications

TON H. SNELDER, KATIE L. DEY, AND JOHN R. LEATHWICK


National Institute of Water and Atmospheric Research, P.O. Box 8602, Christchurch, New Zealand
National Institute of Water and Atmospheric Research, P.O. Box 11115, Hamilton, New Zealand

Abstract: Multivariate classifications of environmental factors are used as frameworks for conservation management. Although classification performance is likely to be sensitive to the choice of input variables, these choices have been subjective in most previous studies. We used the Mantel test on a limited set of sites for which biological data were available to iteratively seek a definition of environmental space (i.e., intersite distances calculated with a set of appropriately transformed and weighted environmental variables) that had maximal correlation with the same sites described in a biological space. The procedure was used to select input variables for a classification of New Zealand's rivers that discriminates variation in fish communities for biodiversity management. The classification performed (i.e., discriminated biological variation) better than classifications with subjectively chosen variables. Because the measures of environmental distance that underlie multivariate environmental classifications are inherently linear, classifications will perform best if they are defined with variables along which the biological community varies linearly throughout the entire range of the variable. Classification performance will therefore be improved when variables that have nonlinear relationships with biological variation are transformed to make their relationship with biological turnover more linear and when the contributions of environmental factors that have particularly strong relationships with biological variation are increased by weighting. Our results indicate that attention to the manner in which environmental space is defined improves the efficacy of multivariate classification and other techniques in which the environment is used as a surrogate for biological variation.

Keywords: conservation planning, classification strength, Mantel test, multivariate environmental classifications
Un Procedimiento para la Selección Óptima de Variables para Clasificaciones Ambientales Multivariadas

Resumen: Las clasificaciones ambientales multivariadas son utilizadas como marcos de referencia para la gestión de la conservación. Aunque el funcionamiento de la clasificación posiblemente es sensible a la selección de variables de entrada, estas selecciones han sido subjetivas en la mayoría de los estudios previos. Utilizamos la prueba de Mantel en un conjunto limitado de sitios para los que había datos biológicos disponibles para buscar una definición de espacio ambiental (i.e., distancias intersitio calculadas con un conjunto de variables ambientales adecuadamente transformadas) que tuviera la máxima correlación con los mismos sitios descritos en un espacio biológico. El procedimiento fue utilizado para seleccionar variables de entrada para una clasificación de ríos de Nueva Zelanda que discrimina la variación en las comunidades de peces para la gestión de biodiversidad. La clasificación funcionó (i.e., discriminó la variación biológica) mejor que clasificaciones con variables seleccionadas subjetivamente. Las medidas inherentemente lineales de la distancia ambiental que subyacen en las clasificaciones ambientales multivariadas significan que funcionarán mejor si son definidas con base en variables que tienen variación lineal en la comunidad biológica en todo el rango de

email snelder@lyon.cemagref.fr Paper submitted January 26, 2006; revised manuscript accepted September 5, 2006.

Conservation Biology Volume 21, No. 2, 365–375, April 2007
© 2007 Society for Conservation Biology
DOI: 10.1111/j.1523-1739.2006.00632.x


la variable. Por lo tanto, el funcionamiento de la clasificación será mejor cuando las variables que no tienen relaciones lineales con la variación biológica sean transformadas para que su relación con el cambio biológico sea más lineal y cuando la contribución de los factores ambientales que tienen relaciones particularmente estrechas con la variación biológica se incrementa mediante ponderación. Nuestros resultados indican que la atención a la forma en que se define el espacio ambiental mejora la eficacia de la clasificación multivariada y otras técnicas que utilizan al ambiente como un sustituto de la variación biológica.

Palabras Clave: clasificaciones ambientales multivariadas, fortaleza de clasificación, planificación de la conservación, prueba de Mantel

Introduction
Significant efforts have been made over the last decade to develop more systematic methods for conservation planning and assessment (e.g., Margules & Pressey 2000; Margules et al. 2002), many of which address how best to maximize conservation gains while minimizing costs (e.g., Faith & Walker 1996; Csuti et al. 1997; Possingham et al. 2000). Knowledge of the geographic distributions of species and ecosystems is a prerequisite for these methods, but in most parts of the world distributional data for many if not all taxonomic groups are sparse or lacking completely (Belbin 1993; Ferrier et al. 2002). One strategy to facilitate robust conservation management where biological information is lacking is to use environmental classifications as surrogates for information about biotic distributions (e.g., Belbin 1993; Trakhtenbrot & Kadmon 2005). In this respect environment-based classifications share their assumptions and applications with approaches in which environmental diversity is used as a surrogate for species-level biodiversity (Faith & Walker 1996; Araújo et al. 2001).

Historically, geographic regions with similar ecological and/or environmental character have been classified subjectively based on expert opinion (e.g., Olsen et al. 2001). Nevertheless, multivariate classification techniques are being used increasingly to generate such classifications (e.g., Leathwick et al. 2003; Hargrove & Hoffman 2005). In this approach the geographic domain is subdivided into spatial units (generally grid cells), each of which is characterized in terms of a suite of biologically relevant environmental factors. A classification of the spatial units is then performed, based on the environmental data, by clustering units that have similar environmental character, the assumption being that these are also likely to have similar ecological character.

Although multivariate environmental classifications have been used for some time (e.g., Mackey et al. 1988), little consideration has been given to the sensitivity of classification performance (i.e., discrimination of biological variation) to the manner in which environmental distances between spatial units are measured. These distances are affected by the choice of distance measure and the environmental variables used, including any weighting and transformations applied to them. This sensitivity has significant implications for the use of classifications in conservation management. For example, if classifications are used for reserve selection (e.g., Belbin 1993), poor performance could result in overrepresentation of some parts of the domain and underrepresentation of others.

Whereas statistical selection and transformation of predictor variables is common in ecological analysis, in most published classifications the choice of input variables has been largely subjective (e.g., Belbin 1993; Leathwick et al. 2003; Mücher et al. 2003; Hargrove & Hoffman 2005), despite recognition that these decisions will affect classification performance (e.g., Trakhtenbrot & Kadmon 2005). Intuitively, it would seem that the contributions of variables to the classification outcome should be matched to the degree of biological turnover that occurs along their range. Nevertheless, if variables are left unstandardized, their numeric ranges will determine their contributions to classification outcomes, whereas contributions are equalized if the variables are standardized. Alternatively, standardized variables could be explicitly weighted to reflect the degree of species turnover occurring along each gradient, although this is complicated when variables are correlated. Similarly, use of untransformed variables implies that rates of biological turnover remain constant throughout the range of a variable. Where rates of biological turnover do vary along an environmental gradient, transformation of that variable may lead to improved classification performance.

The goal of our study was to develop an objective method for choosing input variables, transformations, and weightings for environmental classifications that optimizes the classification's performance.
We used the Mantel test (Mantel 1967) and an iterative selection procedure to maximize correlation between environmental and biological distances for a limited set of sites with biological samples by varying the selection, weighting, and transformation of the environmental variables. This definition of environmental space (sensu Austin & Smith 1989) was then used to define a classification of the entire domain of interest. We applied this method to the development of an environmental classification of New Zealand's rivers that aimed to discriminate variation in the composition of the freshwater fish community. We then compared the performance of this classification with alternative classifications to evaluate the degree to which classification performance is sensitive to choice of variables, transformations, and weightings. Although the spatial units we used were segments of a river network, the procedure is equally applicable to classifications based on grid data.

Methods
Study Area

The study area comprised two main islands (North and South islands) and a number of smaller offshore islands that extend from latitude 34° to 47°S (Fig. 1). The area has a maritime climate with muted daily and seasonal temperature variation (Sturman & Tapper 1996). Geographic variation in rainfall reflects interaction between predominant westerly winds and the main mountain ranges, which are oriented in a southwest to northeast direction. Annual rainfall can exceed 10,000 mm at high elevation in the western South Island, but declines to 500 mm in the east of both islands. Uplifted sedimentary mountain ranges and extensive glacial and alluvial outwash plains are the predominant landforms in the South Island, whereas the North Island encompasses sedimentary mountain ranges and hill country, alluvial plains, and volcanic landforms.

The climatic, topographic, and geological variation results in marked variation in the environmental character of rivers (Snelder & Biggs 2002). Variation in temperature is most strongly associated with elevation and more weakly with latitude and distance from the coast (Leathwick et al. 2003). The hydrology, morphology, and water chemistry of rivers with predominantly mountainous catchments contrast with those of streams whose catchments lie in lowland, alluvial outwash plains. Geographic isolation coupled with long-term environmental variability over geological time scales has produced a unique fish fauna (McDowall 1990). Approximately 50% of New Zealand's freshwater fish species are diadromous, spending part of their life cycle at sea, whereas the remaining species are more sedentary (McDowall 1990).

Environmental Data

The spatial units we used were taken from a GIS-based river network of New Zealand developed from a 30-m digital elevation model (DEM) (Snelder & Biggs 2002). The network contains 565,000 uniquely identified segments (the classification entities), averaging 740 m in length, that were defined by upstream and downstream confluences with tributaries. Each segment is associated with its own subcatchment, which was also derived from the DEM. All subcatchments upstream of each segment were accumulated to define the total upstream catchment. The network and subcatchments were stored as a GIS database.

We identified a set of candidate variables by considering both conceptual models of environmental factors driving variation in freshwater ecosystems at a variety of spatial scales (e.g., Poff 1997) and empirical evidence for the relationship between variables and New Zealand's native fish communities provided by analytical studies (Jowett & Richardson 2003; Leathwick et al. 2005). Fifteen candidate environmental variables were derived for every segment (Table 1) and were divided into those that described the character of (1) the upstream catchment of each river segment, (2) the segment itself, and (3) the downstream river network between the segment and the river mouth. The variable names carry the prefixes us, seg, and ds, respectively. Details of the derivation of these variables are provided in Leathwick et al. (2005), so we provide only a brief summary here.

Figure 1. Map of New Zealand, the study region. Hill shading highlights the variation in topography, and black dots are fish-sampling sites.

UPSTREAM CATCHMENT VARIABLES

Upstream catchment variables describe aspects of the size, climate, topography, and geology of the catchment of each network segment. We computed upstream catchment variables by combining the river network with various grids of environmental data. The variables were derived by summing values for each grid cell in the upstream catchment and then dividing the sum by the total catchment area. We used catchment area (usArea) as a measure of stream size, which affects hydraulic habitat. Three variables described climatic conditions of the upstream catchment: the average temperature in the warmest month (February; usAvTWarm) and the coldest month (July; usAvTCold), and the average number of days per month on which rainfall exceeded 25 mm (usRainDays). The average upstream catchment slope (usSlope) was used as a measure of sediment transport and stream power, which affects hydraulic habitat. Two variables (usPhos and usHard) described variation in the physical and chemical character of catchment geology, which affects river morphology and hydrochemistry. Lake buffering of river flows and sediment regimes was represented by a variable that describes the proportion of runoff routed through lakes (usLake) (Snelder & Biggs 2002).

SEGMENT VARIABLES

Segment variables describe aspects of the local environment of each network segment. Two variables (segAveTCold and segAveTWarm) described the average air temperature at the segment in July and February, respectively. The segment slope (segSlope), which has an important influence on local hydraulic habitat, was calculated from the DEM. The segment elevation (segElev) was also calculated from the DEM and provided a measure of the energy required by migratory species to reach a segment. Elevation is an important variable for describing the large-scale distribution of New Zealand's native fish communities (Jowett & Richardson 2003).

DOWNSTREAM VARIABLES

The character of the river network downstream of a site influences the distribution of diadromous fishes in particular because it affects their ability to migrate between a particular river segment and the sea (McDowall 1990). We used three variables to characterize the downstream river network: distance to the coast (dsDistToCoast), average downstream slope (dsAveSlope), and maximum downstream slope (dsMaxSlope).

Table 1. Candidate environmental variables derived as potential input variables for a classification of New Zealand rivers that discriminates variation in fish communities.

| Variable | Description | Mean (range) | Range covered by fish data set |
| Upstream catchment | | | |
| usSlope (°) | catchment-averaged slope | 16.6 (0–55) | 0–38 |
| usAvTCold (°C) | catchment-averaged winter air temperature | 0.6 (−8.1 to 9.3) | 4.6–7.8 |
| usAvTWarm (°C) | catchment-averaged summer air temperature | 14.3 (0.6–19.8) | 6.4–19.2 |
| usArea (km²) | catchment area | 74 (0.02–20,800) | 0.25900 |
| usHard (dimensionless) | catchment-averaged hardness of underlying rocks, 0 (very low) to 5 (very high) | 3.3 (0–5.0) | 0–5 |
| usLake (dimensionless) | lake index | 0.0 (0–1.0) | 0–0.2 |
| usPhos | catchment-averaged phosphorus concentration of underlying rocks, 0 (very low) to 5 (very high) | 2.4 (0–5.0) | 0.3–5.0 |
| usRainDays (days/month) | catchment-averaged days with rain >25 mm | 1.2 (0–3.3) | 0.2–3.3 |
| Segment | | | |
| segAveTCold (°C) | segment winter air temperature | 5.1 (−6.6 to 12.7) | 0.3–11.3 |
| segAveTWarm (°C) | segment summer air temperature | 15.3 (2.7–19.9) | 10.9–19.6 |
| segSlope (°) | segment slope | 0.1 (0–2.1) | 0–0.5 |
| segElev (m) | segment elevation | 400 (0–2674) | 3–1049 |
| Downstream | | | |
| dsAveSlope (°) | average downstream slope | 0 (0–32.4) | 0–0.1 |
| dsDistToCoast (km) | distance to the coast | 99 (0–450) | 0–420 |
| dsMaxSlope (°) | maximum downstream slope | 25 (0–75) | 0–48 |

Biological Data

We used fish distribution data drawn from the New Zealand Freshwater Fish Database (http://www.niwa.co.nz/services/nzffd), which currently holds fish distribution records for both native and non-native species for approximately 22,500 sites throughout New Zealand. We included all fish species because the pattern of native species is not independent of the exotic species due to predation (McDowall 1990). Records were selected for which species abundances based on electrofishing were available. We used the criteria defined by Jowett and Richardson (2003) to extract records for 1552 sites (Fig. 1) from the database so that the subset provided a consistent and reliable representation of the fish
community at each site. Records were selected in which the sampling method was first-pass electrofishing, the sample was taken after 1980, the area fished was measured and was either 50 m² or 20 times the stream width, and all fish species caught were identified and counted. This data set sampled only a subset of the range of environments occurring in the river network and was biased to some degree, with disproportionately greater sampling of smaller rivers and streams in warm, lowland environments (Table 1).

We linked sites in the fish data set to the river segment in which sampling occurred based on sample location data. This allowed us to associate each sample with its relevant environmental estimates. We used the New Zealand Land Cover Database (MFE 2004) and the GIS-based river network to determine the proportion of the upstream catchment categorized as urban land cover. We removed all sample sites with any urban land cover from the original data sets because urban land cover strongly affects fish distribution (Suren & Elliot 2004) and results in patterns of distribution that are largely culturally, rather than environmentally, determined.

Environmental and Biological Distance Measures

Our analysis objective was to identify a combination of environmental variables with which to calculate environmental distances for a set of test sites such that those distances would have maximum correspondence with a parallel set of biological distances for the same sites. We calculated environmental distances with the Gower metric (Gower 1971), which incorporates an inherent range standardization that equalizes the contributions of variables that are measured on different scales. The Gower metric is a Manhattan-type distance measure that is defined as

$$d_E = \frac{1}{n}\sum_{i=1}^{n}\frac{|x_{ij} - x_{ik}|}{\operatorname{range}(x_i)},$$

where $d_E$ is the environmental distance between points j and k, whose attributes are described by a set of n variables $x_1, \ldots, x_n$. Biological distances ($d_B$) between the test sites j and k were calculated with the Bray–Curtis measure of compositional distance:

$$d_B = 100\,\frac{\sum_{i=1}^{p}|y_{ij} - y_{ik}|}{\sum_{i=1}^{p}(y_{ij} + y_{ik})},$$

where p is the total number of species present in the data set and $y_{ij}$ is the abundance of species i at site j. An important attribute of the Bray–Curtis distance measure is that it is insensitive to joint absences, meaning sites are grouped on the basis of the species they have in common and species that are absent at both sites make no contribution (Clarke 1999).

Prior to calculating biological distances we applied two transformations to the abundance data. First, we applied a log10(x + 1) transformation to the fish data to decrease the effect of a few very high records of maximum abundance. We then standardized abundances for each species to achieve a maximum value of one so that all species contributed equally to the final distance estimate. Because of the length of the biological gradients described by geographically extensive data sets, a large proportion of sites are likely to have no species in common. This results in many of the individual biological distances having the maximum possible value for this measure (i.e., a value of one). We calculated corrected estimates of the biological distances for these pairs of sites with a flexible shortest path adjustment method (De'ath 1999). This involved estimating dissimilarities above a nominated threshold (0.9) based on sites with lower dissimilarities as stepping stones.

Selecting an Optimal Set of Environmental Variables

To identify an optimal set of environmental variables, we developed a procedure in which we repeatedly tested the correlation between a matrix of all possible pairwise biological distances ($d_B$) for a set of sample sites and a parallel matrix of environmental distances ($d_E$) calculated with varying combinations of environmental variables and with differing transformations and weightings. The degree of correlation between the two distance matrices was measured with the Mantel r statistic (Mantel 1967), which measures the linear correlation between the two sets of distances.

Our procedure was analogous to forward stepwise regression, but had the objective of maximizing the correlation between the environmental and biological matrices. At each step the procedure calculated the change in correlation that resulted from adding each of the candidate variables, either transformed or untransformed, and then selected the variable that resulted in the largest increase (or smallest decrease) in r. Because variables could be added more than once, the procedure was capable of identifying biologically important variables for which an increased weighting would improve classification performance. The variable transformations tested consisted of a series that expanded higher values by raising them to the power of 2, 3, and 4 and a series that compressed higher values by raising them to the power of 0.5, 0.33, and 0.25 or by applying a log10 transformation. The procedure was repeated until a maximum of 20 variables, including repeat instances of variables, had been added to the set used to calculate the environmental distance matrix.

Because we were concerned with the generality of the result, and its relevance to new sites in particular, we used a 10-fold cross-validation procedure (Hastie et al. 2001) to identify the justifiable number of environmental variables to use in calculating environmental distances. Sites were divided into 10 randomly selected,
equal-sized subsets. Nine of the groups were used to select the variables and the remaining group was used to provide an independent evaluation of the correlation between the environmental and biological distances. We ran this procedure 10 times, with a different subset used to evaluate r each time, for each increment in the number of variables used in the environmental distance matrix. For each increment, the Mantel r values were stored and the standard error of these values was computed. We computed the justifiable number of environmental variables as that which produced an average r within 1 SE of the maximum value of r (Hastie et al. 2001). The final choice of environmental variables and their transformations was made by running the selection procedure with the entire data set and adding only the justified number of variables.
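The selection and cross-validation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names (`gower`, `mantel_r`, `forward_select`, `one_se_choice`) and the toy data are hypothetical, `log10` is applied to `x + 1` to avoid taking the logarithm of zero, variables are assumed to be non-negative so that fractional powers are defined, and the standard error in the 1-SE rule is taken at the step with the maximum mean r.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Transformations named in the text. log10 is applied to x + 1 here, an
# assumption made so that zero values are defined; inputs must be >= 0.
TRANSFORMS = {
    "x": lambda x: x,
    "x^2": lambda x: x**2, "x^3": lambda x: x**3, "x^4": lambda x: x**4,
    "x^0.5": lambda x: x**0.5, "x^0.33": lambda x: x**0.33,
    "x^0.25": lambda x: x**0.25, "log10": lambda x: np.log10(x + 1.0),
}

def gower(columns):
    """Condensed Gower distances: mean of range-standardized |differences|."""
    X = np.column_stack(columns)
    span = np.ptp(X, axis=0)
    span[span == 0] = 1.0          # a constant column contributes nothing
    return pdist(X / span, metric="cityblock") / X.shape[1]

def mantel_r(d_env, d_bio):
    """Mantel r: Pearson correlation of two condensed distance vectors."""
    r = np.corrcoef(d_env, d_bio)[0, 1]
    return -1.0 if np.isnan(r) else r

def forward_select(env, d_bio, max_terms=20):
    """Greedily add the (variable, transform) pair that maximizes Mantel r.
    A variable may be chosen repeatedly, which raises its weight."""
    chosen, cols, history = [], [], []
    for _ in range(max_terms):
        best = None
        for name, x in env.items():
            for tname, f in TRANSFORMS.items():
                r = mantel_r(gower(cols + [f(x)]), d_bio)
                if best is None or r > best[0]:
                    best = (r, name, tname, f(x))
        history.append(best[0])
        chosen.append(best[1:3])
        cols.append(best[3])
    return chosen, history

def one_se_choice(cv_r):
    """cv_r has shape (folds, steps) and holds held-out Mantel r values.
    Returns the smallest number of terms whose mean r lies within one
    standard error of the maximum mean r."""
    mean = cv_r.mean(axis=0)
    se = cv_r.std(axis=0, ddof=1) / np.sqrt(cv_r.shape[0])
    k = int(np.argmax(mean))
    return int(np.argmax(mean >= mean[k] - se[k])) + 1

# Toy data (hypothetical): species respond to log10(area); "noise" is irrelevant.
rng = np.random.default_rng(0)
area = 10 ** rng.uniform(0, 3, 60)
env = {"area": area, "noise": rng.uniform(0, 1, 60)}
optima = np.linspace(0, 3, 8)
abund = np.exp(-(np.log10(area)[:, None] - optima) ** 2 / (2 * 0.6 ** 2))
d_bio = pdist(abund, metric="braycurtis")   # Bray-Curtis on a 0-1 scale
chosen, history = forward_select(env, d_bio, max_terms=3)
print(chosen, np.round(history, 3))
```

Repeating a variable in `cols` doubles its share of the Gower sum, which is how the sketch mirrors the weighting-by-repetition idea in the text. In a full cross-validation, `forward_select` would be run on nine subsets and `mantel_r` evaluated on the held-out subset at each step to fill the array passed to `one_se_choice`.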

Classification

We constructed a matrix containing the set of variables selected by our procedure for all segments on the river network, applied the specified transformations, and used those data to define a multivariate classification. To test the comparative performance of this classification, we defined three additional classifications based on different variables, weighting, and transformation options. Classification STW was defined with the variables, transformations, and weightings selected by our procedure. Classifications ST and SW were defined with the selected variables, but ST applied transformations without weighting and SW applied weighting without transformation. Classification S + M was defined with the selected variables plus an additional six that were selected as predictors by nonlinear statistical models of fish species distributions in New Zealand based on multivariate adaptive regression splines (MARS) (Leathwick et al. 2005).

Table 2. Variables used to define the four classifications of New Zealand rivers that discriminate variation in fish communities.

| Variable | STW | ST | SW | S + M |
| segElev | 1, log10 x | 1, log10 x | 1 | 1 |
| usAvTWarm | 1, x^2 | 1, x^2 | 1 | 1 |
| segAveTCold | 1, x^0.5 | 1, x^0.5 | 1 | 1 |
| segAveTWarm | 1, x^0.25 | 1, x^0.25 | 1 | 1 |
| dsDistToCoast | 1, log10 x | 1, log10 x | 1 | 1 |
| usArea | 2, x^0.25 | 1, x^0.25 | 2 | 1 |
| dsAveSlope | | | | 1 |
| dsMaxSlope | | | | 1 |
| usSlope | | | | 1 |
| usLake | | | | 1 |
| usPhos | | | | 1 |
| usRainDays | | | | 1 |

Our choice of a classification strategy was constrained by the large amount of data (approximately 565,000 network segments), which prevented the direct use of hierarchical clustering procedures. The clustering was therefore performed in two stages. In the first stage we used a nonhierarchical clustering procedure (Belbin 1987) to define 500 groups. This iterative procedure allocated segments to clusters based on their environmental distances from each other, as measured by the Gower metric. In the second stage we used a conventional agglomerative clustering procedure (flexible unweighted pair-group method with averages; UPGMA) to define relationships among the 500 clusters created by the initial nonhierarchical clustering. This was performed with slight space dilation (β = −0.1) to discourage chaining (Belbin et al. 1992). All network segments were assigned to clusters at every level of classification detail from 500 to 5 classes, allowing classification results to be displayed at varying levels of detail.

Testing

We tested the four classifications to assess their discrimination of variation in biological composition (i.e., the degree to which the classifications grouped locations with similar biological composition into the same environmental class). Given the hierarchical nature of the classifications and their ability to be used at varying levels of classification detail, we were also interested in the degree to which this discrimination varied with different levels of classification detail. The tests were performed on the same biological data set and extended biological distances used by the selection procedure. This is justifiable because our aim was not to test the classifications' predictive ability (which requires an independent data set) but to assess the performance of the optimized classification compared with classifications for which input variables were chosen less objectively.

We tested the strength of the classifications at a range of hierarchical levels with analysis of similarity (ANOSIM) (Clarke & Green 1988). An ANOSIM can be used to calculate either the global average difference in compositional similarity across all classes (global R) or the average difference in compositional similarity between pairs of classes (pairwise R). The global R statistic measures the difference between biological distances of sites located in different classes and the biological distances of sites occurring within the same classes. The global R has a value of one if sites within environmental classes are more similar to each other than any sites from different classes. It is zero when there is no difference between the biological distances of sites within environmental classes and those of sites in contrasting environmental classes. The pairwise R statistic was also calculated for individual pairs of classes.

The ANOSIM analysis was limited by the biological data, which meant a proportion of classes at any particular classification level had either few or no biological sites. To avoid bias, we imposed a requirement that classes had to have at least 10 sites to be included in the ANOSIM analysis and that at least 10% of the classes had to contribute to the global R statistic at any classification level being tested. We assessed the global R statistic for all classes with adequate data for a range of classification levels. We also assessed the pairwise R statistic for all possible pairwise combinations of classes for the 20-, 50-, and 100-class levels of the classifications. For classifications other than STW, we tested the significance of the R statistic with a randomization procedure (1000 permutations) that was based on the null hypothesis of no class structure (Manly 1986). The significance of the R statistic associated with classification STW was not computed because that classification was produced by optimal selection against the same data, which would make the test invalid (Clarke & Green 1988).

The classification was carried out in the multivariate statistics package PATN (Version 3, Blatant Fabrications, Australia). All other analyses, including our procedure, were implemented in Matlab (The MathWorks, Natick, Massachusetts) with modifications of the statistical procedures available in the Fathom package (Jones 2003).
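The global R statistic and its permutation test can be sketched in Python as follows. This is an illustrative sketch only, not the Matlab code used in the study: the function names `anosim_r` and `anosim_p` are hypothetical, and the formula assumed is the standard rank-based one, R = (mean between-class rank − mean within-class rank) / (n(n − 1)/4), where n is the number of sites.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def anosim_r(d, labels):
    """Global ANOSIM R from a condensed distance vector and class labels."""
    labels = np.asarray(labels)
    n = labels.size
    ranks = rankdata(d)                 # tied distances share an average rank
    i, j = np.triu_indices(n, k=1)      # same pair order as the condensed form
    within = labels[i] == labels[j]
    return (ranks[~within].mean() - ranks[within].mean()) / (n * (n - 1) / 4.0)

def anosim_p(d, labels, n_perm=1000, seed=0):
    """Permutation p-value under the null hypothesis of no class structure."""
    rng = np.random.default_rng(seed)
    observed = anosim_r(d, labels)
    hits = sum(anosim_r(d, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy example (hypothetical): four sites on a line forming two tight pairs.
sites = np.array([[0.0], [1.0], [10.0], [11.0]])
d = pdist(sites)
print(anosim_r(d, [0, 0, 1, 1]))   # classes match the two pairs
print(anosim_r(d, [0, 1, 0, 1]))   # classes cut across the pairs
```

In the toy example the first labeling places every within-class distance below every between-class distance, so R reaches its upper bound of one, whereas the second labeling gives a negative R.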

Results
Selection of Variables, Transformations, and Weightings

Our data set included 21 species, including three non-native species. Prior to recalculating extended biological distances, 40% of intersite distances had the maximum possible value of one (indicating the sites had no species in common) and 50% of intersite distances were >0.9. After the biological distances were extended, their distribution was approximately unimodal (mean 0.85 and maximum 2.4).

The maximum average Mantel r value occurred with 10 environmental variables, and a Mantel r value that was within 1 SE of this occurred with 7 variables (Fig. 2). The procedure was then used to select seven input variables that comprised segment, upstream, and downstream variables and included the variable usArea twice (Table 2). Transformations were identified as beneficial for all variables, with compressive transformations for segElev, segAveTCold, dsDistToCoast, segAveTWarm, and usArea and an expanding transformation for usAvTWarm. The three temperature variables and elevation were correlated (pairwise correlation coefficients ranged from 0.65 to 0.92).

Note to Table 2: The number 1 indicates that a variable was used in the classification and 2 indicates that the variable was included twice. Transformations are noted where applicable. Classification STW was defined with the variables, transformations, and weightings selected by our procedure. Classifications ST and SW were defined with the selected variables but without weighting and without transformation, respectively. Classification S + M was defined with the six selected variables plus an additional six that were selected as predictors by nonlinear statistical models. The variables are defined in Table 1.

Classifications The variables, transformations, and weightings we used to define the four classifications are shown in Table 2. The mapped classifications appear as linear mosaics that delineate geographic patterns because adjacent segments tended to share the same class membership (Fig. 3). At low levels of classification detail, classes were uniform over large areas of the river network (Fig. 3, 5-class level). The patterns become patchier as classification detail was increased and influent tributaries often belonged to different classes than the main stems they joined (Fig. 3, 10-class level). Testing All ANOSIM R statistics for classifications ST, SW, and S + M were significant ( p 0.05) (Fig. 4). The R statistics increased rapidly for classification STW and ST at low levels of classification detail. The R statistic for classification STW was marginally higher than classification ST from the 5- to 80-class levels after which the R statistic for ST marginally exceeded that of STW, and gains in R with increasing classification levels reached a plateau for both classifications. Classification SW performed poorly at all levels. Classification S + M performed poorly at low levels of classification detail but exceeded the R statistic for STW at 400 classes. At lower levels of classification detail, a higher proportion of the total number of classes were tested, and less than 10% of classes in all classifications participated in the test after the 400-class level. Thus, low

Figure 2. Results of the cross-validation component of the environmental-variable selection procedure. The solid line shows the average Mantel r statistic for differing numbers of environmental variables. The whiskers indicate 1 SE. The circled points indicate the maximum r value and the minimum number of variables (7) at which the average r value is within 1 SE of the maximum.
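The selection rule summarized in the Figure 2 caption (take the smallest number of variables whose average Mantel r is within 1 SE of the maximum) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `one_se_choice` and all numbers are hypothetical (they are not the values plotted in Figure 2), and the sketch assumes that the SE used is the one at the maximum.

```python
def one_se_choice(n_vars, mean_r, se_r):
    """Return (smallest acceptable number of variables, number at the maximum).

    A number of variables is acceptable when its mean r is no more than one
    standard error below the best mean r (assumption: the SE at the maximum
    is the one that defines the threshold).
    """
    best = max(range(len(mean_r)), key=lambda i: mean_r[i])
    threshold = mean_r[best] - se_r[best]
    acceptable = [n for n, r in zip(n_vars, mean_r) if r >= threshold]
    return min(acceptable), n_vars[best]


# Hypothetical cross-validation summary, one entry per number of variables.
n_vars = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean_r = [0.20, 0.28, 0.33, 0.36, 0.38, 0.39, 0.41, 0.415, 0.42, 0.418]
se_r = [0.02] * 10

chosen, best_n = one_se_choice(n_vars, mean_r, se_r)
print(chosen, best_n)
```

With these made-up values the maximum falls at nine variables, but seven variables already lie within one standard error of it, so the more parsimonious set would be retained.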

Conservation Biology Volume 21, No. 2, April 2007


Figure 3. Classification STW (see caption of Table 2 for explanation of the classification) at the 5-class and 10-class (smaller region) levels of the classification. The weight of the lines demarcating the river network has been scaled by usArea (defined in Table 1); thus, rivers with larger catchments have thicker lines. The 5-class map shows river sections of order four and above, and the 10-class map shows river sections of order two and above.

levels of classification detail provided the most reliable assessment of the relative differences in performance of the classifications. Because the biological data did not cover the total environmental range, only 35–60%, 28–38%, and 23–32% of the classes were tested at the 20-, 50-, and 100-class levels in the pairwise ANOSIM tests (Table 3). Classification STW had the highest mean pairwise R statistic for all three levels tested. At least 85% of the interclass differences were significant for classification ST. Classifications SW and S + M had the lowest R values and mean proportion of significant differences.
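For readers unfamiliar with the test statistic, the ANOSIM R reported above can be sketched as the difference between the mean rank of between-class and within-class dissimilarities, scaled by half the number of site pairs, with significance assessed by permuting class labels. The code below is a minimal sketch under that standard definition, not the software used in the study; the function names and the four-site example are hypothetical.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """ANOSIM R for a square dissimilarity matrix and one class label per site."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim_p(dist, labels, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with R >= the observed R."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical biological dissimilarities for four sites in two classes.
dist = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.9, 1.0],
    [0.7, 0.9, 0.0, 0.2],
    [0.8, 1.0, 0.2, 0.0],
]
labels = ["A", "A", "B", "B"]
r_value = anosim_r(dist, labels)
p_value = anosim_p(dist, labels, n_perm=99)
```

In this toy case every between-class dissimilarity exceeds every within-class dissimilarity, so R takes its maximum value of 1. A pairwise test of the kind summarized in Table 3 could be obtained by applying the same function to the sites belonging to two classes at a time.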

Discussion
Our study provides an objective procedure for selecting input variables for multivariate classifications and demonstrates conclusively that classification performance is sensitive not only to the choice of input variables, but also to their weighting and transformation. Our procedure is based on optimizing the validity of the assumption that environmental variation can be used as a surrogate for biological variation by maximizing the congruence between the locations of sites in environmental space and their corresponding locations in biological space. Our results demonstrate the success of this procedure, with our tuned classification performing better than other classifications.

Figure 4. Results of global analysis of similarity (ANOSIM) tests on the four classifications of the river network. See footnote of Table 2 for explanation of the four classifications.

The higher performance of classification STW, compared with the other classifications, can be understood in terms of correlation between environmental and biological distances. In these classifications the environmental distances (Gower distance measure) can be seen as surrogates for biological distances. A number of computational factors contribute to the degree to which the correlation between these two sets of distances can be increased. First, this and other potential measures of environmental distance (e.g., Euclidean distance) are based on a linear combination of the input variables. This implies that the relationship between environmental and biological distance is linear throughout the environmental domain being classified. Thus, for the classification to provide a reasonable surrogate for biological variation, a certain proportional change in environmental distance should result in the same proportional change in biological distance, regardless of the location on the environmental gradient where that change occurs. Nevertheless, rates of change in species abundance along environmental gradients are frequently nonlinear (e.g., Moisen & Frescino 2002; Olden & Jackson 2002). The use of transformation addresses this variation in species turnover rates by expanding or contracting variables over part of their range. The transformations our variable-selection procedure identified made sense from a biological perspective and were consistent with distribution models of fish species (Leathwick et al. 2005).

Numerical studies also indicate that environmental predictors often differ widely in the effectiveness of their explanations of variation in the distributions of both species and communities (e.g., Moisen & Frescino 2002; Olden & Jackson 2002). This was reflected in our variable-selection procedure, which weighted the variable usArea and thereby increased the separation of sites along the gradient defined by that variable. Nevertheless, weighting is complicated by correlations between variables. Our selection procedure included four strongly correlated variables (segAvTWarm, segAvTCold, usAvTWarm, and segElev). The inclusion of correlated variables in a distance measure weights the environmental variation that they describe in common. It is likely that classification ST performed similarly to our tuned classification (STW) because it differed only in the weighting of one variable (usArea) and because of the dominance of the gradient that was produced by the four correlated variables, which is common to both classifications.
The similarity may also have occurred because the final STW classification is unlikely to have preserved the optimal configuration of the sample sites perfectly for the entire geographic domain

Table 3. Results of pairwise analysis of similarity tests performed on the four classifications of New Zealand rivers that discriminate variation in fish communities.

Classification level   Classification   Proportion of          Mean proportion of significant   Mean pairwise
(number of classes)                     classes being tested   differences (p < 0.05)           R value
20                     STW              0.55                   NR                               0.35
20                     ST               0.60                   0.85                             0.33
20                     SW               0.35                   0.76                             0.29
20                     S + M            0.50                   0.64                             0.23
50                     STW              0.38                   NR                               0.40
50                     ST               0.44                   0.88                             0.34
50                     SW               0.32                   0.74                             0.31
50                     S + M            0.28                   0.77                             0.22
100                    STW              0.31                   NR                               0.40
100                    ST               0.32                   0.92                             0.37
100                    SW               0.23                   0.74                             0.35
100                    S + M            0.26                   0.82                             0.29

The significance test is not relevant (NR) for classification STW. See footnote of Table 2 for explanation of the four classifications.


due to changes in density of objects in that environmental space compared with that of the sample data. Weighting that arises due to correlated input variables can affect classification performance either positively or negatively, depending on whether the overall correlation between environmental and biological distances is increased or decreased. We suggest that classification S + M performed poorly because the inclusion of additional variables decreased the relative separation of sites along the environmental gradient produced by the correlated component of segAvTWarm, segAvTCold, usAvTWarm, and segElev, which our selection procedure and the better performance of the other classifications indicated was correlated with biological variation.

Our analysis demonstrated that classification performance tends to plateau as classification detail is increased, particularly in classifications based on a few dominant environmental variables. Resolution of additional biological variation was negligible after the 80-class level of our tuned classification (Fig. 4). We suggest that this occurred because environment–biology relationships are scale dependent. In mathematical terms, as classification detail increases, the within-class environmental variation decreases. A point is reached where further subdivision of an environmental gradient will produce limited additional discrimination of biological variation, as shown by the plateau at the 80-class level for classification STW (Fig. 4). Beyond this level of classification detail, a new set of environmental variables is likely to be required to continue the discrimination of biological differences. Nevertheless, the performance of such a classification at broad scales is likely to be diminished because of the inclusion of these local-scale variables.

Our results have important implications for the use of multivariate environmental classifications for conservation management analyses.
First, our results indicate that input variables for classifications can be selected so that performance is optimal at a specific spatial scale. We suggest that achieving optimal performance by subjective choice of variables is unlikely due to the complex nature of multidimensional analyses. Furthermore, we suggest that the biological variation that can be resolved by a classification needs to be established by testing to find the level of classification at which performance plateaus. The performance plateau is the level of classification detail that should be used for conservation analyses. Using levels of classification detail that are less than the plateau could result in failing to identify some of the biological variation. The performance plateau is also the point at which a classification at a smaller scale and another (more relevant) set of variables is required to further resolve biological variation.

Our results also have implications for environmental diversity analysis (ED), which uses a multivariate measure of environmental diversity as a surrogate for biodiversity (e.g., Faith & Walker 1996; Araújo et al. 2001). Hortal and Lobo (2005) have discussed the importance of defining environmental diversity matrices that are correlated with biological variation at the scale of analysis. Our procedure provides an alternative to the approach of Hortal and Lobo (2005) for selecting a combination of environmental variables that is maximally correlated with biological variation, and it should be applicable to ED. Nevertheless, in many other discussions of the use of environmental surrogates for conservation planning, minimal consideration has been given to optimizing the biological relevance of the mix of variables used to define environmental differences between sites.

We acknowledge that our analysis had some limitations. First, the biological data set sampled only a subset of the environmental combinations occurring in New Zealand rivers (Table 1). Thus, our use of the variables, weights, and transformations derived from our selection procedure to classify the entire domain implies an extrapolation of the results of the procedure. Although use of more environmentally comprehensive data would have been desirable, the lack of such information is generally the reason for adopting an environment-based approach. Second, data that represented a range of biological components (e.g., invertebrates, fish, plants) would have been necessary if the selected variables were to reflect general biodiversity pattern. Nevertheless, biologically comprehensive data were not available. We used our selection procedure with an independent invertebrate data set and subjectively combined both sets of results to select input variables, weights, and transformations for a general classification for biodiversity management (Snelder et al. 2005). The results of this work indicated that there are environmental gradients along which a high proportion of biological variation occurs for multiple taxonomic groups.
Our procedure can ensure that a classification would weight and appropriately transform those dominant gradients, whereas a less objective procedure may inadvertently dilute these and subsequently reduce classification performance. Finally, we used a limited set of parametric transformations that can only apply a monotonic modification to the shape of the biological response to an environmental variable. In addition, our approach weighted variables in a simplistic fashion, by including additional instances of a variable. Generalized dissimilarity modeling (GDM) (Ferrier et al. 2002) provides an alternative approach that has greater flexibility.
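The effect of transformation and of weighting by repeated inclusion of a variable can be illustrated with a small numerical sketch. It assumes the usual form of the Gower distance for continuous variables (the mean of range-standardized absolute differences) and takes the Mantel r to be the Pearson correlation between the two sets of pairwise distances. The variable names usArea and segElev follow Table 1, but the site values, the biological distances, the log10 transformation, and the function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def gower_distances(sites, columns):
    """Pairwise Gower distances for continuous variables.

    Each variable contributes |difference| / range, and the contributions
    are averaged over `columns`. Listing a column twice doubles its weight.
    Assumes every listed variable takes more than one value across sites.
    """
    spans = {}
    for c in set(columns):
        values = [s[c] for s in sites]
        spans[c] = max(values) - min(values)
    return [
        sum(abs(a[c] - b[c]) / spans[c] for c in columns) / len(columns)
        for a, b in combinations(sites, 2)
    ]


def mantel_r(x, y):
    """Pearson correlation between two equally ordered lists of distances."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sites; biological turnover here follows log10(usArea) only.
sites = [
    {"usArea": 1.0, "segElev": 0.0},
    {"usArea": 10.0, "segElev": 300.0},
    {"usArea": 100.0, "segElev": 100.0},
    {"usArea": 1000.0, "segElev": 200.0},
]
for s in sites:
    s["logArea"] = math.log10(s["usArea"])  # illustrative transformation

# Hypothetical biological distances, ordered as combinations(sites, 2).
bio = [0.2, 0.4, 0.6, 0.2, 0.4, 0.2]

r_raw = mantel_r(gower_distances(sites, ["usArea"]), bio)
r_log = mantel_r(gower_distances(sites, ["logArea"]), bio)
r_equal = mantel_r(gower_distances(sites, ["logArea", "segElev"]), bio)
r_weighted = mantel_r(
    gower_distances(sites, ["logArea", "logArea", "segElev"]), bio
)
```

With these made-up numbers the transformed variable tracks the biological distances more closely than the raw variable, adding a variable unrelated to the biology lowers the correlation, and listing the informative variable twice recovers part of the loss. Because repeated inclusion only permits integer weights, it is a coarse form of weighting, which is consistent with the limitation noted above.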

Acknowledgments
This project would not have been possible without funding by the New Zealand Ministry for the Environment and Department of Conservation for the development of the River Environment and Marine Environment Classifications. In particular we thank L. Chatterton, T. Stephens,


and K. Johnston for their support over a number of years. We also thank I. Jowett, J. Richardson, U. Shankar, M. Weatherhead, and H. Hurren for their assistance and databases. We thank J. Hewitt for her review of early drafts of the manuscript and two anonymous reviewers whose comments improved our original manuscript.

Literature Cited
Araújo, M. B., C. J. Humphries, P. J. Densham, R. Lampinen, W. J. M. Hagemeijer, A. J. Mitchell-Jones, and J. P. Gasc. 2001. Would environmental diversity be a good surrogate for species diversity? Ecography 24:103–110.
Austin, M. P., and T. M. Smith. 1989. A new model for the continuum concept. Vegetatio 83:35–47.
Belbin, L. 1987. The use of non-hierarchical allocation methods for clustering large sets of data. Australian Computer Journal 19:32–41.
Belbin, L. 1993. Environmental representativeness: regional partitioning and reserve selection. Biological Conservation 66:223–230.
Belbin, L., D. P. Faith, and G. W. Milligan. 1992. A comparison of two approaches to β-flexible clustering. Multivariate Behavioural Research 27:417–433.
Clarke, K. R. 1999. Nonmetric multivariate analysis in community-level ecotoxicology. Environmental Toxicology and Chemistry 18:118–127.
Clarke, K. R., and R. H. Green. 1988. Statistical design and analysis for a biological effects study. Marine Ecology Progress Series 46:213–226.
Csuti, B., et al. 1997. A comparison of reserve selection algorithms using data on terrestrial vertebrates in Oregon. Biological Conservation 80:83–97.
De'ath, G. 1999. Extended dissimilarity: a method of robust estimation of ecological distances from high beta diversity data. Plant Ecology 144:191–199.
Faith, D. P., and P. A. Walker. 1996. Environmental diversity: on the best-possible use of surrogate data for assessing the relative biodiversity of sets of areas. Biodiversity and Conservation 5:399–415.
Ferrier, S., M. Drielsma, G. Manion, and G. Watson. 2002. Extended statistical approaches to modeling spatial pattern in biodiversity in northeast New South Wales. II. Community-level modeling. Biodiversity and Conservation 11:2309–2338.
Gower, J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27:857–871.
Hargrove, W. W., and F. M. Hoffman. 2005. Potential of multivariate quantitative methods for delineation and visualization of ecoregions. Environmental Management 34(Suppl 1):S39–S60.
Hastie, T., R. Tibshirani, and J. H. Friedman. 2001. The elements of statistical learning: data mining, inference, and prediction. Springer-Verlag, New York.
Hortal, J., and J. M. Lobo. 2005. An ED-based protocol for optimal sampling of biodiversity. Biodiversity and Conservation 14:2913–2947.
Jones, D. 2003. FATHOM: a Matlab toolbox for ecological and oceanographic data analysis. Rosenstiel School of Marine & Atmospheric Science, Department of Marine Biology & Fisheries, University of Miami, Miami.
Jowett, I. G., and J. Richardson. 2003. Fish communities in New Zealand rivers and their relationship to environmental variables. New Zealand Journal of Marine and Freshwater Research 37:347–366.
Leathwick, J. R., J. M. Overton, and M. McLeod. 2003. An environmental domain analysis of New Zealand, and its application to biodiversity conservation. Conservation Biology 17:1612–1623.
Leathwick, J. R., D. Rowe, J. Richardson, J. Elith, and T. Hastie. 2005. Predicting the distributions of New Zealand's freshwater diadromous fish. Freshwater Biology 50:2034–2052.
Mackey, B. G., H. A. Nix, M. F. Hutchinson, J. P. McMahon, and P. M. Fleming. 1988. Assessing representativeness of places for conservation reservation and heritage listing. Environmental Management 12:501–514.
Manly, B. F. J. 1986. Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Researches on Population Ecology 28:201–218.
Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27:209–220.
Margules, C. R., and R. L. Pressey. 2000. Systematic conservation planning. Nature 405:243–253.
Margules, C., R. L. Pressey, and P. H. Williams. 2002. Representing biodiversity: data and procedures for identifying priority areas for conservation. Journal of Biosciences 27:309–326.
McDowall, R. M. 1990. New Zealand freshwater fishes. Heinemann Reed, Wellington.
Mücher, C. A., R. G. H. Bunce, R. H. G. Jongman, J. A. Klijn, A. J. M. Koomen, M. J. Metzger, and D. M. Wascher. 2003. Identification and characterization of environments and landscapes in Europe. Alterra, Wageningen, The Netherlands.
New Zealand MFE (Ministry for the Environment). 2004. New Zealand land cover database 2 user guide. Ministry for the Environment, Wellington, New Zealand.
Moisen, G. G., and T. T. Frescino. 2002. Comparing five modeling techniques for predicting forest characteristics. Ecological Modeling 157:209–225.
Olden, J. D., and D. A. Jackson. 2002. A comparison of statistical approaches for modeling fish species distributions. Freshwater Biology 47:1976–1995.
Olsen, D. M., et al. 2001. Terrestrial ecoregions of the world: a new map of life on earth. BioScience 54:933–938.
Poff, N. L. 1997. Landscape filters and species traits: towards mechanistic understanding and prediction in stream ecology. Journal of the North American Benthological Society 16:391–409.
Possingham, H. P., I. Ball, and S. Andelman. 2000. Mathematical methods for identifying representative reserve networks. Pages 291–306 in S. Ferson and M. Burgman, editors. Quantitative methods for conservation biology. Springer-Verlag, New York.
Snelder, T. H., and B. J. F. Biggs. 2002. Multi-scale river environment classification for water resources management. Journal of the American Water Resources Association 38:1225–1240.
Snelder, T. H., J. R. L. Leathwick, and K. L. Dey. 2005. Definition of the multivariate environmental river classification: Freshwater Environments of New Zealand. NIWA report: CHC2005-049. Department of Conservation, Christchurch, New Zealand.
Sturman, A. P., and N. J. Tapper. 1996. The weather and climate of Australia and New Zealand. Oxford University Press, Melbourne.
Suren, A., and S. Elliot. 2004. Impacts of urbanization on streams. Pages 35.3135.17 in J. S. Harding, P. Mosley, C. Pearson, and B. Sorell, editors. Freshwaters of New Zealand. New Zealand Hydrological and Limnological Societies, Christchurch, New Zealand.
Trakhtenbrot, A., and R. Kadmon. 2005. Environmental cluster analysis as a tool for selecting complementary networks of conservation sites. Ecological Applications 15:335–345.
