
Using Genetic Algorithms to Create Multicriteria

Class Intervals for Choropleth Maps


Marc P. Armstrong*,**, Ningchuan Xiao*, and David A. Bennett*
*Department of Geography, The University of Iowa
**Program in Applied Mathematical and Computational Sciences, The University of Iowa

During the past three decades a large body of research has investigated the problem of specifying class intervals for
choropleth maps. This work, however, has focused almost exclusively on placing observations in quasi-continuous
data distributions into ordinal bins along the number line. All enumeration units that fall into each bin are then
assigned an areal symbol that is used to create the choropleth map. The geographical characteristics of the data are
only indirectly considered by such approaches to classification. In this article, we design, implement, and evaluate a
new approach to classification that places class-interval selection into a multicriteria framework. In this framework,
we consider not only number-line relationships, but also the area covered by each class, the fragmentation of the
resulting classifications, and the degree to which they are spatially autocorrelated. This task is accomplished through
the use of a genetic algorithm that creates optimal classifications with respect to multiple criteria. These results can
be evaluated and a selection of one or more classifications can be made based on the goals of the cartographer. An
interactive software tool to support classification decisions is also designed and described. Key Words: choropleth, class
intervals, genetic algorithms.

Even a casual observer would recognize that a revolution in access to mapping technologies and geographic information has taken place during the past decade. Inexpensive microcomputers and software
are now routinely used with Internet-available data
resources to offer map production capabilities that, in
the past, were the exclusive provenance of national
governments (Buisseret 1992; Kain and Baigent 1992)
and, more recently, were available only to large government agencies, map-production companies, and a few
advanced research laboratories. These changes, hinted at
in Monmonier's (1985) examination of the effects of
technology on cartography, have engendered new ways
of thinking about maps. One particularly important
change has been a transition toward end-user manipulation of digital spatial data (Morrison 1993) and from
supply-driven to demand-driven cartography (Kraak
1998). Users of route-planning Websites, for example,
often change map scale either because they need greater
detail around their destination (such as residential street
names) or because they would like to place their current
map in a broader geographical context. The consequence
of these changes has been to enlarge enormously the pool
of people who make maps.
This shift toward cartographic catholicity can be
examined using a conceptual model of map use (Figure 1)
that MacEachren and others have developed (see, for
example, MacEachren and Kraak 2001, 5). The model
has three axes: level of interaction (low-high), map audience (public-private), and data relationships (known-unknown). For example, if a map were prepared for a
public audience to show a known set of data relationships,
the level of interaction might be low. On the other hand,
for private uses, and especially when data have unknown
relationships, the expected amount of interaction required to support data exploration is considerably greater.
Monmonier, however, has repeatedly argued that cartographers should not continue to insist that there is a single map to present (this is a major premise of How to Lie with Maps [1996]; see also Egbert and Slocum 1992; Monmonier 1992; MacEachren 1995). One interpretation of this position is that even in the presentational (public, known, low-interaction) parts of the conceptual space shown in Figure 1, additional support for user interaction, or at least mapped alternatives, would help map-readers gain additional information about the geographical phenomena of interest to them.
Though technology is now able to support increased levels of user interaction, especially given the current proliferation of Web-based map servers (Kraak and Brown 2001), problems remain with implementation in the realm of statistical cartography. Choropleth maps are perhaps the most commonly created type of statistical map, even though it has long been recognized that they have significant liabilities when used to communicate geographically distributed information (Raisz 1948, 249).
Nevertheless, choropleth maps are widely used in
exploratory spatial-data analysis (see Andrienko and

Annals of the Association of American Geographers, 93(3), 2003, pp. 595–623
© 2003 by Association of American Geographers
Published by Blackwell Publishing, 350 Main Street, Malden, MA 02148, and 9600 Garsington Road, Oxford OX4 2DQ, U.K.


[Figure 1 is a cube diagram whose axes are audience (public-private), data relations (known-unknown), and level of interaction (low-high); map-use activities range from present (public, known, low interaction) through synthesize and analyze to explore (private, unknown, high interaction).]

Figure 1. Three dimensions of map use. Source: After MacEachren and Kraak (2001).

Andrienko 1999; Anselin 1999), because most socioeconomic statistical information is tabulated for predefined jurisdictions. Choropleth maps also serve as the
cornerstone of Internet-based statistical map services and
are produced prodigiously using the current generation of
GIS software. The U.S. Bureau of the Census, for example,
has a Web browser interface that allows users to create
thematic maps of statistical information collected as part
of the decennial census. However, a map-reader's ability to understand a choropleth map is shaped by several interacting factors, such as the shading scheme, the number of classes, and the classification method used (or not)
to generalize the data. Each of these factors can be
manipulated to yield large collections of maps created
from the same statistical information.
Classification, by itself, has been the focus of a
considerable amount of cartographic research. Most of
this work has examined classification on the basis of the
tabular statistical properties of the data. In stark contrast,
only a few studies have considered the geographical
characteristics of data distributions during the classification process. These geographical considerations would
include, for example, attempts to place contiguous units
into a single class to promote the visual assessment of
homogenous regions. The purpose of this article is to
develop a general approach that can be used to help
cartographers bridge the existing gap between the tabular
and geographical dimensions of choropleth class-interval
selection and to elucidate the wide range of classification
options that are open to them. Our approach, derived
from the field of evolutionary computation, uses a genetic algorithm to create a distribution of solutions that
satisfy alternative classification criteria. The spirit of this
approach derives from work in interactive scientific
visualization and spatial decision support systems and
is consistent with the view that there is no single "true" map. Instead, when media allow it, the cartographer is presented with classification alternatives based not only
on conventional manipulations of tabular statistical data,
but also on less frequently performed manipulations of the
geographical relationships that exist among enumeration
units. This enlarged set of geographically based options
allows cartographers to select classification options
from those that best meet their statistical and geographical criteria.
Following a review of previous research on class-interval selection, we reestablish that it is a multicriteria
decision problem, integrate concepts from spatial statistics
into the class-interval selection process, and discuss how a
genetic algorithm can be used to help cartographers
remain involved in shaping the geographical and statistical characteristics of the map. We describe how our
genetic algorithm was designed and implemented
and demonstrate its effectiveness, first by applying the
algorithm to six synthetic gridded datasets that were
developed to illustrate the conceptual basis of our
approach and then by using three geographical datasets
meant to reflect a range of conditions typically encountered in geographic research and education. We
conclude with a summary of results, a discussion about
the role of evolutionary algorithms in classification, some
comments on strategies for enhancing performance, and
some suggestions about how our approach could be
extended to include other criteria not explicitly considered in this article.

Background and Problem Elaboration


Classification matters. It serves as a filter through
which we attempt to make sense of a complex world.
Though Bowker and Star (1999) focus on categorical
(nominal) data, the power of their argument about how
classification privileges some things and renders others
invisible is easily sustained in cartographic contexts. In
particular, it is well known that the way in which data are
transformed from interval/ratio-scaled tabular observations to ordinally based areal symbols can enhance or attenuate the communication of specific types of statistical information on choropleth maps (Cuff and Mattson 1982, 37–39; Robinson et al. 1995, 516–19; Dent 1999, 139–56; Slocum 1999, 60–82). Consequently, class-interval selection is arguably the most important problem faced by cartographers when they contemplate the construction of a choropleth map (Jenks and Caspall 1971, 220). Though considerable discussion has occurred
about the methods and merits of classification in the
cartographic literature (e.g., Jenks and Coulson 1963;

Peterson 1979; MacEachren 1982), many of the results
reported in the literature, unfortunately, are either
contradictory or difficult to compare directly. Tobler
(1973) has argued persuasively that classification is
unnecessary, and, depending on the purpose of the map,
there is evidence that his view can be substantiated
(Muller 1979). However, many cartographers feel that the
unclassed approach Tobler advocates leads to cognitive
confusion (Dobson 1973, 1980). They assert that data
classification reduces map-processing times (Gilmartin
and Shelton 1989) and increases accuracy of data
recovery across a variety of map-reading tasks (Mersey
1990). Moreover, it is certainly true that almost all
desktop mapping and GIS software supports the creation of choropleth maps with classed data. Consequently, classification remains a viable topic for
further investigation.
Textbook approaches to choropleth class-interval
selection focus almost exclusively on the statistical
properties of tabular thematic information (Table 1) and
effectively divorce the statistical distribution from its
geographic context. In practice, choroplethic classification pedagogy has encouraged students and practitioners
to use, among other approaches, round numbers,
exogenously determined values as breakpoints (e.g., zero,
if relevant, or poverty level on an income map), or to select
from among equal intervals, quantiles, natural breaks,
mathematical progressions, limits derived from measures
of central tendency and dispersion, and, more recently,
optimal approaches. Though Jenks and Caspall (1971,
225) single out the nested-means method advocated by
Armstrong (1969) as the most extreme of the statistically
oriented approaches, their optimal method stands, in fact,
as the exemplar of statistical approaches. The objective of
their optimal method, as described in greater detail by
Jenks (1977), is to develop an ordinal classification, based
entirely on the position of observations along the number
line, in which within-class variance is minimized. Alternative approaches use absolute deviation from the mean (Jenks 1977) and median (Lindberg 1990; Slocum 1999) to assess the level of within-class variability (Declerq 1995).
This statistically oriented classification orthodoxy has
emerged for several reasons. First, statistical methods
are well developed, emerge from an established body of
knowledge, and, with few exceptions, can be easily
explained. Second, despite advances in GIS software,
methods for handling geographical relationships remain
difficult to implement; this continues to present a practical
barrier to improvements in class-interval selection. Third,
existing map-creation software builds upon the first two
reasons to effectively maintain the status quo: it classifies
data in the tabular statistical domain and inertially
reinforces this approach to the exclusion of others,
presenting it as a fait accompli. In almost all cases,
geographical orderings are only indirectly manifested in
the form of maps that emerge as a consequence of
statistical manipulations imposed by a specific class-interval selection method. In our view, this tabular bias
not only may excessively constrain the search for solutions
to the class-interval selection problem but may also, in
fact, blinker the views or needs of the cartographer. This
position is not ours alone, of course, as we discuss in the
next section. But our placement of the problem in a
multicriteria framework enables cartographers to more
fully investigate alternative classification objectives that
remain outside of the traditional focus on the statistical
properties of tabular information.

Reintroducing Choropleth Class Intervals to Geography
There are a few notable exceptions to the exclusive
focus on the statistical domain in choropleth classification. MacEachren (1995, 166), in particular, questions the choropleth classification orthodoxy in the following way: "Why, then, do we ignore space when classifying data for a choropleth map?" Jenks's statistical classification

Table 1. Coverage of Classification Methods in Five Recently (Post-1990) Published Cartography Textbooks

Textbook                    Tabular Characteristics   Jenks Optimal   Geographical Characteristics
Tyner (1992)                Yes                       Yes             No
Robinson et al. (1995)*     Yes                       Yes             No
Kraak and Ormeling (1996)   Yes                       No              No
Dent (1999)                 Yes                       Yes             No
Slocum (1999)               Yes                       Yes             Yes

Note: Only one author discusses geographically informed approaches, while four of the five texts mention or describe the optimal approach.
* The previous edition of this title was published in 1984 and contains a more elaborate discussion of both the optimal approach and ideas related to the consideration of geographical characteristics: it discusses a "geographical quantiles" (equalization of area in each class) approach.


procedures seem to have (if only coincidentally) taken into account the principle of basic-level categories. The next step in our evolution of quantitative data classification, then, should be to add space to the equation. We agree. This view is also supported by research conducted more than two decades ago by Chang (1978), who found that map-readers tended to prefer choropleth maps that had simpler (more organized, less fragmented) patterns (see also Olson 1975).
As we suggest above, other researchers have not
completely ignored the idea of using geographical characteristics to shape the process of forming classes on
choropleth maps. In particular, early work by Monmonier
(1972, 1973) advocated the use of optimization models
and grouping algorithms to yield class intervals that
explicitly considered the clumping of classes in geographical space. Monmonier's (1972) grouping approach occurs in a series of transformative stages and allows the user to adjust the relative weighting of variables in the grouping procedure. It also disallows overlapping classes. Monmonier's later (1973) optimum-seeking approach
does not specifically address contiguity, but this constraint
could be added through an appropriate set of data
transformations (see Armstrong et al. 1991 and Lolonis
and Armstrong 1993 for examples). Evans (1977, 103), however, suggests that Monmonier's results give "very little improvement in contiguity, at the expense of considerable declines in intra-class homogeneity." Mak
and Coulson (1991, 110) go so far as to suggest that the use
of an optimal classification in the tabular domain will, in
effect, take care of the geographical domain. Coulson
(1987, 25) articulates this argument even more forcefully when he suggests that class intervals generated by the Jenks Optimal Method have the incidental effect of resolving concerns about spatial contiguity: "When within class variance is minimized, spatially adjacent units with similar data values are almost certain to fall in the same class. If they do not, then they are probably not that similar." This position, of course, is valid only if the data
are highly (positively) spatially autocorrelated.
Despite such skepticism, however, MacDougall (1992),
Cromley (1996), Cromley and Mrozinski (1999), and
Murray and Shyy (2000) have picked up the cold trail of
geographical arrangement and describe approaches in
which contiguity is used as a factor to determine class
breaks on choropleth maps. Cromley (1996) suggests a
way to optimize Jenks and Caspalls (1971) notion of
boundary error, while Cromley and Mrozinski (1999)
focus on maps created from ordinal data. Such data
present a classification problem, since they lack a
theoretically based statistical distribution, and, as a
consequence, methods based on commonly accepted

statistical assumptions cannot be legitimately applied.


However, most data mapped using the choropleth method
are not ordinal in their original form. Murray and Shyy
(2000) examine the choropleth class-interval selection
problem and place it into the context of spatial data
mining. In their approach, clustering methods are used to
tease out relationships from large amounts of data for
which no relationships are posited a priori. Though their
approach is interesting and related, in spirit, to the one
advanced in this article, it has three main limitations: (1)
Euclidean distance between polygon centroids is used, for
the sake of convenience, as a measure of proximity, though
for areal data contiguity relations are better suited to
support region-building; (2) the resulting classes overlap,
which is inappropriate for choropleth maps (Jenks and
Coulson 1963, 120); and (3) they make an underlying,
implicit assumption that the solution space has known
characteristics (e.g., that it is convex), so that linear
programming methods can be employed.
In summary, our approach is inspired by the original
work of Jenks and Caspall (1971), extends notions about
the role of contiguity described by Monmonier (1972),
handles ordinal-, interval-, and ratio-scale data distributions (Cromley and Mrozinski 1999) and is exploratory in
the vein of the work of Murray and Shyy (2000). As we will
demonstrate below, our approach also explicitly considers
contiguity, does not yield overlapping classes, and makes
no assumptions about the form of the solution space.

Choropleth Class Interval Selection as a Multicriteria Problem
Cromley (1996) compared several classification methods (in addition to boundary error, mentioned above) and
was the first to suggest that the choropleth classification
problem could be specified explicitly as a multicriteria
problem. Subsequently, Slocum (1999) has documented
the challenges that cartographers face when they must
create categories for choropleth maps. He devotes an
entire chapter to a discussion of various approaches to
classification and suggests that the choice of a classification method should be made on the basis of several
criteria. Though these criteria fall into the tabular
statistical domain, the last section of his work reintroduces
the often-ignored work on geographical criteria initiated
by Jenks and Caspall. It is interesting to note, however, that while Slocum (1999) covers the concepts,
he provides only a sketchy discussion about how to
make them operational. This is understandable, since the
supporting research on which his and other textbooks are
rightly based is inadequately developed.

To facilitate further discussion about multiple factors and classification criteria, the following set of definitions is used throughout the remainder of this article. A cluster is defined as a set of contiguous units (i.e., polygons or grid cells, depending on the representation used by the map) that are assigned to the same class. The other definitional nomenclature used is as follows:

‖·‖ = an operator that gives the number of elements in a set or array
a = a chain that contains a pair of left and right units, (r, l)
a_i = the area of the i-th observation
A_i = total area of class i; for two classes i and j, if i < j, then A_j ≥ A_i
A_ij = the area of polygon i in class j
A = total area of all classes
C = a set consisting of all pairs of contiguous clusters
G = an array of all possible chains, with the sequence of its elements sorted so that for any two chains i and i+1, |z_i^r − z_i^l| ≥ |z_{i+1}^r − z_{i+1}^l|
H = a set consisting of all chains where, for each chain, the left and right units are not in the same class
k = number of classes
M = number of clusters
N = number of observations
N_j = number of observations within class j, 1 ≤ j ≤ k
x̄_i = mean value of the observations within cluster i, 1 ≤ i ≤ M
x̄ = mean value of all clusters
z_i = i-th observation
z_ij = i-th observation in class j, 1 ≤ j ≤ k
z̄_j = mean of class j, 1 ≤ j ≤ k
z̄ = mean of all observations
z_a^r = observation to the right of chain a
z_a^l = observation to the left of chain a
Figure 2 illustrates these terms, showing values for five polygons placed into two classes, one of which forms two discontiguous clusters.

[Figure 2 shows a map of five contiguous polygons labeled A-E and the following tables of example values.]

Polygon  ID  Observation (z_i)  Area (a_i)  Class index  Cluster index
A        1   0.55               12          1            1
B        2   0.70               15          1            1
C        3   1.80               25          2            2
D        4   2.50               21          2            2
E        5   0.05               13          1            3
             mean: 1.12         total: 86

Class    Area (A_i)  Mean (z̄_j)
1        40          0.43
2        46          2.15
total    86

Cluster  Area  Mean (x̄_i)
1        27    0.625
2        46    2.15
3        13    0.05
total    86

a = (1,2), (1,3), (2,3), (3,4), (3,5), or (4,5)
a_i = 12, 15, 25, 21, or 13
N = 5; N_1 = 3, N_2 = 2
A_i = 40 or 46; A = 86
x̄_1 = 0.625, x̄_2 = 2.15, x̄_3 = 0.05; x̄ = 0.94
z̄ = 1.12
z_a^r = 0.55 and z_a^l = 1.8 for a = (1,3)
C = {(1,2), (2,3)}, ‖C‖ = 2
G = {(4,5), (3,4), (3,5), (1,3), (2,3), (1,2)}, ‖G‖ = 6
H = {(1,3), (2,3), (3,5), (4,5)}, ‖H‖ = 4
k = 2; M = 3

Figure 2. An example of the variables used in the calculations.

Jenks and Caspall (1971) suggest that there are three purposes for which a choropleth map can be used: (1) to
gain an overview of a statistical distribution; (2) to form an
areal table from which specific values for areas can be
determined; and (3) to serve as a way of determining
borders between regions with a common or similar
shading. They go on to define three types of errors
(described in greater detail below) that fit into these
domains. In their overview, they (1971, 231) see tabular,
overview and boundary errors as three opposing forces
that form a space defined by orthogonal axes. They
espouse the idea of seeking a compromise among these
factors, represented as a vector in this 3-D space. Jenks
and Caspall (1971), however, acknowledge that these
factors are not, in fact, orthogonal, but they leave
unresolved the complexity of combining them in an
appropriate way.
To make their concept of error on choropleth maps
operational, Jenks and Caspall (1971) treat each error
independently and define three measures that can be
optimized. However, they (1971, 232) liken the search for an optimal solution in these three resulting dimensions to "searching for a needle in a haystack." The size of the metaphorical haystack (in this case, the number of possible classifications) increases rapidly as a function of the number of classes and number of units to be classified. In fact, the class-interval problem, in which N data values are grouped into k classes, can yield (N − 1)!/[(N − k)!(k − 1)!] different groups (see Fisher 1958; Jenks and Caspall 1971; Cromley and Mrozinski 1999). In terms of asymptotic computational complexity (Big O notation), brute-force choropleth classification is O(N^(k−1)) (for N ≫ k). However, Fisher (1958) has suggested a classification algorithm that is a considerably simpler O(N²k) (see Hartigan 1975).
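The growth of this solution space is easy to verify directly. The sketch below (our illustration, not code from the article) counts the distinct ways to cut N sorted data values into k ordinal classes, which amounts to choosing k − 1 break points among the N − 1 gaps on the number line:

```python
from math import comb

def n_classifications(n: int, k: int) -> int:
    """Number of ways to place n ordered values into k ordinal classes:
    choose k-1 break points among the n-1 gaps on the number line,
    i.e. C(n-1, k-1) = (n-1)! / ((n-k)! (k-1)!)."""
    return comb(n - 1, k - 1)

# Even a modest map quickly overwhelms complete enumeration:
print(n_classifications(50, 5))    # 211876 candidate 5-class maps of 50 units
print(n_classifications(102, 5))   # 4082925 (about 4.1 million) for 102 units
```

For fixed k and growing n, the count grows on the order of n^(k−1), which is the brute-force complexity cited above.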
Because of the combinatorial complexity of the
problem, Jenks and Caspall (1971) assert that a complete
enumeration of the solution space is infeasible and suggest
ways to search for good solutions using their three error
measures. The first measure that can be optimized is
called the tabular accuracy index (TAI), and they develop a heuristic algorithm to maximize this measure of
within-class homogeneity.
TAI = 1 - \frac{\sum_{j=1}^{k} \sum_{i=1}^{N_j} |z_{ij} - \bar{z}_j|}{\sum_{i=1}^{N} |z_i - \bar{z}|}    (1)

This general approach is often described as "Jenks optimal" in the choropleth-mapping literature.
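The TAI can be checked by hand against the Figure 2 example, where class 1 contains the observations 0.55, 0.70, and 0.05 and class 2 contains 1.80 and 2.50. The sketch below is our own illustration of the calculation, not the authors' implementation:

```python
def tai(classes):
    """Tabular accuracy index (Jenks and Caspall 1971): 1 minus the
    ratio of summed within-class absolute deviations to the summed
    absolute deviations about the grand mean. `classes` is a list of
    lists of observations, one inner list per class."""
    all_z = [z for cls in classes for z in cls]
    grand_mean = sum(all_z) / len(all_z)
    within = sum(abs(z - sum(cls) / len(cls)) for cls in classes for z in cls)
    total = sum(abs(z - grand_mean) for z in all_z)
    return 1 - within / total

# Figure 2 data: class 1 = {A, B, E}, class 2 = {C, D}
print(round(tai([[0.55, 0.70, 0.05], [1.80, 2.50]]), 3))  # 0.644
```

A TAI of 1 would mean every class is internally homogeneous; 0 would mean classification recovers no tabular structure at all.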

In a later article, Jenks (1977) more explicitly elucidates this approach by providing computer code. The TAI
has also been made operational in modified form (see, for
example, Dent 1999, 148) through the use of the goodness
of variance fit (GVF) measure:
GVF = 1 - \frac{\sum_{j=1}^{k} \sum_{i=1}^{N_j} (z_{ij} - \bar{z}_j)^2}{\sum_{i=1}^{N} (z_i - \bar{z})^2}    (2)
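The GVF simply replaces the absolute deviations of the TAI with squared deviations. A minimal sketch (ours, not the article's code), again using the Figure 2 classes:

```python
def gvf(classes):
    """Goodness of variance fit: 1 minus the ratio of the within-class
    sum of squared deviations to the total sum of squared deviations
    about the grand mean."""
    all_z = [z for cls in classes for z in cls]
    grand_mean = sum(all_z) / len(all_z)
    sdcm = sum((z - sum(cls) / len(cls)) ** 2 for cls in classes for z in cls)
    sdam = sum((z - grand_mean) ** 2 for z in all_z)
    return 1 - sdcm / sdam

print(round(gvf([[0.55, 0.70, 0.05], [1.80, 2.50]]), 3))  # 0.881
```

Because squaring penalizes large deviations more heavily, GVF and TAI generally rank candidate classifications similarly but not identically.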

Though, as shown in Table 1, the TAI approach is widely documented in several cartography textbooks (cf. Smith 1986, 67), the remaining two measures described by Jenks and Caspall, the overview accuracy index (OAI) and the boundary accuracy index (BAI), are not usually discussed. Moreover, they are not optimized as part of a class-interval selection process by any choropleth-map production software that we were able to identify. The OAI takes the area of each zone into consideration:
OAI = 1 - \frac{\sum_{j=1}^{k} \sum_{i=1}^{N_j} |z_{ij} - \bar{z}_j| A_{ij}}{\sum_{i=1}^{N} |z_i - \bar{z}| a_i}    (3)

In effect, the OAI is an area-weighted TAI. If all zones are roughly identical in size, it will yield results that are similar to the TAI. This is the case in the Illinois example described by Jenks and Caspall (1971). Consequently, we did not include this measure in the analyses reported below.
Even though OAI controls for area, it fails to consider geographical context, in the form of contiguity
maintenance. Jenks and Caspall (1971, 238), therefore, next turn their attention to the BAI, which considers the values of topological neighbors, and state that
the geographical manipulations needed to optimize it
are complex:
Since boundary differences are not related to the intensity
value being moved but are controlled by the relationship of
that value to all neighboring intensity values, the problem
of identifying the most promising enumeration unit maneuver is not readily soluble. Unable to resolve this grouping
dilemma by using the reiterative and forcing technique, we
have temporized by attempting to use alternative procedures.
(emphasis added)

One approach that Jenks and Caspall suggest as a way to address the BAI problem is based on manual manipulation of geographical relationships. They discard
manipulation of geographical relationships. They discard

this approach, however, because it performed poorly.
The alternative choice is to return to the tabular domain
and use, once again, grouping methods to yield regions
that are most similar. This part of the procedure, however,
is rather opaque in the original article. Consequently, we
were required to recreate the specific approach used.
In our view, the efforts of Jenks and Caspall were
frustrated by the technological and conceptual inadequacies of their times, since computing resources were then quite limited and knowledge about spatial data structures, especially those that explicitly encode topological relations (Peucker and Chrisman 1975), was not yet widely disseminated. We assume, therefore, that they were
not (directly) manipulating the topological structure of
the enumeration units. A decade later, MacEachren
(1982) described measures of map complexity that could,
in an iterative fashion, be optimized to attack the BAI
problem. MacEachren represented the topological structure of enumeration units as a type of graph: each
polygonal unit is a face, each shared boundary is an edge,
and each node is a vertex in the graph (see also Declerq
1995; Cromley 1996; Slocum 1999, 57). In a choroplethic
representation, when contiguous polygons are placed into
the same class, the common edge and vertices are omitted.
MacEachren's measures were derived by enumerating
the graph for the original (unclassified) base map and the
graph that results after classification. In particular, we
have adopted the measure called CF, which is the ratio of
the observed postclassification number of faces divided by
the number of faces in the original graph.
Based on this reasoning we have adopted the following
definition of BAI, which was not defined formally in Jenks
and Caspall (1971):
BAI = \frac{\sum_{i=1,\,(r,l) \in H}^{\|H\|} |z_i^r - z_i^l|}{\sum_{i=1,\,(r,l) \in G}^{\|H\|} |z_i^r - z_i^l|}    (4)

The numerator is the sum of the absolute differences between observations separated by class boundaries, and the denominator is the sum of the largest ‖H‖ absolute differences across all possible boundaries. Note that, in equation (4), G is sorted in descending order, so the sum of its first ‖H‖ elements will give the sum of the greatest breaks.
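The BAI can be verified against the Figure 2 data, where H contains the four chains that cross a class boundary and G holds all six chains sorted by descending absolute difference. The sketch below (our illustration) follows the definitions directly:

```python
# Figure 2 data (our transcription of the article's worked example)
z = {1: 0.55, 2: 0.70, 3: 1.80, 4: 2.50, 5: 0.05}          # observations
chains = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]  # all chains
class_of = {1: 1, 2: 1, 3: 2, 4: 2, 5: 1}                  # class per polygon

# H: chains whose left and right units fall in different classes
H = [(r, l) for (r, l) in chains if class_of[r] != class_of[l]]
# G: all chains sorted by descending absolute difference
G = sorted(chains, key=lambda a: abs(z[a[0]] - z[a[1]]), reverse=True)

numerator = sum(abs(z[r] - z[l]) for (r, l) in H)
denominator = sum(abs(z[r] - z[l]) for (r, l) in G[:len(H)])
bai = numerator / denominator
print(round(bai, 3))  # 1.0
```

For this small example the class boundaries coincide exactly with the four largest breaks along the chains, so the classification attains the maximum BAI of 1.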
In addition to the measurements suggested by Jenks
and Caspall (1971), several other, possibly conflicting
criteria can be used to control the efficacy of a choropleth
map. One of these criteria would equalize the visual weight assigned to each class, and such equal-area classification methods (also referred to as geographic quantile; see Robinson et al. 1984, 354) have been implemented in several commercial GIS packages. In this article, to ensure the objective that the areas of all classes are approximately equal, we optimize a Gini coefficient (e.g., Smith 1977) that measures the areal inequality among classes:


GEA = \max_i \left| \frac{i}{k} - \frac{1}{A} \sum_{j=1}^{i} A_j \right|, \quad i = 1, \ldots, k    (5)
Note that the vector of areas A_1, …, A_k is sorted in ascending order.
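For the two classes of Figure 2 (areas 40 and 46, already in ascending order; total area 86), GEA reduces to a single maximum over cumulative area shares. A sketch of the calculation (our illustration, not the article's code):

```python
def gea(class_areas):
    """Gini-style area-equalization measure: the largest gap between
    the ideal cumulative share i/k and the observed cumulative area
    share. `class_areas` must be sorted in ascending order."""
    k = len(class_areas)
    total = sum(class_areas)
    worst, cum = 0.0, 0.0
    for i, a in enumerate(class_areas, start=1):
        cum += a
        worst = max(worst, abs(i / k - cum / total))
    return worst

print(round(gea([40, 46]), 4))  # 0.0349 -- nearly equal-area classes
```

A GEA of 0 indicates perfectly equal class areas; values approaching 1 indicate that one class dominates the map.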
The spatial structure of the symbols applied to
enumeration units represents an additional dimension
to the look of choropleth maps. Olson (1975) developed
an approach to characterizing this structure through the
use of an ordinal-level measure of spatial autocorrelation.
Though the study of spatial statistics has advanced
considerably since then (Goodchild 1988; Odland 1988;
Getis and Ord 1992; Anselin 1995; Griffith 2000), and
measures of spatial autocorrelation are both better known
and more widely applied, a problem remains: substantially
different patterns can yield values of a spatial autocorrelation coefficient that are not significantly different. Nevertheless, spatial autocorrelation coefficients do measure
the degree to which similar values are close (or contiguous), and this is an appropriate way to formalize an
objective that seeks to form aggregated regions containing
members of the same class. In this research, we use a slightly reformulated Moran's I statistic, which we refer to as Moran's cluster coefficient (MIC), where a cluster is a flexibly defined set of contiguous polygons that belong to a single class:
MIC = \frac{M \sum_{(i,j) \in C} (\bar{x}_i - \bar{x})(\bar{x}_j - \bar{x})}{\|C\| \sum_{i=1}^{M} (\bar{x}_i - \bar{x})^2}    (6)

Though Moran's I is theoretically unconstrained (Goodchild 1988, 30; Bailey and Gatrell 1995, 270; O'Sullivan and Unwin 2003, 200), results in this research were found in the range of [−1, 1].
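Applied to the Figure 2 example, where the three cluster means are 0.625, 2.15, and 0.05 and C holds the two pairs of contiguous clusters, the MIC comes out strongly negative: each contiguous pair of clusters belongs to different classes, so the pattern is fragmented. A sketch of the computation (our illustration):

```python
def mic(cluster_means, contiguous_pairs):
    """Moran's cluster coefficient: a Moran's-I-style statistic computed
    over cluster means rather than individual observations. Cluster
    indices in `contiguous_pairs` are 1-based, as in the article."""
    m = len(cluster_means)
    xbar = sum(cluster_means) / m
    d = [x - xbar for x in cluster_means]
    num = m * sum(d[i - 1] * d[j - 1] for (i, j) in contiguous_pairs)
    den = len(contiguous_pairs) * sum(di * di for di in d)
    return num / den

# Figure 2: clusters 1-2 and 2-3 are contiguous
print(round(mic([0.625, 2.15, 0.05], [(1, 2), (2, 3)]), 2))  # -0.93
```

Values near +1 correspond to large, homogeneous regions of a single class; values near −1 correspond to checkerboard-like fragmentation.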
To summarize, the four objectives to be minimized in this article are defined as follows:

min EVF = 1 − GVF
min GEA
min MIC1 = 1 − MIC
min BE = 1 − BAI,


where EVF is error of variance fit, GEA is geographical


area equalization, MIC1 refers to the opposite sense of
spatial autocorrelation, and BE is boundary error. Note
that the range for EVF, GEA,and BE is from 0 to 1, while
the range for MIC1 is from 0 to 2 (after transformation).
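For the first objective, EVF ( = 1 − GVF) can be sketched in the standard Jenks manner as the ratio of squared deviations about the class means (SDCM) to squared deviations about the array mean (SDAM). This standard formulation is our assumption; the article's formal definition appears earlier in the text:

```python
def evf(values, classes):
    """EVF = 1 - GVF = SDCM / SDAM: within-class squared deviations
    divided by total squared deviations about the array mean."""
    mean = sum(values) / len(values)
    sdam = sum((v - mean) ** 2 for v in values)
    sdcm = 0.0
    for c in set(classes):
        members = [v for v, lab in zip(values, classes) if lab == c]
        cmean = sum(members) / len(members)
        sdcm += sum((v - cmean) ** 2 for v in members)
    return sdcm / sdam
```

EVF is 0 when every class is internally homogeneous and approaches 1 as the classes explain none of the variance.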
Each of these objectives is minimized to maintain
computational consistency and to enable results to be
compared more easily. By including variance minimization, boundary conditions, area equivalency, and spatial
structure as objectives, we have specified a problem with
four potentially conflicting criteria that can be traded off
depending on the wishes of the cartographer. For example, in one dimension of the problem, the cartographer
may wish to adhere strictly to the traditional concept
of statistical optimization (à la Jenks) and attempt to
maximize within-class homogeneity. This objective, however, excludes the geographical dimension of the map, in
which the cartographer may wish to maximize a goal of
regional simplification by ensuring that contiguous units
are, to the greatest extent possible, included in the same
class. Pursuit of this objective, however, might yield a
classification in which within-class variance is not minimized. Another objective, one that competes with the
previous two, is to attempt to equalize the total area of
the polygons included in each class. However, an equal-area classification does not guarantee the spatial
contiguity of clusters of polygons on the map. It is clear
that the satisfaction of multiple classification criteria is a
complex problem. In the following section, we describe an
approach to generating solutions to it.

Methods
Jenks and Caspall (1971), Declerq (1995), Cromley
(1996), Slocum (1999), and Brewer (2001) agree in suggesting that the class-interval selection problem involves
the satisfaction of multiple criteria. As noted above,
however, the number of classifications that may need to be
examined grows rapidly as a function of problem size and
number of classes required. Furthermore, as additional
criteria are introduced, the complexity of the problem
increases rapidly. Consequently, for large problems, a
complete enumeration of the set of possible solutions
may be infeasible. Fortunately, several types of optimization methods have been developed to address
such problems.
There are two general classes of optimization methods
(Sait and Youssef 1999). Members of the first type, exact,
seek the best solution to a problem either by enumerating
a search space and restricting the search by using different
approaches that attempt to prune sequences of decisions that will not yield a correct result (e.g., branch


and bound), or, in the case of linear programming, by
evaluating vertices, created by systems of inequalities, that
form a multidimensional convex polytope. For example,
Cromley (1996), Cromley and Mrozinski (1999), and
Murray and Shyy (2000) all adopted an exact-solution
approach to class-interval selection.
In many cases, however, exact approaches are inadequate. They are often inefficient when applied to large,
realistic problems; in other cases, the problems do not
meet the required linearity assumptions. Because of these
limitations, researchers have developed several computer-based methods that will yield solutions to problems that
cannot be addressed adequately by linear or enumerative
approaches. This class of combinatorial optimization
methods, referred to as approximation methods, uses
principles based on heuristic search. Heuristics are used
to limit search to a subset of the problem, and even though
they do not guarantee a global optimum solution, they
normally yield very good solutions to large problems (Sait
and Youssef 1999; see also Monmonier 1973).
The heuristic approximation approach that we have
adopted is called a genetic algorithm (GA). Holland
(1975, 1986, 1998) developed GAs as an adaptive
approach to solving computationally complex problems
in a way that is loosely based on the concept of biological
evolution. In a GA, individual solutions to a problem are
represented in an appropriate discrete structure (e.g., a
sequence of bits that can be set according to the presence
or absence of a specific characteristic), a population of
individuals is created, each with different characteristics
as defined by the representation, and genetic operations
(e.g., mutation and crossover) are applied over successive
generations of the population to produce new individuals.
Each individual is then evaluated according to its fitness
and, in a eugenic sense, only the most fit are allowed to
reproduce and create a new generation. Solutions that
evolve in this fashion support a wide variety of inductive approaches to solving complex problems (Holland
et al. 1986).
Though GAs have several characteristics that make
them different from other heuristic approaches, an
important distinction is that search is conducted from a
multiplicity of points. This approach is quite unlike simple
heuristics, which search from a single point and use
transition rules that govern movement through the search
space. Such approaches can easily become trapped in a
local (false) optimum. In contrast, when multiple searches
are conducted using a GA, it is much more likely that the
presence of local optima will not impede the discovery of
novel solutions. Because GAs search robustly, they can
be used in nonlinear, discontinuous, and multicriteria

search spaces and can be applied to a wide variety of


difficult-to-solve problem contexts (Bennett, Wade, and
Armstrong 1999; Sait and Youssef 1999; Xiao, Bennett,
and Armstrong 2002).
Another advantage that GAs exhibit is that the
collection of searches usually yields a diverse set of
optimal and near-optimal solutions that can be evaluated
by humans. This is especially advantageous in multicriteria optimization, when decision makers may wish to
include criteria that were not included in the explicit
mathematical formulation of the problem. Such criteria
are often encountered in real problem-solving and may
involve, for example, unstated (and unmeasurable)
political, justice-related, or ethical considerations. In the
special case examined here, we expect that subjective
decisions will need to be made in order to satisfy
unarticulated design or aesthetic objectives. As Thrower
(1972, 1) eloquently suggests, "Cartography, like architecture, has attributes of both a scientific and an artistic
pursuit, a dichotomy which is certainly not satisfactorily
reconciled in all presentations."

Figure 3. Pareto optimality for a two-objective problem. All points


enclosed in the band are nondominated when the objective is to
minimize both objective1 and objective2. Part a has three solutions that
are good for objective1, while solutions in part c are good for objective2.
A good solution-generation process, however, should provide results
in all parts of the solution space, including the area of tradeoffs
indicated by b.

Genetic Algorithms and Multiobjective Optimization


When multiobjective problems are addressed, using
either GAs or some other search method, two approaches
to goal search can be taken. In the first approach, the
objectives are somehow combined so that they can be
represented as a single objective function that is then
optimized (Cohon 1978). This is often accomplished
through the use of an additive set of weights that sum to
one. Carver (1991), Jankowski (1995), and Malczewski
(1999) provide several clear examples that describe how
this approach can be implemented in a GIS environment. The problem, of course, is the specification of an
appropriate set of weights, since different weights yield
different results. Weights, unfortunately, are normally
difficult to specify a priori.
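As a sketch of this first, weighted-combination approach, the four objective values could be collapsed into one scalar with an additive weight vector summing to one (the weights here are hypothetical; nothing in the article prescribes them):

```python
def weighted_objective(objectives, weights):
    """Collapse a vector of minimization objectives into one scalar
    using additive weights that sum to one."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    return sum(w * f for w, f in zip(weights, objectives))
```

Different weight vectors rank the same classifications differently, which is precisely the a priori specification problem the text raises.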
This general problem of criteria combination has led to
the development of a different approach, which rejects the
idea that a single optimal solution to each multiobjective
problem exists. This approach suggests that there are
numerous Pareto optimal solutions, each of which is
nondominated by others on one of the multiple objectives.
To understand the concept of Pareto optimality and
nondominated solutions, let us consider $k$ objectives
$f(\vec{x}) = [f_1(\vec{x}), f_2(\vec{x}), \ldots, f_k(\vec{x})]^T$ that are to be minimized,
where a feasible solution $\vec{x}$ is a vector of decision variables.
A solution vector $\vec{x}^{\,*}$ is said to dominate $\vec{x}$ if and only if
$\forall i\ f_i(\vec{x}^{\,*}) \le f_i(\vec{x}) \wedge \exists i\ f_i(\vec{x}^{\,*}) < f_i(\vec{x}),\ i \in \{1, \ldots, k\}$. In Figure 3,
where $k = 2$, the solutions on the Pareto front dominate
those that are not on the front. However, for any two

solutions on the front, neither can be said to dominate the


other. Pareto (1971) suggested that an optimum allocation of resources in society is not attained so long as it is
possible to make at least one or more individuals better off
(in their estimation) while keeping others as well off as
before. Our search, therefore, becomes one of finding
solutions along (or close to) the Pareto front that is
defined by tradeoffs among the set of nondominated
solutions.
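The dominance test and the extraction of a nondominated set follow directly from this definition. A minimal Python sketch (illustrative, with all objectives minimized):

```python
def dominates(a, b):
    """True if objective vector a dominates b: a is no worse on every
    objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def nondominated(solutions):
    """Filter a list of objective vectors down to the Pareto set."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

Applied to the solutions of Figure 3, this filter keeps the band of points along the front and discards everything they dominate.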

Genetic Algorithm Implementation


Problem-specific decisions about representation, fitness, selection, and crossover can affect the ability of a GA
to search efficiently and effectively for solutions on or close
to the Pareto front of the solution space of each problem. A
solution representation strategy is the first issue that must
be confronted. Although a canonical GA uses binary
strings to represent solutions to a problem (Holland 1986),
more recent research suggests that other representations
may be more effective (Michalewicz 1996, 110; Fogel and
Angeline 1997). In the implementation reported here
(MoCho: multiobjective genetic algorithm/choropleth),
we used breakpoints in the data array to represent
class intervals (Figure 4). The search for optimum class
intervals can be viewed as a process of making adjustments
to the position of these class breakpoints along the number
line. Our implementation of this representation forces the
creation of discontinuous classes (e.g., 1–7, 9–13, 16–25,

[Figure 4 graphic: a 4 × 4 grid of original observations (13, 18, 16, 37; 18, 30, 52, 31; 28, 35, 44, 25; 29, 25, 32, 34); the sorted array of nonrepeated values (13, 16, 18, 25, 28, 29, 30, 31, 32, 34, 35, 37, 44, 52); and a three-element array holding the positions of the break points (3rd, 6th, and 10th), which define four classes on the choropleth map.]
Figure 4. The encoding process used. The original observations are shown within the areal units on the left-hand side of the figure. These
observations are then listed in a sorted array, where three breakpoints are specified to indicate four classes. These breakpoints are used in the
genetic algorithm to form an individual and are also used to produce the choropleth map.

31–50), which is an often-recommended option, since it


avoids spurious or empty parts of ranges (Dent 1999,
152). The number of combinations of these class breakpoints will, except for small problems, preclude an
expedient, brute-force enumeration of possibilities.
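Under this representation, a solution is simply an ascending array of breakpoint positions in the sorted array of nonrepeated values, and the size of the search space is a binomial coefficient. A sketch (illustrative, not the MoCho source):

```python
from math import comb

def decode(breaks, sorted_values):
    """Split sorted nonrepeated values into classes; each breakpoint is
    the index of the last value placed in its class."""
    classes, start = [], 0
    for b in breaks:
        classes.append(sorted_values[start:b + 1])
        start = b + 1
    classes.append(sorted_values[start:])
    return classes

def n_classifications(n_values, n_classes):
    """k - 1 breakpoints chosen among the n - 1 gaps between values."""
    return comb(n_values - 1, n_classes - 1)
```

This count reproduces the totals in Table 3: 46 nonrepeated values and five classes give C(45, 4) = 148,995 classifications for TT1, and 101 values give C(100, 4) = 3,921,225 for IL59.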
In GAs, operations such as selection, crossover, and
mutation guide the search for optimal solutions. The
selection process is designed to ensure that the best
solutions survive to the next generation; in this way, the
building blocks for optimal solutions can emerge and
converge (Goldberg 1989; Holland 1975). Additional
concerns, however, must be taken into consideration in a
multiobjective context. In such cases, the nondominated
solutions must be given a higher chance to survive and
thus ensure the emergence of the Pareto front. Pareto-optimum selection approaches have been developed to
support this concept (see Goldberg 1989; Srinivas and
Deb 1995). In such approaches, each individual is
assigned a rank that indicates its nondomination in the
entire population (Figure 3). Specifically, all nondominated individuals in the entire population are assigned to
the highest rank, rank = 1. Then, among the remaining unranked individuals, the nondominated ones are assigned
rank = 2. This process continues until all individuals are
ranked. At the completion of the ranking process, those
individuals with the highest ranks lie closest to the Pareto
front and are therefore assigned a higher fitness value.
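The ranking pass described above can be sketched as repeated peeling of nondominated fronts (an illustrative Python version, with all objectives minimized):

```python
def pareto_rank(objectives):
    """Assign rank 1 to the nondominated front, remove it, and repeat
    until every solution is ranked."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and \
               any(x < y for x, y in zip(a, b))
    ranks = [0] * len(objectives)
    remaining = set(range(len(objectives)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks
```

Solutions with rank 1 lie on the current nondominated front and would receive the highest fitness in selection.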
Individual solutions that are selected because of their
high fitness values are then manipulated by crossover
operations to create individuals for the next generation.
In our implementation, we designed two crossover
operations to increase crossover variability during the

search process (Figure 5); the program randomly chooses


one of the two operations as it executes. These methods
differ in whether new breakpoints are generated during
the crossover process: each child solution generated by
Method 1 only inherits information (breakpoints) that
already exists in the parents' strings, while in Method 2 new
breakpoints not present in the parents' strings can be
created. Though crossover operations are important to the
effective implementation of a GA, the search for Pareto-optimal solutions is normally unsuccessful without introducing an additional approach to searching for new
solutions. This is typically achieved using mutation
operations. MoCho uses three kinds of mutation operations. The first randomly adjusts the position of a
breakpoint upward or downward on the gene, the second
inverts the sequence of the breakpoints in a solution,
and the third randomly reinitializes the breakpoints in
a solution.
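The three mutation operators can be sketched as follows. This is an illustrative Python version; the final repair step, which restores a valid ascending chromosome, is our assumption (analogous to the validate() call in Figure 5):

```python
import random

def mutate(breaks, n_values, rng=random):
    """Apply one of three mutations to an ascending breakpoint array
    over n_values sorted nonrepeated observations."""
    b = list(breaks)
    op = rng.randrange(3)
    if op == 0:                       # 1: nudge one breakpoint up or down
        i = rng.randrange(len(b))
        b[i] += rng.choice((-1, 1))
    elif op == 1:                     # 2: invert the breakpoint sequence
        b.reverse()
    else:                             # 3: reinitialize all breakpoints
        b = rng.sample(range(n_values - 1), len(b))
    # repair: clamp to the valid range, sort, and fill any collisions
    b = sorted({min(max(p, 0), n_values - 2) for p in b})
    while len(b) < len(breaks):
        b = sorted(set(b) | {rng.randrange(n_values - 1)})
    return b
```

Note that after the sorting repair, the inversion operator alters only the chromosome's internal ordering, as in the article's description.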
In many cases, selection and crossover operations
work together effectively but generate an undesirable
outcome: they drive a GA to create a population
of fit individuals that are almost identical. This is
problematic because when a population consists of
similar individuals, the likelihood of finding novel solutions will decrease. Maintaining population diversity
is especially critical in a multiobjective context because
of the overarching need to search for tradeoff solutions.
While we may wish to determine good solutions for
individual objectives (i.e., in Figure 3, the best values for
objectives 1 and 2 are in the areas a and c, respectively),
the nondominated solutions lying between the best
values of the individual conflicting objectives are desirable

Crossover Algorithm

Variables
    chromlen          the length of the chromosome, which equals (number of classes - 1)
    cross_method      an integer indicating which crossover method is to be used
    p1[ ]             parent 1, an array of long int of size chromlen
    p2[ ]             parent 2, an array of long int of size chromlen
    c[ ]              child, an array of long int of size chromlen
    v[ ]              an array of long int of size 2*chromlen

Functions
    xflip0(p)         returns 1 if a random number between 0 and 1 is smaller than p, returns 0 otherwise
    sort(v)           sorts the array v in an ascending manner
    validate(c)       ensures that chromosome c is valid
    xnrandom(n1, n2)  returns a random integer between n1 and n2
    xrand0( )         returns a random float number between 0 and 1

    for (i = 0; i < chromlen; i++) {
        v[2*i]   = p1[i];
        v[2*i+1] = p2[i];
    }
    sort(v);                               /* ensure the ascending order of breaking points */
    cross_method = xnrandom(0, 2);         /* randomly select a crossover method */
    switch (cross_method) {
    case 0:                                /* Method 1: modified one-point crossover */
        int xp = xnrandom(0, chromlen-1);  /* get a random crossover point */
        for (i = 0; i < xp; i++)
            c[i] = v[2*i];
        for (i = xp; i < chromlen; i++)
            c[i] = v[2*i+1];
        break;
    case 1:                                /* Method 2: weighted crossover */
        float w = xrand0();                /* get a random weight between 0 and 1 */
        for (i = 0; i < chromlen; i++)
            c[i] = w*v[2*i] + (1-w)*v[2*i+1];
        break;
    }
    validate(c);

Figure 5. A code fragment for crossover algorithms.

points from which to evolve additional solutions


(see b in Figure 3). This problem of premature convergence to a limited part of the search space (e.g.,
toward the single-objective optima a and c in Figure 3)
can be avoided by maintaining a diverse population.
To address this diversity problem, we designed a
specialized island model (SIM) that is derived from the
island model of parallel GAs (see Martin, Lienig, and Cohoon 1997; Cantú-Paz and Goldberg 2000). In the
SIM, several subpopulations are maintained and developed in a partially isolated (virtual) island environment.
Each subpopulation is operated on by a complete set of GA
functions, and individuals from each subpopulation are
exchanged between islands through a mechanism called
migration. In addition, the task of an island is specialized in
the way that objectives are handled. Some subpopulations
are specialized to solve a single objective and some a subset
of objectives, while others are specialized for all objectives.
In this research, we designed an island model with
nine subpopulations; Table 2 shows the settings for each

subpopulation. Although use of the SIM approach


requires a greater amount of computer time (which could
be reduced by parallelism), it allows for a more complete
exploration of the solution space to be conducted. In
addition, because population diversity is the goal of the
SIM, the size of each subpopulation need not be as large as
would be required if this approach were not adopted,
which effectively reduces some of the increased computational overhead.
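The migration step of the SIM can be sketched as below. The collect-then-deliver scheme and the meaning of a destination of −1 (send to a randomly chosen other island) follow the footnote to Table 2, but the details are illustrative, not the MoCho source:

```python
import random

def migrate(islands, destinations, rng=random):
    """Copy one randomly chosen individual from each island to its
    destination island; -1 means a randomly chosen other island."""
    moves = []
    for src, dest in enumerate(destinations):
        if not islands[src]:
            continue
        migrant = rng.choice(islands[src])
        if dest == -1:
            dest = rng.choice([i for i in range(len(islands)) if i != src])
        moves.append((dest, migrant))
    for dest, migrant in moves:    # deliver after all picks are made
        islands[dest].append(migrant)
```

Delivering after all picks are made keeps the exchange simultaneous, so a migrant cannot be re-selected from its destination in the same round.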

Results
We conducted two sets of experiments to evaluate the
accuracy, robustness, and efficacy of the MoCho approach
to class-interval selection. In the first set, we created six
small test datasets and used an exhaustive search strategy
(complete enumeration of all possible classifications) to
study the relations among the criteria discussed above.
MoCho was also applied to these small datasets to validate its performance. In the second set of experiments, we used MoCho to classify three real geographical datasets (Table 3), including the relatively small dataset (gross value per acre of farm products in Illinois during 1959) used by Jenks and Caspall (1971), a moderate-sized, multistate county dataset, and a large classification problem (all counties in the conterminous USA).

Table 2. Specification of Objectives to Be Optimized and Types of Interactions Allowed for Each Island Subpopulation

Subpopulation   Objectives(a)   Migration destination(b)
0               0111            4
1               1101            4
2               1011            4
3               1110            4
4               1111            -1
5               0001            -1
6               0010            -1
7               0100            -1
8               1000            -1

(a) The objectives used for a subpopulation are indicated by a binary string, where a 1 on the i-th element means that the i-th objective is applied, and otherwise it is 0. For example, 0111 means this subpopulation is specialized to find solutions with respect to the first, second, and third objectives.
(b) A positive integer indicates the index of the destination subpopulation, and -1 means to randomly migrate to all other subpopulations.

For each dataset, MoCho needs the following information:

- an array of all observations, used to calculate GVF, GEA, and BAI;
- an array of nonrepeated observations, to which the breakpoints are applied;
- an array of areas for each enumeration unit; and
- the topological neighborhood relation between enumeration units. This is straightforward for grids. For a polygon dataset, an array of linked lists is used, in which each element of the array is a linked list consisting of the IDs of the neighbors of the polygon whose ID is the index of the array. This information is used to calculate MIC, because clusters are configured from the neighborhood relations of the original enumeration units.
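The polygon neighborhood structure can be sketched with Python lists standing in for the array of linked lists (the edge list passed in is hypothetical; in practice it would come from the map's topology):

```python
def build_neighbors(n_polygons, shared_boundaries):
    """neighbors[i] lists the IDs of every polygon adjacent to polygon i,
    mirroring the array-of-linked-lists structure described in the text."""
    neighbors = [[] for _ in range(n_polygons)]
    for i, j in shared_boundaries:
        neighbors[i].append(j)
        neighbors[j].append(i)
    return neighbors
```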
To run MoCho, a set of initial solutions is randomly
generated and evaluated according to the four objectives
(equation [7]). The representations of these solutions are
then modified by the crossover and mutation operations.
The algorithm is executed iteratively, and at the end of
each iteration, the individuals that have lower rank values
are selected (recall that we are minimizing all objectives) and then used to create the next generation of solutions.

Table 3. Datasets and Their Genetic Algorithm Configurations

Type       Name      Description                                    Observations       Total            Subpop.   Gener-   Sampling
                                                                    (Total/Nonrep.)    Classifications  Size      ations   Rate
Small      TT1       Random grid with a symmetrical                 49 / 46            148,995          50        150      45%
datasets             distribution of the histogram
           TT2       Random grid with a positively skewed           49 / 43            123,410          40        150      43%
                     distribution of the histogram
           TTL1      Linear trend surface with a symmetrical        49 / 45            135,751          50        150      49%
                     distribution of the histogram
           TTL2      Linear trend surface with a positively         49 / 47            163,185          40        150      33%
                     skewed distribution of the histogram
           F22       Fractal surface, fractal dimension = 2.2       49 / 33            35,960           40        150      150%
           F24       Fractal surface, fractal dimension = 2.4       49 / 36            52,360           40        150      103%
Large      IL59      Illinois 1959 farm products                    102 / 101          3,921,225        40        150      1.38%
datasets   5State90  1990 county population density (mi⁻²) of       398 / 92           2,672,670        40        150      2.02%
                     Iowa, Minnesota, North Dakota, Nebraska,
                     and South Dakota
           USA90     1990 county population density (mi⁻²) of      3,111 / 534         3,325,048,545    66        90       0.0016%
                     conterminous USA
In this case, different families of solutions explore the
decision space. For example, some solutions evolve to
states in which the internal variability of each class is
minimized, while others equalize the area included in each
class, and still others arrive at a compromise between
extremes. In this way, the decision space is explored, and
the set of good solutions that defines the best tradeoffs
among solutions emerges. To facilitate comparison among
these solutions, data were placed into five classes and a
sequential shading scheme (Brewer 1994) was used to
produce maps in all the cases examined in this article.

Experiments with Small Test Datasets


To evaluate the effectiveness of MoCho in a controlled
set of experiments, we created six 7 × 7 grids (Table 3; also,
see the appendix for details) and applied MoCho to find
the nondominated classifications. These small datasets

were designed to represent a variety of challenges to


the GA classification procedure. In particular, we used
symmetrical and positively skewed statistical distributions
for both linear-trend and random surfaces, as well as
two fractal surfaces to test our algorithms. Table 3 lists
configurations of MoCho (size of subpopulation and
number of generations). The MoCho results were compared to the universe of all possible classifications that was
calculated using a brute-force enumeration. Table 3 also
contains a column labeled Sampling Rate: a small value
of this rate reflects an effective GA search. Since nine
subpopulations are used, we can calculate this rate as:
$$\text{sampling rate} = \frac{9 \times \text{subpopulation size} \times \text{number of generations}}{\text{total number of classifications}} \qquad (8)$$
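Equation (8) is easy to check against Table 3; a quick sketch (the dataset figures below are copied from the table):

```python
def sampling_rate(subpop_size, generations, total_classifications):
    """Fraction of the search space touched by the nine island
    subpopulations over all generations (equation 8)."""
    return 9 * subpop_size * generations / total_classifications
```

For TT1 (subpopulation size 50, 150 generations, 148,995 possible classifications) this gives about 45 percent, and for USA90 (66, 90, and 3,325,048,545) about 0.0016 percent, matching Table 3.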
In Figures 6–11, each small-multiple graph (Tufte
1997) shows the tradeoff between pairs of the four criteria

Figure 6. The results for TT1. This figure consists of two main parts: the leftmost column and a scatterplot matrix with four rows and four
columns. The leftmost column contains four displays of the dataset. Each represents the best classification with respect to the objective marked in
the diagonal cells of the scatterplot matrix. The diagonal cells indicate the objectives used to form the two axes of the plots in the other cells. The
objective in a diagonal cell is the vertical axis of the cells in the same row and is the horizontal axis of the cells in the same column. For example, the
second row of the rightmost column is a plot the vertical axis of which is GEA and the horizontal axis of which is BE. This is a symmetric matrix, and
the difference between a panel in the upper-right triangle and its lower-bottom counterpart is that the positions of vertical and horizontal axes are
reversed. Light gray dots represent possible classifications; dark gray dots represent the nondominated solutions.

Figure 7. The results of TT2 (see caption of Figure 6 for description).

Figure 8. The results of TTL1 (see caption of Figure 6 for description).

Figure 9. The results of TTL2 (see caption of Figure 6 for description).

Figure 10. The results of F22 (see caption of Figure 6 for description).

Figure 11. The results of F24 (see caption of Figure 6 for description).

evaluated in this research. Each light gray dot represents


one instance of all possible class-interval selections plotted
in the 2-D criteria space of the problem; the dark gray dots
in these figures indicate the nondominated solutions
generated by MoCho, where the best trade-off solutions lie close to the origin of each graph. In some cases, the
visible striations in these figures are a consequence of
the discrete values used in the test datasets (see the
appendix), and in all cases, the shapes of the plots result from
the statistical distribution of the original data. Finally, the
left-hand side shows the optimal solution for the single
objective in that row.
Based on the results obtained from the test datasets,
it is apparent that clear tradeoffs can be made among
most of the criteria we examined.
The results are slightly disquieting in only one case: the
trade-off between EVF and BE (and, equivalently, GVF/
TAI and BAI) is not clear. That is, unlike the other cases,
the number of nondominated solutions is relatively small.
For the TTL2 dataset, for example, there is only a single
nondominated solution, which means that EVF and BE do
not significantly conflict and sometimes do not conflict at
all. Consequently, we were able to find a single solution

that is optimal for both objectives. In all other cases,


however, it can be observed that there are clear tradeoffs
to be made. For all five of these remaining cases (i.e.,
EVF-GEA, EVF-MIC1, GEA-MIC1, GEA-BE, MIC1-BE),
it is also evident that MoCho consistently finds the
nondominated points since the dark gray dots lie close
to the origin of each figure and are well-spaced along
each axis.
These results support the validity of the GA approach
to finding Pareto-optimal solutions for multiobjective
choropleth classification problems. We therefore applied
MoCho to the three realistic (and much larger) datasets
representative of those encountered in geographical-research and cartographic-production environments.

Experiments with Large Geographical Datasets


Table 3 provides a description of these larger datasets and their MoCho configurations. Table 4 provides an
overview of the performance of MoCho when it is applied
to the geographical datasets. For the small (IL59) and
medium (5State90) datasets, Figures 12 and 13, respectively, compare MoCho results with a brute-force

Figure 12. A scatterplot matrix for the results of IL59. There are four rows and four columns. The diagonal cells indicate the objectives that are
used to form the two axes of the plots in other cells. The objective in a diagonal cell is the vertical axis of the cells in the same row and is the
horizontal axis of the cells in the same column. For example, the second row of the rightmost column is a plot the vertical axis of which is GEA and
the horizontal axis of which is BE. This is a symmetric matrix, and the difference between a panel in the upper-right triangle and its lower-bottom
counterpart is that the positions of the vertical and horizontal axes are reversed. Light gray dots represent possible classifications; dark gray dots
represent the nondominated solutions.

enumeration of all solutions. For the USA90 dataset, the


size of the problem made it difficult to search the entire
solution space exhaustively. Consequently, we adopted
a Monte Carlo (MC) approach (see Conley 1984) to
generate 10,000 random solutions, which we then
compared with the MoCho results (Figure 14). While
we are unable to make the claim that all possible classes are
included in this set of 10,000, it is likely that the range
contains solutions that are both on and close to the Pareto
front. Thus, when the multicriteria decision space of the
USA90 dataset is considered, we can compare the Pareto
front found by MoCho with the random solutions.
We obtained the optimal solution for EVF of USA90
using Fishers algorithm (Fisher 1958). We also conducted
an MC evaluation for the IL59 and 5State90 datasets for
the purpose of comparison. In addition, to support
comparison with other traditional classification approaches available in GIS software, we include in Figures 12–14
the results of an equal-interval and a quantile classification.

When compared with an exhaustive search (Figures 12


and 13), the solutions generated by MoCho (dark points)
are Pareto-optimal along most of the front (the lower-left
edge of the area formed by the light points), and, for the
remaining part, are very close to the front. When compared
with the Monte Carlo solutions, the results are clear:
MoCho consistently outperforms the MC approach and
finds the Pareto front that defines the tradeoff between the
criteria used in this research (Table 4 and Figure 14). As
shown in Table 4, when we consider only one objective, it is
clear that MoCho consistently found the best solutions for
all four objectives. In addition, MoCho found better
solutions than the optimal solutions found by the MC
approach. We also compared the best GVF value obtained
using MoCho with that reported by Jenks and Caspall
(1971) and confirmed that MoCho found the optimal solution.
This lends strength to the use of MoCho in other contexts.
Unfortunately, we found it impossible to compare the best
BAI found by MoCho with that reported by Jenks and

Figure 13. A scatterplot matrix for the results of 5State90 (see caption of Figure 12 for description).

Caspall (1971), because we used a digital version of an


Illinois county map. If we declare that a boundary exists
when two polygons have at least two common sequential
points, in IL59, the total number of boundaries is 265. Jenks

and Caspall (1971), on the other hand, report the number


of boundaries as 258. It is difficult to determine how this
discrepancy arose, though one might speculate that a
counting error occurred.

Table 4. An Overview of the Results for Three Datasets

                      Optimal(b)    GA           MC
IL59(a)    EVF        0.068662      0.068662     0.069585
           GEA        0.006119      0.006119     0.036077
           MIC1       0.160681      0.160681     0.218831
           BE         0.043713      0.043713     0.064065
5State90   EVF        0.021963      0.021963     0.024425
           GEA        0.031507      0.031507     0.087830
           MIC1       0.506293      0.506293     0.546108
           BE         0             0            0.027794
USA90      EVF        0.040344      0.040344     0.113400
           GEA        n/a           0.013462     0.100915
           MIC1       n/a           0.232920     0.471010
           BE         n/a           0.007024     0.044339

Note: In all cases considered, the GA found the best solution, obtained either by brute-force enumeration (IL59 and 5State90) or by Fisher's algorithm (USA90).
(a) For the IL59 dataset, the best TAI value in Jenks and Caspall (1971) is 0.73455, which has classes of (15.57–41.20, 41.21–58.50, 58.51–75.51, 75.52–100.10, 100.10–155.30). In terms of EVF, this classification yields 1 − GVF = 0.705823. The intervals found by our GA are (15.57–41.20, 41.21–60.66, 60.67–77.29, 77.30–100.10, 100.10–155.30).
(b) Fisher's algorithm is used to obtain the optimal values for EVF (Fisher 1958). Many implementations of this algorithm are available (see Hartigan 1975; Lindberg 1990); we used a Fortran program provided by Hartigan (1975, 130–42). For the small (IL59) and medium (5State90) datasets, the results from Fisher's algorithm are identical to the results found by using brute-force enumerations.

Figure 14. A scatterplot matrix for the results of USA90. There are four rows and four columns. The diagonal cells indicate the objectives that form the two axes of the plots in the other cells: the objective in a diagonal cell is the vertical axis of the cells in the same row and the horizontal axis of the cells in the same column. For example, the plot in the second row of the rightmost column has GEA as its vertical axis and BE as its horizontal axis. The matrix is symmetric; a panel in the upper-right triangle differs from its lower-left counterpart only in that the vertical and horizontal axes are exchanged. Light gray dots represent solutions generated using the Monte Carlo approach; dark gray dots represent the nondominated solutions.

Table 5 gives the time used to run MoCho using the grid and the polygon datasets used in this article. It shows that, for the IL59 and 5State90 datasets, MoCho uses approximately 1 percent of the time required to conduct an exhaustive search of these two polygon datasets. This percentage is consistent with the sampling rates listed in

Table 5. Real Computing Time Used on a Pentium 4 1.4 GHz Computer with an SCSI Hard Drive

Dataset    TGA (seconds)   TES (seconds)                  TMC (seconds)   TGA/TES
TT1        22.169          36.29                                          61.1%
TT2        12.153          30.414                                         40.0%
TTL1       12.523          27.302                                         45.9%
TTL2       9.643           30.061                                         32.1%
F22        8.689           6.851                                          126.9%
F24        10.19           11.952                                         85.3%
IL59       33.749          3280.203 (0.91 hr)             6.667           1.0%
5State90   123.171         7236.225 (2.01 hr)             26.964          1.7%
USA90      963.462         82988556.620^a (23052.38 hr)   249.586         0.0012%^a

Note: TGA = time in seconds used to compute the GA with the configuration listed in Table 3; TES = time in seconds used to complete an exhaustive search; TMC = time in seconds used to compute 10,000 Monte Carlo solutions.
^a Estimated value; see text for calculation.


Table 3. Though the exact time used to exhaustively enumerate all classifications for USA90 is unknown, if we assume that the computing time for every 10,000 classifications is approximately the same, it can be estimated using the following equation:

TES = TMC × (total number of classifications / 10,000) = 23,052.38 hr ≈ 2.6 yr    (9)

The estimated TGA/TES rate is 0.0012 percent, which is consistent with the GA sampling rate of the USA90 dataset (Table 3).
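The unit conversions behind this estimate can be checked directly from the Table 5 values:

```python
# Checking the estimate in equation (9) against the Table 5 values:
# TGA = 963.462 s for the GA run on USA90, and the estimated
# exhaustive-search time TES = 82,988,556.620 s.
t_ga = 963.462                 # seconds (Table 5, USA90)
t_es = 82_988_556.620          # seconds (estimated)
hours = t_es / 3600
years = hours / (24 * 365)
sampling_rate = t_ga / t_es * 100   # TGA/TES, in percent
print(round(hours, 2))          # → 23052.38
print(round(years, 2))          # → 2.63
print(round(sampling_rate, 4))  # → 0.0012
```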
Figures 15–17 illustrate the classifications for the minimum values of the four objectives in each of the three geographical datasets. Because of the uneven (skewed) distribution of observations in these datasets, especially 5State90 and USA90, the EVF and BE objectives fail to serve as good (single) classification criteria if a visual assessment of the results is pursued. Considering the specific example of USA90 (Figure 17), EVF gives the best classification from a statistical view, but spatial information is difficult to discern from the mapped result. GEA gives the best classification from the perspective of equal class area, but the spatial structure is not clear (its MIC = 1 − MIC1 = 1 − 0.8732 = 0.1268 on the resulting map). For MIC1, though the spatial structure is clear, the differentiation of the five classes is still not clear. The result for the best BE solution is similar to that of EVF. Consequently, a cartographer might select a classification between these extreme solutions (see Figure 18).

[Figure 15. The four best class intervals for IL59. Panels: best for GEA, best for EVF, best for MIC1, and best for BE.]

[Figure 16. The four best class intervals for 5State90. Panels: best for EVF, best for GEA, best for MIC1, and best for BE.]

Discussion and Conclusions


When producing choropleth maps, cartographers normally generalize data into a small number of classes. Setting Tobler's (1973) argument aside, the maximum recommended number of classes is always less than a dozen, even if hue is used as a visual variable. The selection of data values that delimit the boundaries between these classes is a key cartographic design decision, as even minor changes in these boundaries can have significant impacts on the visual efficacy of the resulting map.
In the thirty years since Jenks and Caspall (1971) first suggested that choropleth class-interval selection is a multicriteria problem, surprisingly little progress has been made toward finding solutions that are based on criteria other than statistical ones. Researchers have been particularly reluctant to investigate the geographical characteristics of their data during the class-interval selection process. Though Monmonier (1972, 1973) made important progress in the mid-1970s, his original work was not advanced for decades. Recently, however, Cromley (1996), Cromley and Mrozinski (1999), and Murray and Shyy (2000) have applied optimization and data-mining approaches to the derivation of classes that specifically consider the geographical characteristics of a problem. While this work is important, the generalizability of these approaches is limited by assumptions about the measurement scale of the data to be mapped, the form of the solution space, and the suitability of the results for choropleth mapping; classes, for example, should not overlap (Jenks and Coulson 1963). Despite these problems, however, researchers have developed a variety of indices that can be used to measure how well alternative classification schemes meet different criteria. In addition to those chosen for this implementation,


[Figure 17. The four best class intervals for USA90. Panels on this page: best for EVF and best for GEA.]

Cromley (1996), for example, has used minimax error to define boundary error. Other objectives, such as OAI (equation [3]) and round numbers (Monmonier 1982; Brewer 2001), could also be introduced into our multicriteria optimization framework.
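A round-numbers objective of the sort Monmonier (1982) discusses could, for instance, be scored by measuring how far each class break falls from a "nice" value. The function below is a simplified, hypothetical stand-in for such an index — distance to the nearest multiple of a chosen step — not the measure from the cited work:

```python
def roundness(breaks, step=5):
    """Hypothetical round-numbers objective: total distance from each
    class break to the nearest multiple of `step` (lower is better)."""
    return sum(abs(b - step * round(b / step)) for b in breaks)

# Breaks that fall exactly on multiples of 5 score a perfect 0.
print(roundness([40.0, 60.0, 75.0, 100.0]))    # → 0.0
# Awkward breaks accumulate a penalty that the optimizer could minimize.
print(roundness([41.21, 60.66, 77.29, 100.10]))
```

Because this index is just another value to minimize, it could enter the multicriteria framework alongside EVF, GEA, MIC1, and BE without changing the search machinery.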

Even more fundamental, however, has been an implicit assumption that choropleth class-interval selection is a deterministic process and, thus, amenable to algorithmic solutions. Cartographers may rely on qualitative judgments derived from experience and design philosophy, as

[Figure 17. (Continued). Panels: best for MIC1 and best for BE.]

well as statistical analyses of the data, to construct a map that is suited to its intended use. Choropleth map construction is, therefore, best viewed as a semistructured problem and, like other cartographic generalization problems (Buttenfield and McMaster 1991), it may defy exact solution through a deterministic automated solution process. Cartography, after all, has been defined as "[t]he art, science, and technology of making maps, together with their study as scientific documents and works of art" (Meynen 1973, 1). Robinson (1952), MacEachren (1995), Keates (1996), and Thrower (1972), among others, have expressed nearly identical sentiments. If it is


Figure 18. A set of four classifications with objective values in between the extreme objective values. Each choropleth map is associated with a plot, where the dark point shows the position of the current classification in the trade-off between GEA and EVF.

accepted that cartography has a distinctly artistic component, this aesthetic and subjective element cannot be directly incorporated into traditional multicriteria evaluation techniques. What is required in this case is a tool that helps cartographers explore the class-interval solution space. This will enable them to integrate the science, art, and technology of map-making into cartographic products.
A prototype of such a tool, called ChoroWare, was developed as part of this research (Xiao, Armstrong, and Bennett 2002). The analytical engine of this tool follows the original formulation by Jenks and Caspall (1971), as


[Figure 18. (Continued).]

updated by researchers such as Cromley (1996), Declerq (1995), and Slocum (1999), and recasts the class-interval problem into a multicriteria format. A GA is used to search for promising alternative solutions, which are then reported to the cartographer for their evaluation. Our results show that GAs are able to generate a range of nondominated choropleth classification solutions. To facilitate the use of this approach, our exploratory graphical tool is designed to help cartographers as they examine the tradeoffs among alternative classification objectives and thus gain a greater understanding of how these tradeoffs affect the resulting map. The graphical interface of this tool presents users with linked cartographic, tabular, and graphical views of alternative


Figure 19. The prototype of a visualization tool to help users select class intervals.

classification schemes (Figure 19). This graphical representation plots how well alternative schemes perform with respect to selected criteria and focuses attention on alternatives at or near the Pareto front of the objectives included in the formulation of the problem. When a particular classification is selected in the classification-solution space, the tabular and cartographic views are automatically updated to illustrate it.
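The filtering step that identifies nondominated alternatives can be sketched in a few lines; the objective vectors below are hypothetical, with every objective minimized:

```python
def nondominated(solutions):
    """Return the Pareto-nondominated subset of a list of objective
    tuples, assuming every objective is to be minimized."""
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and better somewhere
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Hypothetical (EVF, GEA) pairs for five candidate classifications.
candidates = [(0.07, 0.40), (0.05, 0.55), (0.09, 0.35),
              (0.08, 0.45), (0.05, 0.60)]
print(nondominated(candidates))
# → [(0.07, 0.4), (0.05, 0.55), (0.09, 0.35)]
```

Only the surviving tuples would be highlighted near the Pareto front in a display of this kind; dominated candidates are uniformly worse than some alternative and can be dimmed or dropped.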
Though our multicriteria choropleth classification tool works well, adding features to it could produce a powerful choropleth-mapping support system. Examples of software extensions include tools that allow users to (1) select multiple schemes in the tabular or graphical views to produce small multiples that facilitate subjective evaluation of multiple data views, (2) define new objectives and thus find the most suitable classification scheme to accomplish user-specific goals, (3) search for an optimal number of data classes, (4) search for effective symbolization schemes, and (5) use immersive visual environments to examine higher-dimension criteria spaces directly.
At this point, one might reasonably imagine that there are drawbacks to the approach described in this article. We have identified two, though we suggest that neither is a particularly critical limitation. The first grows directly from the conceptual complexity of the overall approach. The multicriteria GA approach is far more complex than an equal-interval classification, for example. Our rejoinder to that criticism is that while simple classification methods may be appropriate in some contexts, we should not remain ignorant about alternatives, especially those that obviate problems associated with conventional approaches (e.g., empty classes). The computational performance of MoCho remains a drawback to its application. One approach to improving the performance of an island-model GA such as ours is to allocate the processing of generations on each virtual island to a separate processor. Since this is a relatively coarse-grained problem, it can be accomplished using networked personal computers or the increasingly accessible processing power of the computational grid (Foster and Kesselman 1999; Foster 2002), and the overall reduction in computation time will be nearly a linear function of the number of processors used. This would place the process of multicriteria class-interval selection into the near-real-time temporal domain, allowing cartographers to specify and define their objectives interactively, search for solutions that meet these objectives, and select those that meet their goals. This will bring choropleth-map production into a new era consistent with progress in other areas of visualization. Multicriteria choropleth classification will equip new generations of cartographers with tools that will spring choropleth-map production from its narrow range of arbitrary choices into a realm in which users retain more substantial control over the map-production process.
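The island-per-processor decomposition described above can be sketched as follows. This toy version uses a thread pool to stand in for separate processors or networked machines, and substitutes a trivial objective (minimizing a sum of squares) for the map-classification objectives; it illustrates only the coordination pattern of independent evolution plus periodic migration, not MoCho itself:

```python
import random
from multiprocessing.pool import ThreadPool  # stand-in for separate processors

def evolve_island(args):
    """Evolve one island's population for a fixed number of generations:
    mutate the current best and replace the worst if the child improves."""
    seed, population, generations = args
    rng = random.Random(seed)
    fit = lambda ind: sum(x * x for x in ind)   # toy objective (minimize)
    for _ in range(generations):
        parent = min(population, key=fit)
        child = [x + rng.gauss(0, 0.1) for x in parent]
        worst = max(population, key=fit)
        if fit(child) < fit(worst):
            population[population.index(worst)] = child
    return population

def island_model(n_islands=4, pop_size=10, rounds=3, generations=50):
    fit = lambda ind: sum(x * x for x in ind)
    rng = random.Random(0)
    islands = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    with ThreadPool(n_islands) as pool:
        for r in range(rounds):
            # each island evolves independently: the coarse-grained step
            # that could be dispatched to a separate processor or machine
            islands = pool.map(evolve_island,
                               [(r * n_islands + i, isl, generations)
                                for i, isl in enumerate(islands)])
            # ring migration: each island receives its neighbor's best
            bests = [min(isl, key=fit) for isl in islands]
            for i, isl in enumerate(islands):
                isl[0] = bests[(i - 1) % n_islands]
    return min((ind for isl in islands for ind in isl), key=fit)

best = island_model()
print(sum(x * x for x in best))  # a small value near 0 after convergence
```

Because each `evolve_island` call touches only its own population, the rounds-of-map structure transfers directly to process pools or grid job schedulers, which is why the speedup scales nearly linearly with the number of processors.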


Appendix

TT1
91 260 362 394 322 402 407
147 254 396 360 319 472 524
157 387 303 319 392 470 514
151 306 386 322 340 448 649
273 371 358 386 314 408 641
260 338 355 357 433 439 770
258 351 341 331 452 464 786

TT2
511 103 64 180 194 14 161
397 84 280 90 17 43 60
70 673 106 92 168 319 129
100 28 70 500 120 133 290
449 438 86 152 187 122 203
14 130 187 36 10 41 500
195 100 295 183 79 726 10

TTL1
91 260 362 394 322 402 407
147 254 396 360 319 472 524
157 387 303 319 392 470 514
151 306 386 322 340 448 649
273 371 358 386 314 408 641
260 338 355 357 433 439 770
258 351 341 331 452 464 786

TTL2
90 32 2 2 187 203 465
97 6 30 67 165 271 470
100 91 31 19 116 275 544
53 65 31 199 180 260 605
66 78 20 147 142 303 686
88 14 80 126 150 375 773
94 61 37 151 293 467 741

F22
0.47 0.50 0.65 0.60 0.52 0.51 0.51
0.58 0.56 0.66 0.69 0.50 0.47 0.40
0.77 0.71 0.88 0.48 0.47 0.43 0.34
0.59 0.60 0.59 0.61 0.45 0.44 0.25
0.59 0.54 0.64 0.55 0.63 0.32 0.32
0.51 0.50 0.53 0.51 0.32 0.27 0.20
0.50 0.47 0.48 0.27 0.26 0.29 0.22

F24
0.13 0.18 0.28 0.29 0.14 0.24 0.31
0.18 0.30 0.35 0.25 0.19 0.40 0.53
0.16 0.52 0.44 0.32 0.33 0.41 0.51
0.37 0.31 0.25 0.34 0.31 0.34 0.36
0.69 0.24 0.12 0.44 0.55 0.40 0.23
0.66 0.51 0.70 0.50 0.52 0.42 0.58
0.48 0.96 0.88 0.53 0.44 0.30 0.17

Acknowledgments

We wish to thank Ronghai Sa, R. Rajagopal, and the reviewers for their comments on previous drafts of this article.

References

Andrienko, G. L., and N. V. Andrienko. 1999. Interactive maps for visual data exploration. International Journal of Geographical Information Science 13 (4): 355–74.
Anselin, L. 1995. Local indicators of spatial association—LISA. Geographical Analysis 27 (2): 93–115.
———. 1999. Interactive techniques and exploratory spatial data analysis. In Geographical information systems, vol. 1, Principles and technical issues, ed. P. A. Longley, M. F. Goodchild, D. J. Maguire, and D. W. Rhind, 253–66. New York: John Wiley and Sons.
Armstrong, M. P., G. Rushton, R. Honey, B. T. Dalziel, P. Lolonis, S. De, and P. J. Densham. 1991. Decision support for regionalization: A spatial decision support system for regionalizing service delivery systems. Computers, Environment and Urban Systems 15 (1): 37–53.
Armstrong, R. W. 1969. Standardized class intervals and rate computation in statistical maps of mortality. Annals of the Association of American Geographers 59:382–90.
Bailey, T. C., and A. C. Gatrell. 1995. Interactive spatial data analysis. Harlow, Essex: Longman.

Bennett, D. A., G. A. Wade, and M. P. Armstrong. 1999. Exploring the solution space of semistructured geographical problems using genetic algorithms. Transactions in GIS 3 (1): 51–71.
Bowker, G. C., and S. L. Star. 1999. Sorting things out: Classification and its consequences. Cambridge, MA: MIT Press.
Brewer, C. A. 1994. Color use guidelines for mapping and visualization. In Visualization in modern cartography, ed. A. M. MacEachren and D. R. F. Taylor, 123–47. Tarrytown, NY: Elsevier.
———. 2001. Reflections on mapping Census 2000. Cartography and Geographic Information Science 28 (4): 213–35.
Buisseret, D., ed. 1992. Monarchs, ministers, and maps: The emergence of cartography as a tool of government in early modern Europe. Chicago: University of Chicago Press.
Buttenfield, B. P., and R. B. McMaster, eds. 1991. Map generalization: Making rules for knowledge representation. New York: John Wiley and Sons.
Cantú-Paz, E., and D. E. Goldberg. 2000. Efficient parallel genetic algorithms: Theory and practice. Computer Methods in Applied Mechanics and Engineering 186:221–38.
Carver, S. J. 1991. Integrating multicriteria evaluation with geographical information systems. International Journal of Geographical Information Systems 5 (3): 321–39.
Chang, K. 1978. Visual aspects of class intervals in choroplethic mapping. The Cartographic Journal 15 (1): 42–48.
Cohon, J. L. 1978. Multiobjective programming and planning. London: Academic Press.
Conley, W. 1984. Computer optimization techniques. New York: Petrocelli Books.
Coulson, M. R. C. 1987. In the matter of class intervals for choropleth maps: With particular reference to the work of George F. Jenks. Cartographica 24 (2): 16–39.
Cromley, R. G. 1996. A comparison of optimal classification strategies for choroplethic displays of spatially aggregated data. International Journal of Geographical Information Systems 10 (4): 404–24.
Cromley, R. G., and R. D. Mrozinski. 1999. The classification of ordinal data for choropleth mapping. The Cartographic Journal 36 (2): 101–9.
Cuff, D. J., and M. T. Mattson. 1982. Thematic maps: Their design and production. New York: Methuen.
Declerq, F. A. N. 1995. Choropleth map accuracy and the number of class intervals. In Proceedings of the 17th Conference and the 10th General Assembly of the International Cartographic Association, vol. 1, 918–22. Barcelona: Institut Cartogràfic de Catalunya.
Dent, B. D. 1999. Cartography: Thematic map design. 5th ed. Boston: McGraw-Hill.
Dobson, M. W. 1973. Choropleth maps without class intervals? A comment. Geographical Analysis 5 (4): 358–60.
———. 1980. Commentary: Perception of continuously shaded maps. Annals of the Association of American Geographers 70 (1): 106–7.
Egbert, S. L., and T. A. Slocum. 1992. EXPLOREMAP: An exploration system for choropleth maps. Annals of the Association of American Geographers 82 (2): 275–88.
Evans, I. S. 1977. The selection of class intervals. Transactions of the Institute of British Geographers 2 (1): 98–124.
Fisher, W. D. 1958. On grouping for maximum homogeneity. American Statistical Association Journal 53:789–98.
Fogel, D. B., and P. J. Angeline. 1997. Guidelines for a suitable encoding. In Handbook of evolutionary computation, ed. T. Bäck, D. B. Fogel, and Z. Michalewicz, C1.7, 1–2. New York: Oxford University Press.
Foster, I. 2002. The grid: A new infrastructure for 21st-century science. Physics Today February: 42–47.
Foster, I., and C. Kesselman, eds. 1999. The grid: Blueprint for a new computing infrastructure. San Francisco: Morgan Kaufmann Publishers.
Getis, A., and J. K. Ord. 1992. The analysis of spatial association by the use of distance statistics. Geographical Analysis 24:189–206.
Gilmartin, P., and E. Shelton. 1989. Choropleth maps on high-resolution CRTs: The effects of number of classes and hue on communication. Cartographica 26 (2): 40–52.
Goldberg, D. E. 1989. Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
Goodchild, M. F. 1988. Spatial autocorrelation. Concepts and Techniques in Modern Geography (CATMOG) no. 47. Norwich, UK: Geobooks.
Griffith, D. A. 2000. A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems 2:141–56.
Hartigan, J. A. 1975. Clustering algorithms. New York: John Wiley and Sons.
Holland, J. H. 1975. Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
———. 1986. Escaping brittleness: The possibilities of general-purpose machine learning algorithms applied to parallel rule-based systems. In Machine learning: An artificial intelligence approach, vol. 2, ed. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, 593–623. Los Altos, CA: Morgan Kaufmann.
———. 1998. Emergence: From chaos to order. Reading, MA: Addison-Wesley.
Holland, J. H., K. J. Holyoak, R. E. Nisbett, and P. R. Thagard. 1986. Induction: Processes of inference, learning, and discovery. Cambridge, MA: MIT Press.
Jankowski, P. 1995. Integrating geographical information systems and multiple criteria decision-making methods. International Journal of Geographical Information Systems 9 (3): 251–73.
Jenks, G. F. 1977. Optimal data classification for choropleth maps. Occasional Paper no. 2. Lawrence: Department of Geography, University of Kansas.
Jenks, G. F., and F. C. Caspall. 1971. Error on choroplethic maps: Definition, measurement, reduction. Annals of the Association of American Geographers 61 (2): 217–44.
Jenks, G. F., and M. R. C. Coulson. 1963. Class intervals for statistical maps. International Yearbook of Cartography 3:119–34.
Kain, R. J. P., and E. Baigent. 1992. The cadastral map in the service of the state. Chicago: University of Chicago Press.
Keates, J. S. 1996. Understanding maps. 2nd ed. Harlow, UK: Addison Wesley Longman Ltd.
Kraak, M.-J. 1998. The cartographic visualization process: From presentation to exploration. The Cartographic Journal 35 (1): 11–15.
Kraak, M.-J., and A. Brown. 2001. Web cartography: Developments and prospects. New York: Taylor and Francis.
Kraak, M.-J., and F. J. Ormeling. 1996. Cartography: Visualization of spatial data. Essex, UK: Longman.
Lindberg, M. B. 1990. FISHER: A Turbo Pascal unit for optimal partitions. Computers and Geosciences 16 (5): 717–32.
Lolonis, P., and M. P. Armstrong. 1993. Location-allocation models as decision aids in delineating administrative regions. Computers, Environment, and Urban Systems 17 (2): 153–74.
MacDougall, E. B. 1992. Exploratory analysis, dynamic statistical visualization, and geographic information systems. Cartography and Geographic Information Systems 19 (4): 237–46.
MacEachren, A. M. 1982. Map complexity: Comparison and measurement. The American Cartographer 9 (1): 31–46.
———. 1995. How maps work: Representation, visualization, design. New York: Guilford Press.
MacEachren, A. M., and M.-J. Kraak. 2001. Research challenges in geovisualization. Cartography and Geographic Information Science 28 (1): 3–12.
Mak, K., and M. R. C. Coulson. 1991. Map user response to computer-generated choropleth maps: Comparative experiments in classification and symbolization. Cartography and Geographic Information Systems 18 (2): 109–24.
Malczewski, J. 1999. GIS and multicriteria decision analysis. New York: John Wiley and Sons.
Martin, W. N., J. Lienig, and J. P. Cohoon. 1997. Island (migration) models: Evolutionary algorithms based on punctuated equilibria. In Handbook of evolutionary computation, ed. T. Bäck, D. B. Fogel, and Z. Michalewicz, C6.3, 1–16. New York: Oxford University Press.
Mersey, J. E. 1990. Colour and thematic map design: The role of colour scheme and map complexity in choropleth map communication. Cartographica Monograph no. 41. Toronto: University of Toronto Press.
Meynen, E., ed. 1973. Multilingual dictionary of technical terms in cartography. Wiesbaden: Franz Steiner Verlag.
Michalewicz, Z. 1996. Genetic algorithms + data structures = evolution programs. 3rd ed. Berlin: Springer-Verlag.
Monmonier, M. S. 1972. Contiguity-biased class-interval selection: A method for simplifying patterns on statistical maps. Geographical Review 62:203–28.
———. 1973. Analogs between class-interval selection and location-allocation models. The Canadian Cartographer 10:123–31.
———. 1982. Flat laxity, optimization, and rounding in the selection of class intervals. Cartographica 19 (1): 16–27.
———. 1985. Technological transition in cartography. Madison: University of Wisconsin Press.
———. 1992. Authoring graphic scripts: Experiences and principles. Cartography and Geographic Information Systems 19 (4): 247–60.
———. 1996. How to lie with maps. 2nd ed. Chicago: University of Chicago Press.
Morrison, J. L. 1993. Cartography and the spatially literate populace of the 21st century. Cartography and Geographic Information Systems 20 (4): 204–9.
Muller, J.-C. 1979. Perception of continuously shaded maps. Annals of the Association of American Geographers 69 (2): 240–49.
Murray, A. T., and T.-K. Shyy. 2000. Integrating attribute and space characteristics in choropleth display and spatial data mining. International Journal of Geographical Information Science 14 (7): 649–67.
Odland, J. 1988. Spatial autocorrelation. Newbury Park, CA: Sage.
Olson, J. 1975. Autocorrelation and visual map complexity. Annals of the Association of American Geographers 65 (2): 189–204.
O'Sullivan, D., and D. Unwin. 2003. Geographic information analysis. Hoboken, NJ: John Wiley and Sons.
Pareto, V. 1971. Manual of political economy. Trans. A. S. Schwier from the French edition first published in 1896. New York: Augustus M. Kelley.
Peterson, M. P. 1979. An evaluation of unclassed crossed-line choropleth mapping. The American Cartographer 6 (1): 21–37.
Peucker, T. K., and N. R. Chrisman. 1975. Cartographic data structures. The American Cartographer 2:55–69.
Raisz, E. 1948. General cartography. New York: McGraw-Hill.
Robinson, A. H. 1952. The look of maps: An examination of cartographic design. Madison: University of Wisconsin Press.
Robinson, A. H., R. D. Sale, J. L. Morrison, and P. C. Muehrcke. 1984. Elements of cartography. 5th ed. New York: John Wiley and Sons.
Robinson, A. H., J. L. Morrison, P. C. Muehrcke, A. J. Kimerling, and S. C. Guptill. 1995. Elements of cartography. 6th ed. New York: John Wiley and Sons.
Sait, S. M., and H. Youssef. 1999. Iterative computer algorithms with applications in engineering: Solving combinatorial optimization problems. Los Alamitos, CA: IEEE Press.
Slocum, T. A. 1999. Thematic cartography and visualization. Upper Saddle River, NJ: Prentice Hall.
Smith, D. M. 1977. Human geography: A welfare approach. New York: St. Martin's Press.
Smith, R. M. 1986. Comparing traditional methods for selecting class intervals on choropleth maps. The Professional Geographer 38 (1): 62–67.
Srinivas, N., and K. Deb. 1995. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2 (3): 221–48.
Thrower, N. J. W. 1972. Maps and man: An examination of cartography in relation to culture and civilization. Englewood Cliffs, NJ: Prentice Hall.
Tobler, W. 1973. Choropleth maps without class intervals. Geographical Analysis 5 (3): 262–65.
Tufte, E. R. 1997. Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT: Graphics Press.
Tyner, J. 1992. Introduction to thematic cartography. Englewood Cliffs, NJ: Prentice Hall.
U.S. Bureau of the Census. American FactFinder. http://factfinder.census.gov/servlet/BasicFactsServlet (last accessed 1 May 2003).
Xiao, N., M. P. Armstrong, and D. A. Bennett. 2002. ChoroWare: A software toolkit for choropleth map classification. In New tools for spatial data analysis: Proceedings of CSISS Specialist Meeting, ed. L. Anselin and S. J. Rey, CD-ROM. Santa Barbara, CA: The Center for Spatially Integrated Social Science.
Xiao, N., D. A. Bennett, and M. P. Armstrong. 2002. Using evolutionary algorithms to generate alternatives for multiobjective site search problems. Environment and Planning A 34 (4): 639–56.

Correspondence: Department of Geography and Program in Applied Mathematical and Computational Sciences, The University of Iowa,
Iowa City, IA 52242, e-mail: marc-armstrong@uiowa.edu (Armstrong); Department of Geography, The University of Iowa, Iowa City,
IA 52242, e-mail: ningchuan-xiao@uiowa.edu (Xiao); Department of Geography, The University of Iowa, Iowa City, IA 52242, e-mail:
david-bennett@uiowa.edu (Bennett).
