
Evolving Neural Network Using Real Coded Genetic Algorithm (GA) for Multispectral Image Classification

Zhengjun Liu Changyao Wang Aixia Liu Zheng Niu

LARSIS, the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing 100101, China

Abstract: This paper investigates the effectiveness of a genetic algorithm evolved neural network classifier and
its application to land cover classification of remotely sensed multispectral imagery. First, the key issues of the
algorithm and the general procedures are described in detail. Our methodology adopts a real coded GA strategy
hybridized with a back propagation (BP) algorithm. The genetic operators are carefully designed to optimize the
neural network while avoiding premature convergence and permutation problems. Second, a SPOT-4 XS image is
employed to evaluate the classifier's accuracy. Traditional classification algorithms, such as the maximum likelihood
classifier and the back propagation neural network classifier, are also included for comparison. Based on an evaluation of
the user's accuracy and kappa statistic of the different classifiers, the superiority of the discussed genetic
algorithm-based classifier for simple land cover classification using multispectral imagery is established.
Third, a more complicated experiment on CBERS (China-Brazil Earth Resources Satellite) data and its discussion
also demonstrate that the carefully designed genetic algorithm-based neural network outperforms the gradient
descent-based neural network. This is supported by an analysis of the changes of the connection weights and
biases of the neural network. Finally, some concluding remarks and suggestions are presented.
Keywords: Genetic Algorithm; Land Cover Classification; Neural Network; Remote Sensing

1. INTRODUCTION

So far, several pattern recognition algorithms have been adopted for remote sensing land cover
classification (Townshend et al., 1991), including some newly developed supervised classification
methods, for example, the Fuzzy ARTMAP classifier (Carpenter et al., 1997, 1999; Gopal et al.,
1999) and the genetic classifier (Bandyopadhyay and Pal, 2001; Pal et al., 2001). Among these
methods, the neural network classifier and other intelligent methods have been recognized as the
most promising algorithms.
The classification of remotely sensed datasets using artificial neural networks first appeared in
the remote sensing literature about ten years ago (Benediktsson et al., 1990). Since then, examples
and applications at different scales and with different data sources have become increasingly
common. In nearly all cases, the neural network classifier has proved superior to traditional
classifiers, usually with 10-20% improvements in overall accuracy.
The most widely used neural network model is the multi-layer perceptron (MLP), in which the
connection weight training is normally completed by a back propagation learning algorithm
(Rumelhart et al., 1986). Weight training in MLPs is usually formulated as the minimization of an
error function, such as the mean square error (MSE) between target and actual outputs averaged
over all examples, by iteratively adjusting the connection weights. One of the essential
characteristics of the back propagation algorithm is gradient descent, which has been discussed in
many textbooks (Nilsson, 1998) and software manuals (MathWorks Inc., 1997).
Despite its popularity as an optimization tool for neural network training, the gradient descent
technique has several drawbacks. For instance, the performance of network learning depends
strictly on the shape of the error surface, the values of the initial connection weights, and several
other parameters. A typical error surface may have many local minima and may be multimodal
and/or nondifferentiable, which makes it difficult to meet the desired convergence criterion.
Consequently, a gradient descent-based algorithm often becomes stuck in a local minimum when
moving across the error surface. Another shortcoming concerns the efficiency of the differential
operations. Multilayer networks typically use sigmoid transfer functions in the hidden layers.
These functions are often called "squashing" functions, since they compress an infinite input
range into a finite output range. Sigmoid functions are characterized by the fact that their slope
approaches zero as the input becomes large. This causes a problem when using steepest descent
to train a multilayer network with sigmoid functions: the magnitude of the gradient becomes
smaller and smaller as the input grows, producing only tiny changes in the weights and biases,
even when the weights and biases are far from their optimal values.
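To make this saturation effect concrete, the following minimal sketch (not from the original paper) evaluates the sigmoid slope f'(x) = f(x)(1 − f(x)) for increasing inputs; the gradient of any weight feeding such a unit is proportional to this slope:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Slope of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient magnitude collapses as the input grows, so weight updates
# proportional to this slope become vanishingly small (the saturation problem).
for x in [0.0, 2.0, 5.0, 10.0, 20.0]:
    print(f"x = {x:5.1f}   f'(x) = {sigmoid_derivative(x):.2e}")
```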
Of course, there are approaches to prevent the gradient descent algorithm from becoming stuck
in a local minimum when moving across the error surface (Hertz et al., 1991). However, none of
them is able to fully overcome these problems (Hush and Horne, 1993; Hertz et al., 1991).
On the other hand, Genetic Algorithms (Goldberg, 1989) offer an efficient search method for
complex problem spaces and can be used as powerful optimization tools. With regard to the
above-mentioned problems of gradient descent, completely substituting it with a GA might be
advantageous.
Recently, some investigations into neural network training using Genetic Algorithms have been
published (Yao, 1999). Often only selected problems with their particular solutions are in the focus
of attention. Few articles concerning evolutionary neural network classifiers for remote sensing
land cover classification have been published in the literature.
With GAs, we can formulate the neural network training process as the evolution of connection
weights in the environment determined by the network architecture and the learning task. Potential
individual solutions (chromosomes, in GA terms) to a problem compete with each other through
selection, crossover, and mutation operations in order to achieve increasingly better results. With
this strategy, the GA can be used effectively to find a near-optimal set of connection weights
globally without computing gradient information.
During the weight training and adjustment process, the fitness function of a neural network can
be defined by considering two important factors: the error between target and actual outputs and
the complexity of the neural network. Unlike gradient-descent-based training algorithms, the
fitness (or error) function does not have to be differentiable or even continuous, since GAs do
not depend on gradient information. Therefore, GAs can handle large, complex, nondifferentiable
and multimodal spaces, which are typical in remote sensing classification and many other real
world applications.
This paper demonstrates a method that uses a GA to train the neural network for land cover
classification. The outline of the paper is as follows. In Section 2, we introduce the multi-layer
feed-forward neural network model, the genetic algorithm, and the methodology for hybridizing a
real coded GA with a back propagation algorithm for neural network training. In Section 3, two
experiments are presented: a simple land cover classification and comparison on SPOT-4 XS data
using our hybrid evolutionary neural network classifier and other classifiers, and a more
complicated experiment on CBERS data with its analysis. Finally, in Section 4, some conclusions
are drawn and future work is proposed.

2. METHODOLOGY

2.1. The Neural Network


Assume we have a three-layer feed-forward neural network with m inputs (channels), l hidden
nodes, and k outputs (categories). Each neuron in the hidden layer uses the sigmoid function f(x)
as its transfer function, and each neuron in the output layer uses the Purelin function p(x) as its
transfer function. The neuron outputs of hidden node h (1 ≤ h ≤ l) and output node q (1 ≤ q ≤ k)
can be expressed as:

z_h = f(W^T X) = f( Σ_{i=1}^{m} ω_i x_i − δ_h )    (1)

o_q = p(V^T Z) = p( Σ_{i=1}^{l} v_i z_i − δ_q )    (2)

respectively, where the superscript T denotes a vector transpose, W = [ω1, ω2, …, ωi, …, ωm]
is the connection weight vector between the input nodes and hidden node h, V = [v1, v2, …, vi, …,
vl] is the connection weight vector between the hidden nodes and output node q, X = [x1, x2, …,
xi, …, xm] is the input vector for each hidden node, and Z = [z1, z2, …, zi, …, zl] is the output
vector of the hidden nodes. δh and δq are the corresponding biases for hidden node h and output
node q, and zh and oq are the output responses of node h and node q, respectively. The sigmoid
function f(x) is defined as:

f(x) = 1 / (1 + e^(−x))    (3)

where x ∈ (−∞, +∞). The Purelin function p(x) is defined as:

p(x) = α·x + β    (4)

where α is a non-zero constant and β is the bias; α, β ∈ (−∞, +∞).


Assume we have a set of pattern samples X = {X1, X2, …, Xn}, where n is the number of
samples and each sample Xi in X is an m-dimensional feature vector; let T = {T1, T2, …, Ti, …, Tn}
be the set of corresponding target outputs, where Ti = [t1, t2, …, tj, …, tk] is a k-dimensional class
vector. If the target class for a specific sample is j (1 ≤ j ≤ k), then tj = 1; otherwise tj = 0. For
simplicity, let oij denote the jth actual neuron output at the output layer for training sample Xi,
and tij its desired response. The mean square error function for this neural network can then be
written as:

ε(net) = (1 / (n·k)) Σ_{i=1}^{n} Σ_{j=1}^{k} (t_ij − o_ij)²    (5)

where ε is the mean square error and net denotes the neural network.
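For illustration, a minimal sketch of this forward pass and error computation (Eqs. (1)-(5)) could look as follows; the layer sizes, random weights, and the α = 1, β = 0 choice for the Purelin function are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    """Hidden-layer transfer function f(x) = 1 / (1 + exp(-x)), Eq. (3)."""
    return 1.0 / (1.0 + np.exp(-x))

def purelin(x, alpha=1.0, beta=0.0):
    """Output-layer transfer function p(x) = alpha * x + beta, Eq. (4)."""
    return alpha * x + beta

def forward(X, W, b_hidden, V, b_out):
    """Forward pass of the three-layer MLP.

    X        : (n, m) input samples
    W        : (m, l) input-to-hidden weights
    b_hidden : (l,)   hidden-layer biases (delta_h)
    V        : (l, k) hidden-to-output weights
    b_out    : (k,)   output-layer biases (delta_q)
    """
    Z = sigmoid(X @ W - b_hidden)   # Eq. (1), all hidden nodes at once
    O = purelin(Z @ V - b_out)      # Eq. (2), all output nodes at once
    return O

def mse(T, O):
    """Mean square error over n samples and k outputs, Eq. (5)."""
    n, k = T.shape
    return np.sum((T - O) ** 2) / (n * k)

# Illustrative dimensions: m = 3 bands, l = 8 hidden nodes, k = 6 classes.
rng = np.random.default_rng(0)
m, l, k, n = 3, 8, 6, 20
W, b_h = rng.normal(size=(m, l)), rng.normal(size=l)
V, b_o = rng.normal(size=(l, k)), rng.normal(size=k)
X = rng.random((n, m))
T = np.eye(k)[rng.integers(0, k, size=n)]   # one-hot target class vectors
print("MSE =", mse(T, forward(X, W, b_h, V, b_o)))
```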


2.2. Combination of Neural Network and Genetic Algorithm
The combination of a genetic algorithm and a neural network for weight training consists of three
major phases. The first phase is to decide the representation of the connection weights, i.e., whether
to use a binary string form or a direct real number form to represent the connection weights. The
second is the evaluation of the fitness of these connection weights, carried out by decoding each
genome into the corresponding neural network and computing its fitness function and mean square
error. The third is applying the evolutionary process of selection, crossover, and mutation by the
genetic algorithm according to fitness. The evolution stops when the fitness is greater than a
predefined value (i.e., the training error is smaller than a certain value) or the population has
converged.
The technical design of the evolutionary strategy for connection weight training can be
described as follows:
1) Decode each individual (genotype) in the current generation into a set of connection
weights. Since this paper uses a straightforward real coded genotype representation, we
only have to set each neuron's connection weights and bias to its corresponding gene
segments. A real coded genotype representation can search the potential solutions in
feature space more precisely than a binary representation; moreover, it is simple and
intuitive.
2) Evaluate each set of connection weights by constructing the corresponding neural
network structure and computing its total mean square error between actual and target
outputs; see Eqs. (1)-(5). The fitness of an individual is determined by the total MSE:
the higher the error, the lower the fitness. Our fitness function δ is defined as:

ε*(net_i) = (ε(net_i) − min ε(net_i)) / (max ε(net_i) − min ε(net_i))    (6)

δ(net_i) = e^(−ψ·ε*(net_i))    (7)

Here, Eq. (6) is an MSE normalization applied over all MLPs (one per chromosome) in the
current generation, and Eq. (7) is the actual fitness function, in which ψ is a positive
constant. From our experience, setting ψ to 6.0 is a good choice. (A minimal sketch of this
fitness computation, together with the roulette wheel selection of step 3, follows this list.)
3) Select parents for reproduction based on their fitness. A roulette wheel selection scheme
is adopted in our experiments (Holland, 1975; Goldberg, 1989). The population of the
current generation is mapped onto a roulette wheel, where each chromosome occupies a
slot whose size is proportional to its fitness.
4) Apply the search operators, i.e., the crossover and/or mutation operators, to the parent
chromosomes to generate offspring, which form the next generation. An asexual
reproduction operator, an arithmetical crossover operator (Michalewicz, 1992), a single
point random mutation operator, and a non-uniform mutation operator (Michalewicz,
1992) are applied in the experiments of this article.
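As referenced in step 2, the following is a minimal sketch of the fitness evaluation (Eqs. (6) and (7)) and the roulette wheel selection of step 3; the population of MSE values is purely illustrative:

```python
import numpy as np

def fitness_from_mse(mse_values, psi=6.0):
    """Normalize each network's MSE over the generation, Eq. (6), and map it
    to a fitness value via Eq. (7): lower error -> higher fitness."""
    mse_values = np.asarray(mse_values, dtype=float)
    spread = mse_values.max() - mse_values.min()
    if spread == 0.0:                     # all individuals are equally good
        return np.ones_like(mse_values)
    normalized = (mse_values - mse_values.min()) / spread   # Eq. (6)
    return np.exp(-psi * normalized)                        # Eq. (7)

def roulette_wheel_select(fitness, n_parents, rng):
    """Roulette wheel selection: each chromosome occupies a slot of the wheel
    proportional to its fitness; returns the indices of the selected parents."""
    probabilities = fitness / fitness.sum()
    return rng.choice(len(fitness), size=n_parents, p=probabilities)

rng = np.random.default_rng(0)
mse_of_population = rng.uniform(0.01, 0.3, size=10)   # illustrative MSE values
fit = fitness_from_mse(mse_of_population)
parents = roulette_wheel_select(fit, n_parents=10, rng=rng)
print("selected parent indices:", parents)
```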
Crossover operator: For the asexual reproduction operator, the best 10% of chromosomes in the
current generation are copied directly to the next generation as offspring. For the arithmetical
crossover operator, let C^1 = (c_1^1, …, c_i^1, …, c_n^1) and C^2 = (c_1^2, …, c_i^2, …, c_n^2)
be two chromosomes selected for crossover. Two offspring H^k = (h_1^k, …, h_i^k, …, h_n^k),
k = 1, 2, are then created according to:

h_i^1 = λ·c_i^1 + (1 − λ)·c_i^2
h_i^2 = λ·c_i^2 + (1 − λ)·c_i^1    (8)

where λ is a user-specified positive constant. In our experiments, λ is set to 0.28.

Mutation operator: Let C = (c_1, …, c_i, …, c_n) be a parent chromosome, c_i ∈ [a_i, b_i] a gene
to be mutated, and a_i and b_i the lower and upper bounds for gene c_i. A new gene c'_i in the
offspring chromosome may arise from the application of two different mutation operators.
The first mutation operator is single point random mutation, in which c'_i is a number chosen at
random from the range [a_i, b_i] to replace c_i and form the new chromosome C'. This operator is
sometimes called uniform mutation.
The other mutation operator is the non-uniform mutation operator. Assuming it is applied in
generation t, and g_max is the maximum number of generations, it can be described as:

c'_i = c_i + Δ(t, b_i − c_i)    if τ = 0
c'_i = c_i − Δ(t, c_i − a_i)    if τ = 1    (9)

Δ(t, y) = y·(1 − r^((1 − t/g_max)^b))

where τ is a random binary digit taking the value 0 or 1, r is a random number uniformly drawn
from [0, 1], and b is a user-chosen parameter that determines the degree of dependency on the
number of generations. The function Δ(t, y) returns a value in the range [0, y] such that the
probability of returning a number close to zero increases as the algorithm advances; the size of
the mutation interval therefore shrinks with the passing of generations. In our algorithm, b is set
to 0.5.
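A minimal sketch of the arithmetical crossover (Eq. (8)) and of the uniform and non-uniform mutation operators (Eq. (9)) is given below; the chromosome length and gene bounds are illustrative assumptions:

```python
import numpy as np

def arithmetical_crossover(c1, c2, lam=0.28):
    """Arithmetical crossover, Eq. (8): two offspring as convex combinations."""
    h1 = lam * c1 + (1.0 - lam) * c2
    h2 = lam * c2 + (1.0 - lam) * c1
    return h1, h2

def uniform_mutation(c, low, high, rng):
    """Single point random (uniform) mutation: one gene is replaced by a
    random value drawn from its range [a_i, b_i]."""
    child = c.copy()
    i = rng.integers(len(c))
    child[i] = rng.uniform(low, high)
    return child

def nonuniform_mutation(c, low, high, t, g_max, rng, b=0.5):
    """Non-uniform mutation, Eq. (9): the perturbation shrinks as t -> g_max."""
    def delta(y):
        r = rng.random()
        return y * (1.0 - r ** ((1.0 - t / g_max) ** b))
    child = c.copy()
    i = rng.integers(len(c))
    if rng.integers(2) == 0:          # tau = 0: move towards the upper bound
        child[i] = c[i] + delta(high - c[i])
    else:                             # tau = 1: move towards the lower bound
        child[i] = c[i] - delta(c[i] - low)
    return child

rng = np.random.default_rng(0)
low, high = -1.0, 1.0                 # gene bounds a_i, b_i (first experiment)
parent1 = rng.uniform(low, high, size=12)
parent2 = rng.uniform(low, high, size=12)
child1, child2 = arithmetical_crossover(parent1, parent2)
mutated = nonuniform_mutation(child1, low, high, t=10, g_max=70, rng=rng)
```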

Combining these crossover and mutation operators yields a uniform search of the initial space in
early generations and a very local search at later stages, favoring local tuning. It also greatly
reduces the risk of premature convergence.
Our genetic algorithm works with not one but multiple populations, all of which evolve
separately most of the time, except that once every several generations a crossover operation is
applied between individuals from different populations. With a single population scheme, it can
sometimes happen that, although the neural network could theoretically solve a certain
classification problem, the system does not return a correct solution. One reason is the random
nature of the algorithm and its reliance on natural selection, mutation, and crossover; another is
the permutation problem, which has been discussed by Hancock (1992). Thus, a particular
sequence of events that would lead to a correct solution may simply not occur, and the solution is
not found. By using several unrelated populations, we decrease the probability of this occurrence,
since even if one population contains only poor individuals the solution can still be found in
another.
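The paper does not spell out the inter-population details; the sketch below assumes a simple scheme in which, every few generations (the interval of 10 is a hypothetical choice), one randomly chosen individual from each of two populations is crossed with one from the other, and the within-population step is only a placeholder:

```python
import numpy as np

def evolve_one_generation(population, rng):
    """Placeholder for the within-population GA step (selection, crossover,
    mutation); here it only adds small random perturbations for illustration."""
    return population + rng.normal(scale=0.01, size=population.shape)

def inter_population_crossover(pop_a, pop_b, rng, lam=0.28):
    """Illustrative migration step: cross one random individual from each
    population and inject the offspring back into the other population."""
    i, j = rng.integers(len(pop_a)), rng.integers(len(pop_b))
    child_a = lam * pop_a[i] + (1.0 - lam) * pop_b[j]
    child_b = lam * pop_b[j] + (1.0 - lam) * pop_a[i]
    pop_a[i], pop_b[j] = child_b, child_a
    return pop_a, pop_b

rng = np.random.default_rng(0)
pop_a = rng.uniform(-1, 1, size=(100, 12))   # two independent populations
pop_b = rng.uniform(-1, 1, size=(100, 12))
MIGRATION_INTERVAL = 10                      # assumed, not specified in the paper
for generation in range(70):
    pop_a = evolve_one_generation(pop_a, rng)
    pop_b = evolve_one_generation(pop_b, rng)
    if (generation + 1) % MIGRATION_INTERVAL == 0:
        pop_a, pop_b = inter_population_crossover(pop_a, pop_b, rng)
```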

2.3. Connection weight optimization using a conjugate gradient descent algorithm


As discussed above, the GA can be used effectively to find a near-optimal set of connection
weights globally, without computing gradient information and without initializing the connection
weights. However, this near-optimal solution is sometimes not good enough on its own, and the
result can be refined by a conventional algorithm. Since our purpose is to find the optimal
connection weights, we incorporate a local search procedure into the evolution, using a conjugate
gradient descent algorithm to find the best connection weights on the local error surface. This
procedure is completed by applying a BP algorithm to the initial connection weights established
by the GA.
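As a rough sketch of this refinement stage, the snippet below hands the best chromosome's genes to SciPy's conjugate gradient minimizer as a stand-in for the BP/conjugate gradient training described here; the data, layer sizes, and the use of scipy.optimize.minimize are assumptions for illustration only:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, l, k, n = 3, 8, 6, 50                      # illustrative layer sizes
X = rng.random((n, m))
T = np.eye(k)[rng.integers(0, k, size=n)]     # one-hot targets

def unpack(theta):
    """Split a flat parameter vector into W, hidden biases, V, output biases."""
    i = 0
    W = theta[i:i + m * l].reshape(m, l); i += m * l
    bh = theta[i:i + l]; i += l
    V = theta[i:i + l * k].reshape(l, k); i += l * k
    bo = theta[i:i + k]
    return W, bh, V, bo

def mse(theta):
    """Total MSE of the MLP defined by theta, as in Eq. (5)."""
    W, bh, V, bo = unpack(theta)
    Z = 1.0 / (1.0 + np.exp(-(X @ W - bh)))   # sigmoid hidden layer
    O = Z @ V - bo                            # linear (Purelin) output layer
    return np.sum((T - O) ** 2) / T.size

# Pretend this came from the GA: the best chromosome's genes are the
# initial weights handed to the local conjugate gradient refinement.
best_chromosome = rng.uniform(-1, 1, size=m * l + l + l * k + k)
result = minimize(mse, best_chromosome, method="CG", options={"maxiter": 200})
print("MSE before refinement:", mse(best_chromosome))
print("MSE after refinement: ", result.fun)
```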

The overall framework of our proposed method is summarized in Figure 1. First, at the
initialization stage, the neural network structure, including the number of input nodes, hidden
nodes, and output nodes, is specified according to the specific classification application. The
connection weights corresponding to this MLP structure are encoded in the GA's chromosomes;
each chromosome represents one MLP structure, with its connection weights contained in its
genes. Second, at the GA-based connection weight training stage, these initialized chromosomes,
which may belong to different populations, are evolved from generation to generation according
to the fitness and MSE performance of the corresponding MLP. Finally, at the stage of local
optimization of the error surface with BP, the connection weight matrix contained in the
chromosome with the best fitness is chosen as the MLP's initial weights, and the gradient descent
algorithm is applied to optimize these connection weights and minimize the local error.

Fig. 1 Framework of Combining Neural Network and GA for Classification (flowchart: Start → Initialization → check whether the termination criterion is satisfied; if No → Selection → Crossover/Reproduction → Mutation and back to the check; if Yes → stop GA training → BP training → Classification → End. The chromosomes encode the connection weights of the MLP's input, hidden, and output layers.)

3. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we discuss experimental results obtained with our classification methodology
and compare them with other methods. Two experiments are employed for this discussion.

3.1. Simple example and comparison with other methods


A 900×700 pixel SPOT-4 XS high resolution image of Jiangning County in Jiangsu Province
in eastern China is used for classification (Figure 2a). There are three bands in SPOT-4 XS data:
green (0.50-0.59 µm), red (0.61-0.68 µm), and near infrared (0.79-0.89 µm). Three types of
supervised classifiers were employed: the Maximum Likelihood Classifier (MLC), the back
propagation neural network classifier (BP-MLP), and the hybrid GA-based neural network
classifier (GA-MLP). A total of 96 samples belonging to three land cover classes were used for
training. These three classes were built-up and bare land, pond and river, and vegetation. For the
BP and GA neural network structure, a three layer MLP with one hidden layer consisting of 8
hidden nodes was used. For GA-MLP, we had two population groups and each population size
was set to 100. The asexual reproduction probability was set to 0.1, the arithmetical crossover
probability to 0.8, the non-uniform mutation probability to 0.07, and the single point random
mutation probability to 0.03, with a_i set to -1.0 and b_i set to 1.0. GA-MLP was trained for 70
generations, followed by a BP training procedure. For the back propagation training algorithm,
the learning rate was set to 0.01, the learning rate increment factor to 1.03, and the target training
performance to 0.01602 (the actual training performance was 0.01601). The different land cover
classification results are illustrated in Figures 2b, 2c, and 2d, respectively.

Fig. 2 Classification Results of SPOT-4 XS imagery of Jiangning County, Jiangsu, China (a. original pseudo-color composite; b. MLC; c. BP-MLP; d. GA-MLP; legend: built-up and bare land, river, vegetation)

The hybrid training performance is shown in Figure 3. Figure 3a shows the best MSE for each
generation during the evolutionary training. From this figure, we can see how the best MSE,
represented by the best chromosome, varies with the number of generations of the genetic
algorithm. After 20 generations, the change of the best MSE becomes slower because of the local
tuning characteristic. Figure 3b shows the MSE for each iteration during the back propagation
training. This figure shows that, although our genetic algorithm has already greatly reduced the
total MSE of the neural network, a further improvement in training performance, of up to 0.1,
could still be achieved by applying the back propagation weight adjustment procedure.
Fig. 3 Neural network training performance for SPOT XS imagery of Jiangning County, Jiangsu, China (a. best MSE per generation during evolutionary training; b. MSE per iteration during BP training)

For this imagery, we randomly chose 200 pixels from the three classified land cover maps to
assess the classification accuracy. We compared these pixels with our interpretation results and
computed the user's accuracy (Story and Congalton, 1986) and kappa coefficient (Cohen, 1960;
Congalton, 1991), which are shown in Table 1.
TABLE 1
COMPARISON OF CLASSIFICATION ACCURACY

Land cover             User's Accuracy (unit: %)         Kappa statistic
                       MLC      BP-MLP   GA-MLP          MLC      BP-MLP   GA-MLP
Built-up, bare land    65.39    70.83    88.89           0.583    0.649    0.866
Water                  100.00   100.00   100.00          1.000    1.000    1.000
Vegetation             97.10    100.00   98.63           0.889    1.000    0.947
Overall                89.0     93.0     97.0            0.750    0.846    0.929

Note: overall accuracy and Kappa statistic are calculated using weights.
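For reference, the user's accuracy and kappa statistic reported above can be computed from a confusion matrix as in the following minimal sketch; the small 3-class matrix is illustrative and is not the paper's assessment data:

```python
import numpy as np

def users_accuracy(confusion):
    """User's accuracy per class: correct pixels of a mapped class divided by
    all pixels mapped to that class (rows = classified, columns = reference)."""
    return np.diag(confusion) / confusion.sum(axis=1)

def kappa(confusion):
    """Cohen's kappa: agreement beyond chance over the whole confusion matrix."""
    total = confusion.sum()
    observed = np.trace(confusion) / total
    expected = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Illustrative 3-class confusion matrix (rows: classified, columns: reference).
cm = np.array([[48, 3, 1],
               [2, 60, 0],
               [4, 1, 81]])
print("user's accuracy:", np.round(users_accuracy(cm), 4))
print("kappa:", round(kappa(cm), 4))
```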

From Table 1 we can see that the MLP classifiers are more accurate than the maximum
likelihood classifier: the overall user's accuracies for GA-MLP, BP-MLP, and MLC are 97%, 93%,
and 89%, respectively. The better accuracy of GA-MLP over BP-MLP is possibly due to the lower
residual training error obtained when training the neural network with the GA (0.0161 vs. 0.03).
This also suggests that the GA can find a solution closer to the global optimum of the neural network.

3.2. More complicated experiment and discussion


A practical classification algorithm must handle not only simple cases but also more
complicated conditions. In the second experiment, the number of classified categories increases to
six. More training samples are added, and some of them are exclusive. These changes bring the
experiment closer to typical real applications.
A 900×900 pixel CBERS (China-Brazil Earth Resources Satellite, launched in October 1999)
high resolution image of Shihezi County in Xinjiang Province in northwestern China was used
for classification (Figure 5a). CBERS's nominal spatial resolution is 19.5 m. There are six bands
in the CBERS dataset, but here only three bands are used for classification: the green band at
0.52-0.59 µm (band 2), the red band at 0.63-0.69 µm (band 3), and the near infrared band at
0.77-0.89 µm (band 4). A total of 280 samples belonging to six land cover classes were used for
training. These six classes are wheat field, pond and river, desert and bare land, saline land, wet
and irrigated land, and cotton field. The scatter plots of these 280 samples in Figures 4a and 4b
show the complexity of the overlapping and nonlinear class boundaries. A good classification
algorithm should correctly identify the nonlinear and overlapping class boundaries without much
a priori knowledge or assumption. For the BP and GA neural network structures, a three layer
MLP with one hidden layer consisting of 8 hidden nodes was used. For GA-MLP, we had two
population groups and each population size was set to 150; the asexual reproduction probability
was set to 0.1, the arithmetical crossover probability to 0.8, the non-uniform mutation probability
to 0.05, and the single point random mutation probability to 0.05, with a_i set to -5.0 and b_i set
to 5.0. GA-MLP was trained for 300 generations, followed by a BP training procedure. For the
back propagation training algorithm, the learning rate was set to 0.01, the learning rate increment
factor to 1.03, and the target training performance to 0.059. The classified image is shown in
Figure 5b. Assessing the classified image against existing land cover maps has shown that the
classification accuracy is over 87.3%, nearly 3% higher than that of BP-MLP.

Fig. 4 Scatter plots for a training set of CBERS imagery of Shihezi County (a. Band 4 vs. Band 3; b. Band 3 vs. Band 2)

Fig. 5 Classification Results of CBERS imagery of Shihezi County (a. original CBERS pseudo-color composite; b. classified result; legend: cotton field, wet and irrigated land, desert and bare land, saline land, pond and river, wheat field)

The hybrid training performance for this imagery is shown in Figure 6. Figure 6a shows the best
MSE for each generation during the evolutionary training, and Figure 6b shows the MSE for each
iteration during the back propagation training. The same conclusions as for Figure 3 can be drawn
from these two figures.
Fig. 6 Neural network training performance for CBERS imagery of Shihezi County (a. best MSE per generation during evolutionary training; b. MSE per iteration during BP training)

To gain insight into how the network structure changes between the genetically evolved neural
network and the network after the subsequent back propagation training, we can compare Tables
2, 3, 4, and 5. Note that in these tables, the three input nodes representing CBERS Band 4, Band 3,
and Band 2 are denoted I1, I2, and I3, respectively. The eight hidden nodes in the hidden layer are
denoted H1 through H8, and the six output nodes in the output layer are denoted O1 through O6,
representing the six classes: wheat field, pond and river, desert and bare land, saline land, wet and
irrigated land, and cotton field.
Table 2 shows the changes in the weight connections between the input layer and the hidden layer.
As can be seen from the table, there is little change in the connection weights between the input
nodes and H1, H4, H5, and H8, which have been labeled with asterisks. Note that most of these
connection weights are larger than the others in the corresponding columns, especially for I2 and
I1 (Band 3 and Band 4). This may indicate that Band 3 and Band 4 carry more of the information
the neural network uses to classify the training samples than the remaining band (Band 2). This is
also consistent with Figures 4a and 4b.

TABLE 2
COMPARISON OF WEIGHT CONNECTIONS BETWEEN INPUT LAYER AND HIDDEN LAYER

Hidden   GA Training Input Node                  BP Training Input Node
Node     I1         I2         I3                I1         I2         I3
H1       0.0546*    0.0308*    -0.0125*          0.0546*    0.0307*    -0.0125*
H2       0.0319     -0.0625    0.0324            0.2659     -0.2614    0.0101
H3       -0.0350    0.0379*    0.0075            -0.1965    -0.0199*   0.3421
H4       0.0584*    0.0264*    0.0200*           0.0584*    0.0264*    0.0200*
H5       0.0846*    0.0748*    0.0268*           0.0846*    0.0748*    0.0268*
H6       0.0073     -0.0085*   0.0227            0.1507     0.0849*    -0.3348
H7       -0.0279    -0.0006*   0.0671            -0.0647    0.2741*    -0.2638
H8       0.0917*    -0.0029*   0.0572*           0.0917*    -0.0029*   0.0572*
Compared with Table 2, no evident pattern of change is visible in Table 3. It appears that the
genetic algorithm has successfully evolved the connection weights between the input layer and the
hidden layer, especially those of the most informative nodes such as I1, H1, H4, H5, and H8, while
the back propagation algorithm fine-tunes the connection weights between the input layer and the
hidden layer and successfully adjusts the connection weights between the hidden layer and the
output layer.

TABLE 3
COMPARISON OF WEIGHT CONNECTIONS BETWEEN HIDDEN LAYER AND OUTPUT LAYER

Hidden   GA Training Output Node                                    BP Training Output Node
Node     O1       O2       O3       O4       O5       O6            O1       O2       O3       O4       O5       O6
H1       -0.001   -0.0144  0.0271   -0.0135  0.0013   -0.0372       0.0561   0.0386   0.0044   0.0213   -0.0329  -0.0587
H2       0.0536   0.0078   -0.0039  -0.0296  0.041    0.042         -0.0418  0.0187   -0.2     -0.2363  0.3737   0.0951
H3       -0.0404  -0.0255  0.0167   0.0493   -0.0153  -0.0211       -0.1011  -0.0101  0.096    0.0059   0.3441   -0.325
H4       0.0521   0.0446   0.0114   0.0173   -0.0078  0.0445        0.1092   0.0976   -0.0113  0.0521   -0.042   0.023
H5       0.0234   0.0387   -0.0237  0.0217   0.015    -0.0208       0.0805   0.0917   -0.0464  0.0565   -0.0192  -0.0423
H6       0.0422   0.0122   0.0504   0.0563   0.0268   0.0173        -0.005   -0.3735  0.289    0.0074   0.0205   0.0905
H7       0.0172   0.0598   0.0166   0.0419   0.0027   0.071         -0.4048  -0.0101  -0.0123  -0.0068  0.0555   0.3922
H8       0.0611   0.0061   0.0828   0.0219   0.0583   0.0418        0.1182   0.0591   0.0601   0.0567   0.0241   0.0203

The idea implied by this phenomenon is that, since the genetic algorithm is a forward stochastic
optimization algorithm, the most accurate weight adjustment tends to occur in the first layer, that
is, in the connection weights between the input layer and the hidden layer; and since the back
propagation algorithm is a backward optimization algorithm, the most accurate weight adjustment
tends to occur in the last layer, that is, in the connection weights between the hidden layer and the
output layer. We can therefore expect that, by combining these two algorithms, more accurate
weight adjustment and better classification results can be achieved. This idea is also reflected in
the bias adjustments shown in Tables 4 and 5.

TABLE 4
COMPARISON OF BIASES IN HIDDEN LAYER

Hidden Node   GA Training Bias   BP Training Bias
H1            0.0547*            0.0547*
H2            0.0077             0.0154
H3            -0.0137            0.0099
H4            0.0394*            0.0394*
H5            0.0192*            0.0192*
H6            0.0305             0.0054
H7            -0.0073            -0.0383
H8            0.0400*            0.0400*
TABLE 5
COMPARISON OF BIASES IN OUTPUT LAYER

       GA Training Output Node                               BP Training Output Node
       O1       O2       O3       O4       O5       O6       O1       O2       O3       O4       O5       O6
Bias   0.0408   -0.0034  0.0215   0.0048   0.0794   0.0412   0.0979   0.0496   -0.0012  0.0396   0.0452   0.0197

4. CONCLUSION AND FUTURE WORK

In this article, we have discussed the advantages and the key issues of the genetic algorithm
evolved neural network classifier in detail. Our methodology adopts a real coded GA strategy
hybridized with a back propagation algorithm. The genetic operators are carefully designed to
optimize the neural network while avoiding premature convergence and permutation problems. A
SPOT-4 XS image was used to test our algorithm. This preliminary research has shown that the
hybrid GA-based neural network classifier can achieve better overall accuracy in high resolution
land cover classification. Our experiment and discussion on the CBERS data have shown that a
carefully designed genetic algorithm-based neural network outperforms the gradient descent-based
neural network. This is supported by the analysis of the changes of the connection weights and
biases of the neural network.
One open problem in combining neural networks and genetic algorithms for land cover
classification is the determination of the optimal neural network topology, i.e., the best neural
network structure for a specific set of training samples and remote sensing imagery. The neural
network topology described in this experiment was determined manually. An alternative is to
apply the genetic algorithm to neural network structure optimization, which will be part of our
future work.

5. ACKNOWLEDGEMENT

This work was supported by a grant from the Knowledge Innovation Program of the Chinese
Academy of Sciences (approved # KZCX1-SW-01) and subsidized by China's Special Funds
for the Major State Fundamental Research Project (G2000077900).

REFERENCES
[1] Bandyopadhyay, S., Pal, S. K., 2001. Pixel classification using variable string genetic algorithms with
chromosome differentiation. IEEE Transactions on Geoscience and Remote Sensing, 39(2): 303-308.
[2] Benediktsson, J. A., Swain, P. H., Ersoy, O. K., 1990. Neural network approaches versus statistical methods
in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing,
28(4): 540-552.
[3] Carpenter G. A., Gjiaja M. N., Gopal S., and Woodcock C. E., 1997. ART neural networks for remote sensing:
Vegetation classification from Landsat TM and terrain data. IEEE Transactions on Geoscience and Remote
Sensing, 35(2): 308-325.
[4] Carpenter G. A., Gopal S., Macomber S., Martens S., and Woodcock C. E., 1999. A neural network
method for mixture estimation for vegetation mapping. Remote Sensing of Environment, 70: 138-152.
[5] Cohen, J., 1960. A coefficient of agreement for nominal scales, Educ. Psychol. Measurement. 20(1): 37-46.
[6] Congalton, R. G., 1991. A review of assessing the accuracy of classifications of remotely sensed data.
Remote Sensing of Environment, 37: 35-46.
[7] Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley,
New York.
[8] Gopal S., Woodcock C. E., Strahler, A. H., 1999. Fuzzy neural network classification of global land cover
from a 1 degree AVHRR data set. Remote Sensing of Environment, 67: 230-243.
[9] Hancock P. J. B., 1992. Genetic algorithms and permutation problems: a comparison of recombination
operators for neural net structure specification. Proceedings of the Int. Workshop on Combinations of
Genetic Algorithms and Neural Networks (COGANN-92), pp. 108-122. IEEE Computer Society Press, Los
Alamitos, CA.
[10] Herrera F., Lozano M., Verdegay J. L., Tackling Real-Coded Genetic Algorithms: Operators and Tools for
Behavioural Analysis. NEC Research Index. http://citeseer.nj.nec.com/
[11] Hertz J., Krogh A., Palmer R., 1991. An Introduction to the Theory of Neural Computation. Addison-Wesley,
Redwood City, CA.
[12] Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press.
[13] Hush D. R., Horne B. G., 1993. Progress in supervised neural networks. IEEE Signal Processing Magazine,
10(1):8-39.
[14] Michalewicz Z., 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New
York.
[15] MathWorks, Inc., 1997. Neural Network Toolbox User’s Guide. MathWorks, Inc., Natick, MA.
[16] Nilsson, N. J., 1998. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers, Inc., San
Francisco, CA.
[17] Pal S. K., Bandyopadhyay S., Murthy C. A., 2001. Genetic classifier for remotely sensed images:
comparison with standard methods. International Journal of Remote Sensing, 22(13): 2545-2569.
[18] Rumelhart, D. E., Hinton, G. E., and Williams, R.J., 1986. Learning representations by back-propagating
errors. Nature, 323: 533-536.
[19] Story, M. and Congalton, R. (1986). Accuracy assessment: a user’s perspective. Photogrammetry
Engineering and Remote Sensing, 52(3): 397-399.
[20] Townshend J., Justice C., Li W., Gurney C., McManus J., 1991. Global land cover classification by remote
sensing: present capabilities and future possibilities. Remote Sensing of Environment, 35: 243-255.
[21] Yao X., 1999. Evolving artificial neural networks. Proceedings of the IEEE, 87(9): 1423-1447.
