LARSIS, the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing 100101, China
Abstract: This paper investigates the effectiveness of a genetic algorithm (GA) evolved neural network classifier and its application to land cover classification of remotely sensed multispectral imagery. First, the key issues of the algorithm and its general procedures are described in detail. Our methodology adopts a real-coded GA strategy hybridized with a back propagation (BP) algorithm. The genetic operators are carefully designed to optimize the neural network while avoiding premature convergence and permutation problems. Second, a SPOT-4 XS image is employed to evaluate the classifier's accuracy. Traditional classification algorithms, such as the maximum likelihood classifier and the back propagation neural network classifier, are also included for comparison. Based on an evaluation of the user's accuracy and kappa statistic of the different classifiers, the superiority of the discussed genetic algorithm-based classifier for simple land cover classification using multispectral imagery is established. Third, a more complicated experiment on CBERS (China-Brazil Earth Resources Satellite) data demonstrates that a carefully designed genetic algorithm-based neural network outperforms a gradient descent-based neural network. This is supported by an analysis of the changes in the connection weights and biases of the neural network. Finally, some concluding remarks and suggestions are presented.
Keywords: Genetic Algorithm; Land Cover Classification; Neural Network; Remote Sensing
1. INTRODUCTION
So far, several pattern recognition algorithms have been adopted in remote sensing land cover
classification (Townshend et al., 1991), including some newly developed supervised classification
methods, for example, the Fuzzy ARTMAP classifier (Carpenter et al., 1997, 1999; Gopal et al.,
1999) and the genetic classifier (Bandyopadhyay and Pal, 2001; Pal et al., 2001). Among these
methods, the neural network classifier and some other intelligent methods have been recognized
as the most promising.
The classification of remotely sensed datasets using artificial neural networks first appeared in
remote sensing literature about ten years ago (Benediktsson et al., 1990). Since then, examples
and applications at different scales and with different data sources have become increasingly
common. In nearly all cases, the neural network classifier has proved superior to traditional
classifiers, usually with 10-20% improvements in overall accuracy.
The most widely used neural network model is the multi-layer perceptron (MLP), in which the
connection weight training is normally completed by a back propagation learning algorithm
(Rumelhart et al., 1986). The idea of weight training in MLPs is usually formulated as
minimization of an error function, such as mean square error (MSE) between target and actual
outputs averaged over all examples, by iteratively adjusting connection weights. One of the
essential characteristics of the back propagation algorithm is gradient descent, which has been
discussed in many textbooks (Nilsson, 1998) and software manuals (MathWorks Inc., 1997).
Despite its popularity as an optimization tool for neural network training, the gradient descent
technique has several drawbacks. For instance, the performance of network learning depends
strongly on the shape of the error surface, the values of the initial connection weights, and
several further parameters. A typical error surface may have many local minima and may be
multimodal and/or nondifferentiable, so the desired convergence criterion is often hard to meet.
This tends to make a gradient descent-based algorithm become stuck in some local minimum while
moving across the error surface. Another shortcoming concerns the efficiency of the differential
operation. Multilayer networks typically use sigmoid transfer functions in the hidden layers.
These functions are often called "squashing" functions, since they compress an infinite input
range into a finite output range. Sigmoid functions are characterized by the fact that their
slope approaches zero as the input grows large. This causes a problem when using steepest descent
to train a multilayer network with sigmoid functions: the magnitude of the gradient becomes
smaller and smaller as the input grows, producing only tiny changes in the weights and biases
even when the weights and biases are far from their optimal values.
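The vanishing slope described above can be illustrated numerically. The snippet below (an illustrative sketch, not from the paper) evaluates the sigmoid derivative f'(x) = f(x)(1 - f(x)) at increasingly large inputs:

```python
import math

def sigmoid(x):
    """Logistic squashing function: maps (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The slope collapses toward zero as |x| grows, so steepest descent
# makes only tiny weight updates for saturated neurons.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}   slope = {sigmoid_slope(x):.6f}")
```

At x = 0 the slope is at its maximum of 0.25; by x = 10 it has fallen below 5e-5, which is exactly the regime in which gradient descent stalls even when the weights are far from optimal.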
Of course there are some approaches to prevent the gradient descent algorithm from becoming
stuck in a local minimum while moving across the error surface (Hertz et al., 1991). However,
none of them really overcomes all the existing problems (Hush and Horne, 1993; Hertz et al., 1991).
On the other hand, genetic algorithms (Goldberg, 1989) offer an efficient search method for
complex problem spaces and can be used as powerful optimization tools. With regard to the
above-mentioned problems of gradient descent, completely substituting it with a GA might be
advantageous.
Recently, some investigations into neural network training using genetic algorithms have been
published (Yao, 1999). Often, only selected problems and their particular solutions are in the
focus of attention. Few articles concerning evolutionary neural network classifiers for remote
sensing land cover classification have been published in the literature.
With GAs, we can formulate the neural network training process as the evolution of connection
weights in the environment determined by the architecture and the learning task. Potential
individual solutions (chromosomes, in GA terms) to a problem compete with each other through
selection, crossover, and mutation operations in order to achieve increasingly better results.
With this strategy, a GA can be used effectively to evolve a near-optimal set of connection
weights globally without computing gradient information.
During the weight training and adjustment process, the fitness function of a neural network can
be defined by considering two important factors: the error between target and actual outputs,
and the complexity of the neural network. Unlike gradient-descent-based training algorithms,
the fitness (or error) function does not have to be differentiable or even continuous, since GAs
do not depend on gradient information. Therefore, GAs can handle large, complex,
nondifferentiable and multimodal spaces, which are typical in remote sensing classification and
many other real-world applications.
This paper demonstrates a method that uses a GA to train the neural network for land cover
classification. The outline of the paper is as follows. First, in Section 2, we introduce the
multi-layer feed-forward neural network model, the genetic algorithm, and the methodology of
hybridizing a real-coded GA with a back propagation algorithm for neural network training. Next,
in Section 3, two experiments are presented: a simple land cover classification on SPOT-4 XS
data comparing our hybrid evolutionary neural network classifier with other classifiers, and a
more complicated experiment on CBERS data together with its analysis. Finally, in Section 4,
some conclusions are reached and future work is proposed.
2. METHODOLOGY
The outputs of hidden node h and output node q are

    z_h = f(W^T X) = f( Σ_{i=1..m} ω_i x_i − δ_h )        (1)

    o_q = p(V^T Z) = p( Σ_{i=1..l} v_i z_i − δ_q )        (2)

respectively, where the superscript T stands for vector transpose, W = [ω1, ω2, …, ωi, …, ωm] is
the connection weight vector between the input nodes and hidden node h, V = [v1, v2, …, vi, …, vl]
is the connection weight vector between the hidden nodes and output node q, X = [x1, x2, …, xi, …,
xm] is the input vector presented to each hidden node, and Z = [z1, z2, …, zi, …, zl] is the output
vector of the l hidden nodes. δ_h and δ_q are the corresponding biases for hidden node h and output
node q. z_h and o_q are the neuron responses of node h and node q, respectively. The sigmoid
function f(x) is defined as

    f(x) = 1 / (1 + e^(−x))        (3)

where x ∈ (−∞, +∞), and the purelin function p(x) is defined as

    p(x) = αx + β        (4)
    ε(net) = (1 / (n·k)) Σ_{i=1..n} Σ_{j=1..k} (t_ij − o_ij)²        (5)
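Equations (1)-(5) can be sketched in code as follows. This is an illustrative implementation (the function and variable names are ours, not the paper's) of one forward pass through a single-hidden-layer MLP and of the MSE error of equation (5):

```python
import math

def sigmoid(x):                       # eq. (3)
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x, alpha=1.0, beta=0.0):  # eq. (4); alpha, beta are illustrative defaults
    return alpha * x + beta

def forward(x, W, hidden_bias, V, output_bias):
    """Forward pass: eq. (1) for each hidden node, eq. (2) for each output node.
    W[h][i] connects input i to hidden node h; V[q][h] connects hidden h to output q."""
    z = [sigmoid(sum(w_hi * x_i for w_hi, x_i in zip(W[h], x)) - hidden_bias[h])
         for h in range(len(W))]
    return [purelin(sum(v_qh * z_h for v_qh, z_h in zip(V[q], z)) - output_bias[q])
            for q in range(len(V))]

def mse(targets, outputs):
    """Eq. (5): mean square error over n samples and k output nodes."""
    n, k = len(targets), len(targets[0])
    return sum((t - o) ** 2
               for t_row, o_row in zip(targets, outputs)
               for t, o in zip(t_row, o_row)) / (n * k)
```

A GA fitness based on this error needs nothing beyond `forward` and `mse`; in particular, no derivative of either function is ever required.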
    c_i' = c_i + Δ(t, b_i − c_i)   if τ = 0
    c_i' = c_i − Δ(t, c_i − a_i)   if τ = 1        (9)

    Δ(t, y) = y · (1 − r^((1 − t/g_max)^b))
where a_i and b_i are the lower and upper bounds of gene c_i, τ is a random binary number taking
the value 0 or 1, r is a uniform random number in [0, 1], t is the current generation, g_max is
the maximum number of generations, and b is a user-chosen parameter that determines the degree of
dependency on the number of iterations. The function Δ(t, y) returns a value in the range [0, y]
such that the probability of returning a number close to zero increases as the algorithm advances.
The interval in which new gene values are generated therefore shrinks with the passing of
generations. In our algorithm, b is set to 0.5.
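The mutation of equation (9) can be sketched as follows, a minimal implementation assuming the standard non-uniform mutation of Michalewicz (1992); the function names are ours:

```python
import random

def delta(t, y, g_max, b=0.5):
    """Delta(t, y) of eq. (9): a value in [0, y] whose expected size
    shrinks toward 0 as generation t approaches g_max."""
    r = random.random()                      # uniform random number in [0, 1)
    return y * (1.0 - r ** ((1.0 - t / g_max) ** b))

def nonuniform_mutate(c, low, high, t, g_max, b=0.5):
    """Eq. (9): mutate gene c within its interval [low, high]."""
    if random.random() < 0.5:                # tau = 0: push toward the upper bound
        return c + delta(t, high - c, g_max, b)
    else:                                    # tau = 1: push toward the lower bound
        return c - delta(t, c - low, g_max, b)
```

In early generations the mutated gene can land anywhere in [low, high]; in the final generation Δ is exactly zero, so the gene is left untouched, which is the local-tuning behavior described above.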
Combining these crossover and mutation operators yields a search that is uniform over the initial
space in early generations and very local at a later stage, favoring local tuning. It also greatly
reduces the risk of premature convergence.
Our genetic algorithm works not with one but with multiple populations, all of which evolve
separately most of the time, except that once every several generations we apply a crossover
operation between individuals from different populations. With a single-population scheme it can
happen that, although the neural network could theoretically solve a certain classification
problem, the system does not return a correct solution. One reason is the random nature of the
algorithm and its reliance on natural selection, mutation and crossover; another is the
permutation problem, discussed by Hancock (1992). Thus, a flow of events that would lead to a
correct solution may simply never occur, and the solution is not found. By using several
unrelated populations, we decrease the probability of this occurrence: even if one population
contains only poor individuals, the solution may still be found in another.
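The multi-population scheme can be sketched as follows. In this minimal sketch the per-population evolution step and the crossover operator are supplied by the caller; the exchange interval and the replacement policy are illustrative choices, not values from the paper:

```python
import random

def evolve_multi(populations, n_generations, step, crossover, exchange_every=5):
    """Evolve several populations separately; every `exchange_every`
    generations, cross individuals drawn from two different populations."""
    for g in range(1, n_generations + 1):
        populations = [step(pop) for pop in populations]      # independent evolution
        if g % exchange_every == 0 and len(populations) > 1:
            i, j = random.sample(range(len(populations)), 2)  # pick two populations
            a = random.choice(populations[i])
            b = random.choice(populations[j])
            child_a, child_b = crossover(a, b)
            populations[i][0] = child_a    # replacement policy: overwrite one individual
            populations[j][0] = child_b
    return populations
```

Because the populations share genetic material only at these sparse exchange points, a population trapped by the permutation problem cannot drag the others down with it.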
The overall framework of the proposed method is summarized in Figure 1. First, at the
initialization stage, the neural network structure, including the numbers of input, hidden, and
output nodes, is specified according to the specific classification application. The connection
weights corresponding to this MLP structure are encoded in the GA's chromosomes; each chromosome
represents one MLP structure, with the given connection weights contained in its genes. Second,
at the GA-based connection weight training stage, these initialized chromosomes, which may belong
to different populations, are evolved from generation to generation using the GA, according to
the fitness and MSE performance of the corresponding MLP. Finally, at the stage of local
optimization of the error surface with BP, the connection weight matrix contained in the
chromosome with the best fitness is chosen as the MLP's initial weights, and the gradient descent
algorithm is applied to optimize these weights and minimize local errors.
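The three stages above can be sketched as one routine. This is a deliberately simplified, self-contained sketch: the chromosome is a flat list of real-coded weights, selection is truncation, crossover is a blend with small Gaussian mutation standing in for the paper's operators, and the BP stage is represented by a generic local-refinement callback; all names are ours:

```python
import random

def hybrid_train(n_weights, fitness, bp_refine,
                 pop_size=20, n_generations=30, init_range=1.0):
    """Stage 1: random real-coded chromosomes encode the MLP weights.
    Stage 2: evolve them by fitness (truncation selection, blend
    crossover, small Gaussian mutation).
    Stage 3: hand the best weight vector to a local gradient-based refiner."""
    pop = [[random.uniform(-init_range, init_range) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(n_generations):
        pop.sort(key=fitness, reverse=True)      # best chromosomes first
        parents = pop[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append([(wa + wb) / 2 + random.gauss(0.0, 0.05)
                             for wa, wb in zip(a, b)])
        pop = parents + children
    best = max(pop, key=fitness)
    return bp_refine(best)                       # stage 3: local optimization with BP
```

The GA supplies a good basin of attraction globally; the refiner then only has to descend locally, which is the division of labor the framework relies on.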
Fig. 1 Overall framework of the proposed method: the MLP (input layer, hidden layers, output layer) with its connection weights encoded in chromosomes, trained through initialization, selection, crossover/reproduction and mutation, followed by BP training and classification
3. EXPERIMENTS
In this section, we discuss some experimental results obtained with our classification
methodology and compare them with other methods. Two experiments are employed for the discussion.
c. BP-MLP   d. GA-MLP
Legend: built-up and bare land, river, vegetation
The hybrid training performance is shown in Figure 3. Figure 3a shows the best MSE for each
generation during the evolutionary training. From this figure, we can see how the best MSE,
represented by the best chromosome, varies with the number of generations of the genetic
algorithm. After 20 generations, the change in the best MSE slows because of the local tuning
characteristic. Figure 3b shows the MSE for each iteration during the back propagation training.
This figure shows that, although the genetic algorithm has greatly reduced the total MSE of the
neural network, a further improvement of up to 0.1 in MSE could still be achieved by applying the
back propagation weight adjustment procedure.
a. Best MSE by evolutionary training   b. MSE during BP training
Fig. 3 Neural network training performance for SPOT XS imagery of Jiangning County, Jiangsu, China
For this imagery, we randomly chose 200 pixels from the three classified land cover maps to
assess the classification accuracy. We compared these pixels with our interpretation results and
computed the user's accuracy (Story and Congalton, 1986) and kappa coefficient (Cohen, 1960;
Congalton, 1991), which are shown in Table 1.
TABLE 1
COMPARISON OF CLASSIFICATION ACCURACY

                        User's accuracy (%)           Kappa statistic
Class                   MLC     BP-MLP   GA-MLP       MLC     BP-MLP   GA-MLP
Built-up, bare land     65.39   70.83    88.89        0.583   0.649    0.866

Note: overall accuracy and kappa statistic are calculated by weights.
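The accuracy measures reported in Table 1 can be computed from an error (confusion) matrix as follows. A minimal sketch, with rows taken as the classified map categories and columns as the reference data:

```python
def users_accuracy(cm):
    """User's accuracy per class: diagonal element over its row total
    (the row is what the map claims, so this measures commission error)."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def kappa(cm):
    """Cohen's kappa: agreement beyond chance for a square confusion matrix."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                   for i in range(len(cm))) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

For example, for the (hypothetical) two-class matrix [[45, 5], [15, 35]] the user's accuracies are 0.90 and 0.70 and kappa is 0.6, since the 80% observed agreement is discounted by the 50% agreement expected by chance.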
From Table 1 we can see that the MLP classifiers are more accurate than the maximum likelihood
classifier; the overall user's accuracies for GA-MLP, BP-MLP and MLC are 97%, 93% and 89%,
respectively. The better accuracy of GA-MLP over BP-MLP is possibly due to the lower residual
error when training the neural network using the GA (0.0161 vs. 0.03). This also suggests that
the GA can find a more globally optimal solution for the neural network.
(a) Original CBERS pseudo-color composite   (b) Plot of classified result
Legend for the classified image: cotton field, wet and irrigated land, desert and bare land, saline land, pond and river, wheat field
Fig. 5 Classification results of CBERS imagery of Shihezi County
The hybrid training performance for this imagery is shown in Figure 6. Figure 6a shows the best
MSE for each generation during the evolutionary training, and Figure 6b shows the MSE for each
iteration during the back propagation training. The same conclusions as for Figure 3 can be drawn
from these two figures.
a. Best MSE by evolutionary training   b. MSE during BP training
Fig. 6 Neural network training performance for CBERS imagery of Shihezi County
To gain insight into the changes in the network structure and its mechanism between the
genetically evolved neural network and the network after back propagation training, we can
compare Tables 2, 3, 4, and 5. Note that in these tables there are three input nodes representing
CBERS Band 4, Band 3, and Band 2, denoted I1, I2 and I3, respectively. The eight hidden nodes are
denoted H1 through H8, and the six output nodes are denoted O1 through O6, representing the six
classes: wheat field, pond and river, desert and bare land, saline land, wet and irrigated land,
and cotton field.
Table 2 shows the changes in connection weights between the input layer and the hidden layer. As
can be seen from the table, there is little change in the connection weights between the input
nodes and H1, H4, H5, and H8, which are marked with an asterisk. Note that most of these
connection weights are larger than the others in the corresponding columns, especially for I2 and
I1 (Band 3 and Band 4). This may indicate that Band 3 and Band 4 contribute more information to
the neural network for classifying the training samples than the other two bands. These
conclusions can also be seen from Figures 4a and 4b.
TABLE 2
COMPARISON OF WEIGHT CONNECTIONS BETWEEN INPUT LAYER AND HIDDEN LAYER

        GA-evolved network        After BP training
Node    I1      I2      I3        I1      I2      I3
TABLE 3
COMPARISON OF WEIGHT CONNECTIONS BETWEEN HIDDEN LAYER AND OUTPUT LAYER

               GA-evolved network                                 After BP training
Node   O1      O2      O3      O4      O5      O6       O1      O2      O3      O4      O5      O6
H1    -0.001  -0.0144  0.0271 -0.0135  0.0013 -0.0372   0.0561  0.0386  0.0044  0.0213 -0.0329 -0.0587
H2     0.0536  0.0078 -0.0039 -0.0296  0.041   0.042   -0.0418  0.0187 -0.2    -0.2363  0.3737  0.0951
H3    -0.0404 -0.0255  0.0167  0.0493 -0.0153 -0.0211  -0.1011 -0.0101  0.096   0.0059  0.3441 -0.325
H4     0.0521  0.0446  0.0114  0.0173 -0.0078  0.0445   0.1092  0.0976 -0.0113  0.0521 -0.042   0.023
H5     0.0234  0.0387 -0.0237  0.0217  0.015  -0.0208   0.0805  0.0917 -0.0464  0.0565 -0.0192 -0.0423
H6     0.0422  0.0122  0.0504  0.0563  0.0268  0.0173  -0.005  -0.3735  0.289   0.0074  0.0205  0.0905
H7     0.0172  0.0598  0.0166  0.0419  0.0027  0.071   -0.4048 -0.0101 -0.0123 -0.0068  0.0555  0.3922
H8     0.0611  0.0061  0.0828  0.0219  0.0583  0.0418   0.1182  0.0591  0.0601  0.0567  0.0241  0.0203
The idea implicit in this phenomenon is that, since the genetic algorithm is a forward stochastic
optimization algorithm, the most accurate weight adjustment may occur in the first layer, i.e.,
the connection weights between the input layer and the hidden layer; and since the back
propagation algorithm is a backward optimization algorithm, the most accurate weight adjustment
may occur in the last layer, i.e., the connection weights between the hidden layer and the output
layer. We can therefore expect that combining the two algorithms achieves higher accuracy in both
the connection weight adjustment and the classification results. This idea is also reflected in
the bias adjustments shown in Tables 4 and 5.
TABLE 4
COMPARISON OF BIASES IN HIDDEN LAYER

Node   GA-evolved   After BP
H2     0.0077       0.0154
H3    -0.0137       0.0099
H4     0.0394*      0.0394*
H5     0.0192*      0.0192*
H6     0.0305       0.0054
H7    -0.0073      -0.0383
H8     0.0400*      0.0400*
TABLE 5
COMPARISON OF BIASES IN OUTPUT LAYER

               GA-evolved network                                 After BP training
       O1      O2      O3      O4      O5      O6       O1      O2      O3      O4      O5      O6
Bias   0.0408 -0.0034  0.0215  0.0048  0.0794  0.0412   0.0979  0.0496 -0.0012  0.0396  0.0452  0.0197
4. CONCLUSIONS
In this article, we have discussed in detail the advantages and the key issues of the genetic
algorithm evolved neural network classifier. Our methodology adopts a real-coded GA strategy
hybridized with a back propagation algorithm. The genetic operators are carefully designed to
optimize the neural network while avoiding premature convergence and permutation problems. A
SPOT-4 XS image was used to test the algorithm. Preliminary research has shown that a hybrid
GA-based neural network classifier can achieve better overall accuracy in high resolution land
cover classification. Our experiment and discussion on the CBERS data have also shown that a
carefully designed genetic algorithm-based neural network outperforms a gradient descent-based
neural network. This is supported by the analysis of the changes in the connection weights and
biases of the neural network.
One problem to consider when combining a neural network with a genetic algorithm for land cover
classification is the determination of the optimal neural network topology, i.e., the best
network structure for a specific set of training samples and remote sensing imagery. The neural
network topology described in this experiment was determined manually. An alternative method is
to apply the genetic algorithm to neural network structure optimization as well, which will be
part of our future work.
5. ACKNOWLEDGEMENTS
This work was supported by a grant from the Knowledge Innovation Program of the Chinese Academy
of Sciences (approved # KZCX1-SW-01) and subsidized by China's Special Funds for the Major State
Fundamental Research Project (G2000077900).
REFERENCES
[1] Bandyopadhyay, S., Pal, S. K., 2001. Pixel classification using variable string genetic algorithms with
chromosome differentiation. IEEE Transactions on Geoscience and Remote Sensing, 39(2): 303-308.
[2] Benediktsson, J. A., Swain, P. H., Ersoy, O. K., 1990. Neural network approaches versus statistical methods
in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing,
28(4): 540-552.
[3] Carpenter G. A., Gjiaja M. N., Gopal S., and Woodcock C. E., 1997. ART neural networks for remote sensing:
Vegetation classification from Landsat TM and terrain data. IEEE Transactions on Geoscience and Remote
Sensing, 35(2): 308-325.
[4] Carpenter G. A., Gopal S., Macomber S., Martens S., and Woodcock C. E., 1999. A neural network
method for mixture estimation for vegetation mapping. Remote Sensing of Environment, 70: 138-152.
[5] Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1): 37-46.
[6] Congalton, R. G., 1991. A review of assessing the accuracy of classifications of remotely sensed data.
Remote Sensing of Environment, 37: 35-46.
[7] Goldberg, D. E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley,
Reading, MA.
[8] Gopal S., Woodcock C. E., Strahler, A. H., 1999. Fuzzy neural network classification of global land cover
from a 1 degree AVHRR data set. Remote Sensing of Environment, 67: 230-243.
[9] Hancock P. J. B., 1992. Genetic algorithms and permutation problems: a comparison of recombination
operators for neural net structure specification. Proceedings of the Int. Workshop on Combinations of
Genetic Algorithms and Neural Networks (COGANN-92), pp. 108-122. IEEE Computer Society Press, Los
Alamitos, CA.
[10] Herrera F., Lozano M., Verdegay J. L., 1998. Tackling real-coded genetic algorithms: operators and tools
for behavioural analysis. Artificial Intelligence Review, 12(4): 265-319.
[11] Hertz J., Krogh A., Palmer R., 1991. Introduction to the Theory of Neural Computation. Addison-Wesley,
Redwood City, CA.
[12] Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press.
[13] Hush D. R., Horne B. G., 1993. Progress in supervised neural networks. IEEE Signal Processing Magazine,
10(1):8-39.
[14] Michalewicz Z., 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New
York.
[15] MathWorks, Inc., 1997. Neural Network Toolbox User’s Guide. MathWorks, Inc., Natick, MA.
[16] Nilsson, N. J., 1998. Artificial Intelligence: A New Synthesis. Morgan Kaufmann Publishers, Inc., San
Francisco, CA.
[17] Pal S. K., Bandyopadhyay S., Murthy C. A., 2001. Genetic classifier for remotely sensed images:
comparison with standard methods. International Journal of Remote Sensing, 22(13): 2545-2569
[18] Rumelhart, D. E., Hinton, G. E., and Williams, R.J., 1986. Learning representations by back-propagating
errors. Nature, 323: 533-536.
[19] Story, M. and Congalton, R., 1986. Accuracy assessment: a user's perspective. Photogrammetric
Engineering and Remote Sensing, 52(3): 397-399.
[20] Townshend J., Justice C., Li W., Gurney C., McManus J., 1991. Global land cover classification by remote
sensing: present capabilities and future possibilities. Remote Sensing of Environment, 35: 243-255.
[21] Yao X., 1999. Evolving artificial neural networks. Proceedings of the IEEE, 87(9): 1423-1447.