Comparison of neural networks and linear regression to estimate site productivity in forest plantations, using geomatics

INTRODUCTION

Geomatics is a field of activity that uses a systemic approach to integrate all the means of acquiring and managing the spatial data required for the scientific, administrative, legal and technical activities concerned with the production and handling of spatial information (Canadian Institute of Geomatics 2000). One of its strengths is the capacity to combine information from different sources which, appropriately summarized, can generate information that is not easily obtained otherwise (Lowell 1999).

On the other hand, the productive capacity of a given place is known as site quality, where the site is formed by a complex of biotic and abiotic factors and its quality is a function of environmental factors related to soil, climate and topography, among others (Álvarez and Ruiz 1995). Therefore, the productivity of forest sites can be defined as the maximum volume of wood that can be obtained at a given site in a given time. This parameter can be expressed by the general model P = ƒ(C, R, S, G, V, A, M, T), where P: site productivity, C: climate, R: relief, S: soil-related factors, G: genetic quality, V: structure of the plant community, A: animals, M: human influence, and T: time (Gerding and Schlatter 1995).

The functional relationship indicated above, P = ƒ(environmental variables and time), gives rise to productivity models built with multiple linear regression (Green et al. 1989, Klinka and Carter 1990, Rodrigue 1997).

An artificial neural network (ANN) consists of a collection of highly interconnected processing elements (nodes or neurons) that transforms a set of input data into a desired set of output data (Iost and Rivera 1993, Zhou and Zivco 1996). It is an estimation and classification technique from the field of artificial intelligence whose principle is a learning process that attempts to simulate the cognitive behavior of the human brain (Freeman and Skapura 1993).

Rath (1999) and Schultz et al. (1999) point out that an ANN is made up of layers of information, among which one can generally distinguish an input layer (independent variables), one or several intermediate or hidden layers (which determine the relationships between the input and output variables) and an output layer that receives the result of the independent variables (figure 1).

The objective of the present investigation was to establish a methodology to estimate and compare site productivity in forest plantations using linear regression and neural network techniques, relating a productivity index (dominant height of the trees at 6 years of age) to environmental variables obtained from a database supported by a geographic information system (GIS).

Figure 1. Artificial neural network.

METHODS
Study area. The study was carried out on the El Picazo property, which belongs to the University of Talca. It is located in the VII Region of the Maule, commune of San Clemente, between 35º 31' 19'' and 35º 23' 19'' south latitude and between 71º 08' 45'' and 71º 12' 49'' west longitude, with an area of 1,422 ha (figure 2).

Data collection. From the forest inventory available for the area, the dependent variable, total height (m), was obtained for the trees corresponding to the 100 tallest per hectare, contained in circular plots of 500 square meters. A systematic grid of plots was established with a sampling intensity of one plot every three hectares. A total of 211 plots were measured on the property.

The digital cartographic information used covered those variables that have some influence on site productivity in plantations of Pinus radiata D. Don. The spatial variables selected from the database were the hydrographic network, contour lines (10 m interval) and the geographic location of the plots. These layers are in vector format and come from an optical-digital restitution of vertical panchromatic aerial photographs at a scale of 1:20,000, achieving a final level of detail of 1:5,000. From this spatial information the data used in the site productivity models were generated, including the digital elevation model (MDE), the digital slope model (MDP), the digital aspect model (MDO) and the distance to watercourses (DCA).

Figure 2. Study zone. El Picazo farm.

Generation of the digital layers. The digital cartography used was generated in the following phases:
i. Construction of the digital elevation model (MDE). One of the most common ways of obtaining an MDE in raster format is linear interpolation of contour lines. To carry out this process, the IDRISI software uses the maximum slope value of the neighboring pixels to calculate the elevation value. The MDE was validated by comparing measurements taken in the field with a mapping-grade GPS receiver against the values estimated by the model. This procedure yielded an RMS (root mean square) error of 2.79 m.
ii. Construction of the derived models. For the digital slope model (MDP), the system determines the maximum slope value within a 3x3 moving window and assigns it to the central cell. For the digital aspect model (MDO), terrain aspect was calculated in sexagesimal degrees, measured clockwise. Flat areas are assigned the thematic value -1 by the algorithm.
iii. Distance to watercourses (DCA). This variable was generated in the raster software, where the value of each cell was calculated as the Euclidean distance between that pixel and the reference pixels.
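
The two derived layers in phases ii and iii can be illustrated on a toy raster. The following is a minimal sketch and not the IDRISI implementation: the grid values, the 10 m cell size, the function names and the percent-slope convention are all assumptions made for illustration.

```python
import math

def max_slope_3x3(dem, row, col, cell_size):
    """Largest slope (percent) between a cell and its neighbours in a 3x3 window.

    Assumption: slope is taken as rise over run to each neighbour.
    """
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < len(dem) and 0 <= c < len(dem[0])
            if (dr == 0 and dc == 0) or not inside:
                continue
            run = cell_size * math.hypot(dr, dc)  # diagonal neighbours are farther away
            best = max(best, 100.0 * abs(dem[r][c] - dem[row][col]) / run)
    return best

def distance_to_streams(stream_mask, cell_size):
    """Euclidean distance from every cell to the nearest stream cell (brute force)."""
    targets = [(r, c) for r, line in enumerate(stream_mask)
               for c, flag in enumerate(line) if flag]
    return [[min(cell_size * math.hypot(r - tr, c - tc) for tr, tc in targets)
             for c in range(len(line))]
            for r, line in enumerate(stream_mask)]

# Hypothetical 3x3 elevation grid (m) and stream mask (1 = watercourse cell)
dem = [[100.0, 100.0, 100.0],
       [100.0, 105.0, 100.0],
       [100.0, 100.0, 110.0]]
streams = [[1, 0, 0],
           [0, 0, 0],
           [0, 0, 0]]

slope_center = max_slope_3x3(dem, 1, 1, cell_size=10.0)
dca = distance_to_streams(streams, cell_size=10.0)
```

The brute-force distance search is adequate for a toy grid; a production raster would normally rely on the distance operator of the GIS itself.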

The cartographic information described above served as the basis for building the site productivity models with linear regression and neural network techniques. This information was obtained through optical-digital photogrammetric restitution which, according to López and Ore (1999), is the most reliable technique for generating and updating cartography.

Generation of site productivity models by multiple linear regression. Using the ordinary least squares method, curves were generated that explain the behavior of site productivity in forest plantations. The model was built with 70% of the available data and validated with the remaining 30%. Most of the analyses carried out rely on a statistic called the p-value, which is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; it is compared with the significance level α (α = 0.05 was used). The logical sequence followed in the multiple linear regression analysis is presented below:

i. Selection of variables. Because the variables that influence site productivity are not known in advance, a correlation matrix based on the Pearson correlation coefficient was used to obtain a first approximation of the independent variables (Vallejos 1999). The final selection of the explanatory variables of the model was carried out with the Forward Selection algorithm implemented in the Statgraphics Plus software.
ii. Regression analysis. In this phase the values of the coefficients and the intercept of the regression line were determined. The basic statistics for analyzing the model were also obtained: coefficient of determination (R²), adjusted coefficient of determination (Ra²), standard error of the estimate and the contribution of each variable to the model (Canavos 1988).

iii. Assumptions about the error. Anderson et al. (1999) point out that three basic assumptions of the ordinary least squares method must be verified: normality, mean equal to zero and homoscedasticity. Normality was checked with the Kolmogorov-Smirnov (K-S) test, appropriate for the study of continuous random variables; this analysis was complemented with the normal probability plot and the frequency histogram of the errors (Cook and Weisberg 1999). Mean equal to zero: the assumption that the mean value of the residuals is equal to zero was tested (Vallejos 1999) with the following hypothesis test: H0: the mean of the observations is equal to zero. Ha: the mean of the observations is different from zero. Homoscedasticity: this assumption states that the variance of the residuals is constant; it was evaluated with Bartlett's test (Gujarati 1996).
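
Two of the quantities used in the sequence above, the Pearson correlation coefficient from step i and the test statistic for a zero residual mean from step iii, can be sketched in a few lines. This is an illustrative sketch only: the data values and function names are hypothetical, and the study itself used Statgraphics rather than hand-written code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_zero_t_statistic(residuals):
    """t statistic for H0: the mean of the residuals is zero."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((e - mean) ** 2 for e in residuals) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical values, for illustration only
slope = [5.0, 10.0, 20.0, 30.0, 40.0]
height = [8.0, 7.9, 7.6, 7.5, 7.0]
r = pearson_r(slope, height)

t = mean_zero_t_statistic([0.2, -0.1, 0.0, -0.2, 0.1])
```

A strongly negative `r` would flag the variable as a candidate for forward selection, and a `t` close to zero is consistent with not rejecting H0 at α = 0.05.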

Generation of site productivity models by artificial neural networks. Using the artificial neural network algorithm, models were built that explain the behavior of site productivity in plantations of Pinus radiata from topographic and environmental variables. The Pathfinder Neural Networks System™ software was used for this purpose. The data had to be normalized before entering the model, that is, the values were scaled between zero and one. The neural model was built and validated in the following stages:

i. Training phase. This phase used 45% of the data to calculate the synaptic weights of the connections. Several architectures of the neural network were tested, all with a single hidden (intermediate) layer, since the software used does not allow more layers. Training used the error backpropagation algorithm with a sigmoid transfer function. To select the variables that explain the model and to identify the break points, the RMSE and the Pearson correlation coefficient were plotted, quantifying the contribution of each variable to the model. To avoid the risk of overtraining, a stopping criterion was adopted based on analysis of the RMSE graph for the control and training curves.
ii. Test phase. A further 10% of the data was used to determine the appropriate moment at which the training process can stop, after which the final matrix of synaptic weights was defined.

iii. Validation phase. The neural model was validated with a further 15% of the data once training had concluded. The objective is to measure the quality of the fitted model and to verify whether it was able to generalize from the data seen in the learning process. The RMSE and the Pearson correlation coefficient were used as validation statistics.
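
The data preparation behind these stages can be sketched as follows. This is a hedged illustration, not the procedure of the Pathfinder software: it assumes that the 45%, 10% and 15% subsets together form the 70% used for model building and that the remainder is held out, and it replaces the visual inspection of the RMSE graph with a simple patience rule. All function names and sample values are hypothetical.

```python
import random

def min_max_scale(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def split_plots(n_plots, seed=0):
    """Shuffle plot indices and cut them into the subsets named in the text."""
    idx = list(range(n_plots))
    random.Random(seed).shuffle(idx)
    n_train = round(0.45 * n_plots)
    n_test = round(0.10 * n_plots)
    n_valid = round(0.15 * n_plots)
    return {
        "training": idx[:n_train],
        "test": idx[n_train:n_train + n_test],
        "validation": idx[n_train + n_test:n_train + n_test + n_valid],
        "comparison": idx[n_train + n_test + n_valid:],  # remaining ~30%
    }

def early_stop_epoch(control_rmse, patience=3):
    """Epoch with the lowest control RMSE, stopping once it has failed to
    improve for `patience` consecutive epochs (assumed rule)."""
    best_epoch, best_value, waited = 0, float("inf"), 0
    for epoch, value in enumerate(control_rmse):
        if value < best_value:
            best_epoch, best_value, waited = epoch, value, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

scaled = min_max_scale([6.5, 7.0, 8.0, 8.5])  # hypothetical heights (m)
subsets = split_plots(211)                    # 211 plots were measured
sizes = {name: len(members) for name, members in subsets.items()}
stop_at = early_stop_epoch([0.09, 0.07, 0.06, 0.055, 0.056, 0.058, 0.061, 0.05])
```

With 211 plots this yields subsets of 95, 21, 32 and 63 plots. The patience rule stops at epoch 3 even though a later epoch is lower, which shows the trade-off such a rule makes against continued training.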

Comparison of the site productivity models. The productivity models were validated with data not used in their construction, using 30% of the available information for this task. The indices considered for comparing the models were the RMSE, the mean absolute error (EMA) and the mean bias (SM), the last of which was used to analyze the direction of the deviations (overestimation or underestimation).
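
The three comparison indices can be written out explicitly. The sketch below is illustrative: the sample values are invented, and the sign convention for the bias (predicted minus observed, so that a positive value means overestimation) is an assumption, since the text does not state it.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mean_absolute_error(observed, predicted):
    """Mean absolute error (EMA in the text)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

def mean_bias(observed, predicted):
    """Mean bias (SM in the text); assumed convention: positive = overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical heights in metres
observed = [7.0, 8.0, 6.0, 9.0]
predicted = [7.5, 7.5, 6.5, 8.5]

error_rmse = rmse(observed, predicted)
error_mae = mean_absolute_error(observed, predicted)
error_bias = mean_bias(observed, predicted)
```

In this example the errors cancel, so the mean bias is zero while RMSE and mean absolute error are both 0.5 m. This is why a model can rank best on bias and worse on the other two indices.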

Application of the site productivity models. After their evaluation and validation, the models were implemented within a geographic information system (IDRISI), producing thematic maps of site productivity for the Pinus radiata plantations present in the study area.

RESULTS
Estimation by the least squares method. The results of the multiple linear regression models are described below.
Correlation analysis. Table 1 presents the Pearson correlations with respect to the dependent variable (site productivity), represented by the mean height of the 100 tallest trees per hectare at 6 years of age (H_100). It constitutes the exploratory analysis of the independent variables, intended to observe their influence on site productivity.

Table 1. Correlation analysis.

Variable                            Correlation coefficient (P-value)
MDE (digital elevation model)        0.1452 (0.0804)
MDO (digital aspect model)          -0.1198 (0.1499)
MDP (digital slope model)           -0.4849 (0.0000)
DCA (distance to watercourses)      -0.4539 (0.0000)

Once the variables with the highest Pearson correlation coefficient with site productivity (H_100) had been established (MDP and DCA), the Forward Selection algorithm was used to determine the independent variables, and their most important transformations, that contribute to the site productivity model. In this case the best results correspond to MDP² and DCA.

Estimation of the regression parameters. Once the variables with the greatest influence on site productivity had been determined, the regression parameters, the coefficients of determination and the standard error of the estimate were obtained (table 2).
In the proposed model, both the intercept and the variables contributed significantly to the regression (see the P-values in table 2). Consequently, the equation was defined by the expression:
H_100 = 8.01713 - 0.00044 MDP² - 0.00632 DCA [1]
It should be noted that the T value of the intercept is very high and, therefore, the constant (8.01713) explains a large part of the variation present in the variable H_100. The regression coefficients, on the other hand, have a low influence within the model, even though they are statistically significant.
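
Equation [1] can be evaluated directly. The following sketch only restates the published coefficients as a function; the function name and the input values are hypothetical, and the units of MDP and DCA are assumed to be those of the raster layers described in the methods, since they are not stated alongside the equation.

```python
def predict_h100(mdp, dca):
    """Height of the 100 tallest trees per hectare at age 6, from equation [1]."""
    return 8.01713 - 0.00044 * mdp ** 2 - 0.00632 * dca

# Hypothetical cell: slope value 30, distance to watercourse 100
example = predict_h100(mdp=30.0, dca=100.0)

# A flat cell on a watercourse returns the intercept itself
baseline = predict_h100(mdp=0.0, dca=0.0)
```

For the hypothetical cell the slope term subtracts 0.396 and the distance term 0.632, giving about 6.99, against an intercept of 8.017. This illustrates how much of the estimate is carried by the constant.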

Assumptions about the error. The detailed analysis of the errors, a necessary step for meeting the assumptions of the least squares method, is shown below.
• Normal distribution of the residuals: with the information provided by the K-S test (P = 0.8871) it can be asserted that the residuals are normally distributed, because the p-value is greater than α; this is shown graphically in figure 3.
• Mean of the residuals equal to zero: the results of the hypothesis test for the mean (P = 0.953) indicate that the null hypothesis is not rejected at α = 0.05, so it can be asserted that the residuals have a mean equal to zero.
• Homoscedasticity: according to Bartlett's test, where the P-value (0.245) is greater than the significance level α (0.05), the null hypothesis cannot be rejected; consequently it can be affirmed that the residuals are homoscedastic.

ESTIMATION OF SITE PRODUCTIVITY BY ARTIFICIAL NEURAL NETWORKS

Determination of the input variables of the model. Table 3 summarizes the contribution of each variable, considering an arbitrary 3:6:1 structure (three input nodes, six hidden nodes and one output node). The combined analysis of the RMSE and the correlation coefficient shows that removing the variables MDP and DCA produces an increase in the RMSE (0.0663 and 0.0597, respectively) and a decrease in the correlation coefficient (0.2606 and 0.4053). This establishes that these variables are the best predictors for the neural model.

Table 2. Estimation of the regression parameters.

Dependent variable: height of the 100 tallest trees per hectare (H_100)

Parameter    Estimate    Standard error    T           P-value
Intercept     8.01713    0.09519           84.2228     0.0000
MDP²         -0.00044    0.00006           -7.23018    0.0000
DCA          -0.00632    0.00112           -5.65323    0.0000

R²: 41.65%    Ra²: 40.80%    Standard error of the estimate: 0.563


Figure 3. Frequency histogram and normal probability plot.

Table 3. Contribution of each variable to the neural model.

Variable removed from the model    Correlation (P-value)    RMSE
MDP                                0.2606 (0.0040)          0.0663
DCA                                0.4053 (0.0003)          0.0597
MDT                                0.5205 (0.0001)          0.0589
MDO                                0.6035 (0.0001)          0.0518

Definition of the number of hidden nodes. Once the significant variables were known, the final architecture of the neural model was determined by observing the correlation coefficient and the RMSE simultaneously. Table 4 summarizes the seven structures evaluated for the neural model. The number of iterations varied with the selected structure; the stopping criterion is based on analysis of the graph of RMSE as a function of time for the control and learning curves of each architecture.

Table 4. Neural model quality indicators with different structures.

Number of hidden nodes    Correlation coefficient (P-value)    RMSE
2                         0.5362 (0.0002)                      0.0527
3                         0.5760 (0.0001)                      0.0504
4                         0.5923 (0.0000)                      0.0539
5                         0.5310 (0.0002)                      0.0529
6                         0.5468 (0.0002)                      0.0521
7                         0.5760 (0.0001)                      0.0506
8                         0.5405 (0.0002)                      0.0523

According to table 4, the best structure is 2:3:1 if the RMSE is taken as the estimator of quality. If the correlation coefficient is considered instead, the best structure is 2:4:1. For this reason, a graphical analysis was used to choose the best structure. This analysis (figure 4) shows a better distribution of the data in the first case (structure 2:3:1), since the actual values lie closer to the line of estimated values than in the results given by the 2:4:1 neural model structure, shown on the right.
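
The disagreement between the two criteria can be reproduced from the values in table 4. The short sketch below simply ranks the seven structures by each indicator; the variable names are illustrative.

```python
# Number of hidden nodes -> (correlation coefficient, RMSE), copied from table 4
table4 = {
    2: (0.5362, 0.0527),
    3: (0.5760, 0.0504),
    4: (0.5923, 0.0539),
    5: (0.5310, 0.0529),
    6: (0.5468, 0.0521),
    7: (0.5760, 0.0506),
    8: (0.5405, 0.0523),
}

# Lowest RMSE and highest correlation point to different architectures
best_by_rmse = min(table4, key=lambda nodes: table4[nodes][1])
best_by_corr = max(table4, key=lambda nodes: table4[nodes][0])
```

Here `best_by_rmse` is 3 and `best_by_corr` is 4, which matches the 2:3:1 and 2:4:1 candidates discussed in the text. Note that the four-node structure also has the highest RMSE of the seven.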
Figure 5 shows the definitive architecture of the selected neural network. The input layer contains the digital slope model (MDP) and the distance to watercourses (DCA). The intermediate layer contains three hidden nodes, and the output layer contains the height of the 100 tallest trees per hectare (H_100). Finally, the bias terms are shown; these are numeric values that act as constants independent of the weighting of the input variables.
The structure selected for building the final model was the error backpropagation algorithm with three hidden neurons and 6,810 cycles, run in the Pathfinder Neural Networks System™ software.
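
A forward pass through a 2:3:1 network of this kind can be sketched as follows. The weights and biases below are placeholders, not the trained values, which are not reported here. The sketch also assumes that the sigmoid transfer function is applied at both the hidden and the output layer and that inputs and output are on the zero-to-one scale used for normalization.

```python
import math

def sigmoid(z):
    """Sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward_2_3_1(mdp_scaled, dca_scaled, w_hidden, b_hidden, w_out, b_out):
    """Output of a network with 2 inputs, 3 hidden nodes and 1 output node."""
    hidden = [sigmoid(w[0] * mdp_scaled + w[1] * dca_scaled + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Placeholder parameters, for illustration only (NOT the trained weights)
w_hidden = [[0.5, -0.3], [-0.8, 0.2], [0.1, 0.4]]  # one weight pair per hidden node
b_hidden = [0.0, 0.1, -0.1]                        # hidden-layer bias terms
w_out = [0.7, -0.5, 0.3]                           # hidden-to-output weights
b_out = 0.05                                       # output bias term

h100_scaled = forward_2_3_1(0.4, 0.6, w_hidden, b_hidden, w_out, b_out)

# With every weight and bias at zero the output is sigmoid(0) = 0.5
neutral = forward_2_3_1(0.4, 0.6, [[0.0, 0.0]] * 3, [0.0] * 3, [0.0] * 3, 0.0)
```

The scaled output would have to be mapped back to metres with the same minimum and maximum used to normalize H_100.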

Comparison of the site productivity models. Table 5 presents the results of comparing the models obtained with multiple linear regression and artificial neural networks. The neural model shows a smaller estimation error when the RMSE and the mean absolute error (EMA) are used as comparison parameters. If the mean bias is used as the indicator, however, the linear regression model shows the smaller error in its estimates.

Figure 4. Actual values in relation to calculated values.

Figure 5. Neural network structure.
