$$\text{Seedling cane yield (kg)} = \frac{n\, d\, \pi r^{2} L}{1000} \qquad [1]$$

where n = seedling stalk number, d = density at 1.0 g cm⁻³, r = stalk radius (in cm), and L = stalk height (in cm).

Data Analysis Using Artificial Neural Networks

The training data consisted of 20% (30 seedlings grown at the LSU AgCenter) and 10% (28 seedlings grown at the USDA) of the original data. The input variables were stalk number, stalk height, and stalk diameter, and the response was either to select (1) or reject (0) a seedling as determined by two experienced sugarcane breeders at each location. The training data were run in SAS Enterprise Miner (SAS Institute, 2007) to produce the coefficients of the multiple linear regressions. The data collected from 150 LSU AgCenter and 272 USDA seedlings constituted the prediction data. In the prediction data, the response values to either select (1) or reject (0) a seedling were coded as missing values and needed to be estimated by the model. The model selection criterion used was average error and the network architecture was the generalized linear model. The training technique used was Levenberg-Marquardt, set at 50 preliminary runs.

Coefficients of the Prediction Model, Probability Values, and Fit Statistics

The ANN models use the training dataset to calculate coefficients of the prediction model, which represent the relative weighting of each input variable. The probability of either selecting or rejecting a seedling is calculated by multiplying the values of stalk number, stalk height, and stalk diameter by their respective coefficients as shown in Eq. [2] (for the data from the LSU AgCenter seedlings) and Eq. [3] (for the data from the USDA seedlings).

$$p(Y = 1) = \frac{\exp[-50.2 + (1.38 \times \text{stalk no.}) + (6.16 \times \text{stalk height}) + (11.2 \times \text{stalk diam.})]}{1 + \exp[-50.2 + (1.38 \times \text{stalk no.}) + (6.16 \times \text{stalk height}) + (11.2 \times \text{stalk diam.})]} \qquad [2]$$

$$p(Y = 1) = \frac{\exp[-18.1 + (0.04 \times \text{stalk no.}) + (2.73 \times \text{stalk height}) + (5.71 \times \text{stalk diam.})]}{1 + \exp[-18.1 + (0.04 \times \text{stalk no.}) + (2.73 \times \text{stalk height}) + (5.71 \times \text{stalk diam.})]} \qquad [3]$$

Only one probability (p) can be modeled: in this case, the probability to select. The probability to reject was, therefore, 1 − p. To predict a response, a threshold probability must be specified. If the probability to select is modeled, the response would be to select when the probability value was equal to or greater than the threshold value and to reject when the probability value was less than the threshold. In SAS ANN models, the default threshold is 0.5. Larger threshold values produce more stringent selection criteria and vice versa.

The ANN model analysis produces six fit statistics that represent parameters that can be used to compare the model or equation for their ability to account for the variability in the data. The average profit (prediction power) was estimated as the correlation between the response variable (1 or 0) and probability (Agresti, 2007). A higher profit would mean the probability value was highly predictive of the response variable. The misclassification rate was estimated as the proportion of total observations that were classified by the model into different response categories from what was observed. Lower values would indicate correct model classification and an accurate training dataset. The average squared error (ASE) was calculated as:

$$\text{ASE} = \frac{\text{SSE}}{N} = \frac{\sum (\text{OR} - \text{Pp})^{2}}{N} \qquad [4]$$

in which SSE is the sum of the squared error, OR is the observed response, Pp is the prediction probability, and N is the number of observations in the training dataset. Smaller values would indicate better model fit. The final prediction error (FPE) was estimated as:

$$\text{FPE} = \frac{\text{SSE}\,(N + P)}{N\,(N - P)} \qquad [5]$$

in which P is the number of parameters including the intercept. The FPE is an adjustment to ASE using (N + P)/(N − P). The adjustment penalizes for overparameterization (model complexity) or the inclusion of too many input variables. Overparameterization inflates FPE and increases prediction errors. It is generally desirable to achieve the best model fit by specifying the
Figure 1. The logistic cumulative distribution functions for estimated seedling cane yield (in kg) (x axis) plotted against posterior probabilities
(y axis) for the Louisiana State University Agricultural Center (LSU AgCenter) (a) and USDA (b) populations.
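The quantities plotted in Fig. 1 and the fit statistics of Eq. [4] and [5] can be sketched in a few lines of Python. This is a minimal illustration rather than the SAS Enterprise Miner procedure used in the study. It assumes that the intercepts of Eq. [2] and [3] are negative and that stalk height enters the models in metres; neither is stated explicitly in the text, but both are consistent with the trait means reported in Table 8. All function and variable names are hypothetical.

```python
import math

# Coefficients as printed in Eq. [2] (LSU AgCenter) and Eq. [3] (USDA).
# ASSUMPTIONS (not stated in the text): the intercepts are negative and
# stalk height enters the model in metres.
COEFFICIENTS = {
    "LSU": (-50.2, 1.38, 6.16, 11.2),
    "USDA": (-18.1, 0.04, 2.73, 5.71),
}


def cane_yield_kg(stalk_number, stalk_radius_cm, stalk_height_cm, density=1.0):
    """Eq. [1]: n * d * pi * r^2 * L / 1000, with d in g per cubic cm."""
    volume_cm3 = math.pi * stalk_radius_cm ** 2 * stalk_height_cm
    return stalk_number * density * volume_cm3 / 1000.0


def select_probability(population, stalk_number, stalk_height_m, stalk_diameter_cm):
    """Eq. [2]/[3]: logistic probability of selecting a seedling."""
    b0, b_number, b_height, b_diameter = COEFFICIENTS[population]
    z = (b0 + b_number * stalk_number + b_height * stalk_height_m
         + b_diameter * stalk_diameter_cm)
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))


def classify(probability, threshold=0.5):
    """Select (1) when p >= threshold, otherwise reject (0)."""
    return 1 if probability >= threshold else 0


def fit_statistics(observed, predicted, n_parameters, threshold=0.5):
    """ASE (Eq. [4]), FPE (Eq. [5]) and misclassification rate."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    wrong = sum(classify(p, threshold) != o for o, p in zip(observed, predicted))
    return {
        "ASE": sse / n,
        "FPE": sse * (n + n_parameters) / (n * (n - n_parameters)),
        "misclassification_rate": wrong / n,
    }


# Hypothetical seedling: 12 stalks, 2.1 m tall, 2.2 cm stalk diameter.
p = select_probability("LSU", 12, 2.1, 2.2)
print(classify(p), round(cane_yield_kg(12, 1.1, 210), 2))
```

Raising `threshold` above 0.5 makes `classify` reject more seedlings, which corresponds to the more stringent selection criteria described in the text.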
between the means of the selected and rejected seedlings was calculated and expressed as a percentage of the rejected seedlings (Table 5). This metric was used to describe and evaluate the discriminating ability of the ANN models and the visual method. A large percentage of the difference between the means of the selected and rejected seedlings was used as an indicator of greater discriminating ability.

The ANN models produced greater discrimination between the selected and rejected seedlings than the visual method (Table 5, Fig. 2). The ANN models were twice (for the LSU AgCenter population) and 1.5 times (for the USDA population) more discriminating between the selected and rejected seedlings than the visual method. The seedlings selected by the ANN models produced more stalks than those selected by the visual method. These selected seedlings also produced thicker stalks for both populations and longer stalks for the USDA population.

Further evaluation of the discriminating ability was done for each of the five families from the LSU AgCenter population (Table 6). The ANN model produced greater discrimination between the selected and rejected seedlings than the visual method for all the families. The seedlings selected by the ANN model also produced more stalks, and thicker stalks, than those selected by the visual method. The magnitude of the discrimination of the ANN model was greater than that of the visual method in families where the ANN model selected more seedlings than the visual method, for example, XL01-001, XL01-050, XL01-059, and XL01-460 (Table 6). Where the number of seedlings selected was equal, for example, family XL01-215, the discriminating ability of the ANN model was very similar to that of the visual method.

Figure 2. Comparison of mean cane yield (in kg) for the seedlings selected and rejected using visual and artificial neural network models for the Louisiana State University Agricultural Center (LSU AgCenter) (a) and USDA (b) populations.

Selection Efficiency of Artificial Neural Network Models Versus Visual Selection

Improving selection efficiency is a challenge shared by sugarcane breeders. Selection efficiency is the ability to discard a seedling that would eventually produce low cane yield and/or select a seedling that would produce high cane yield. The number of seedlings selected by one method and rejected by the other and the number that performed better or worse than the population mean are shown in Table 7, whereas the mean performance of these seedlings is shown in Table 8. Generally, the visual method rejected more higher-yielding seedlings and included more lower-yielding seedlings than the ANN model, indicating lower selection efficiency of the visual method (Table 7, Fig. 3). Seedlings selected by the ANN
Table 6. Difference between the means of the selected and rejected seedlings, expressed as a percent of the rejected seedlings, for the seedlings selected using the visual method (Visual) and the artificial neural network (ANN) model for stalk number (Stalks), stalk height (Height), stalk diameter (Diameter), and cane yield (Cane), and the number of seedlings selected (No. Selected), for the individual crosses derived from the Louisiana State University Agricultural Center (LSU AgCenter) population.

                XL01-001      XL01-050      XL01-059      XL01-215      XL01-460
Trait           Visual  ANN   Visual  ANN   Visual  ANN   Visual  ANN   Visual  ANN
Stalks              89  104       72   76       50   88       71   77       45   60
Height (cm)          1    1        9    5        4    9        7    7        0    2
Diameter (cm)        3    6        7   16        2    6        7    6       14   14
Cane (kg)          104  126      115  166       59  144      119  126       73  100
No. Selected        16   21        6   16       10   14       18   18        7   27
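As a rough reading aid for Table 6, the cane-yield discrimination of the ANN model can be divided by that of the visual method for each family. The ratio below is an illustrative summary and not a statistic reported in the study; the function name is hypothetical, and the values are copied from the Cane (kg) row of Table 6.

```python
# Cane-yield row of Table 6: discrimination (%) by family as (Visual, ANN).
CANE_DISCRIMINATION = {
    "XL01-001": (104, 126),
    "XL01-050": (115, 166),
    "XL01-059": (59, 144),
    "XL01-215": (119, 126),
    "XL01-460": (73, 100),
}


def discrimination_ratio(visual_pct, ann_pct):
    """How many times more discriminating the ANN model was than the visual method."""
    return ann_pct / visual_pct


ratios = {family: discrimination_ratio(visual, ann)
          for family, (visual, ann) in CANE_DISCRIMINATION.items()}

for family, ratio in ratios.items():
    print(f"{family}: {ratio:.2f}")
```

The ratio is closest to 1 for XL01-215, the family in which both methods selected the same number of seedlings.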
Table 8. Means of the rejected and selected seedlings, and the difference between the means of the selected (S) and rejected (R) seedlings expressed as a percent of the rejected [(S − R)/R%], for stalk number, stalk height, stalk diameter, and seedling cane yield for the Louisiana State University Agricultural Center (LSU AgCenter) and USDA populations.

                      LSU AgCenter                        USDA
Trait                 Rejected  Selected  (S − R)/R%      Rejected  Selected  (S − R)/R%
Stalk number             10.00     12.21          22         11.31     14.65          30
Stalk height (cm)          215       205          −4           212       226           7
Stalk diameter (cm)       1.91      2.39          25          1.97      2.12           8
Cane yield (kg)           6.17     10.78          75          7.09     10.74          51

Rejected refers to seedlings selected by the visual method and rejected by the ANN model.
Selected refers to seedlings rejected by the visual method and selected by the ANN model.
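The percent-difference columns of Table 8 follow directly from the two means beside them. A minimal sketch, using the LSU AgCenter means from Table 8 and a hypothetical function name:

```python
def percent_difference(selected_mean, rejected_mean):
    """(S - R) / R expressed as a percent of the rejected mean."""
    return (selected_mean - rejected_mean) / rejected_mean * 100.0


# LSU AgCenter means from Table 8 as (rejected, selected).
TABLE8_LSU = {
    "Stalk number": (10.00, 12.21),
    "Stalk height (cm)": (215, 205),
    "Stalk diameter (cm)": (1.91, 2.39),
    "Cane yield (kg)": (6.17, 10.78),
}

for trait, (rejected, selected) in TABLE8_LSU.items():
    print(f"{trait}: {percent_difference(selected, rejected):.1f}%")
```

Because the published means are rounded, a value recomputed this way can differ from the tabulated percentage by about one point.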
population. From the LSU AgCenter population, 57 out of 150 seedlings (38%) were selected by the visual method, whereas from the USDA population, 46 out of 272 (17%) were selected. From the LSU AgCenter population, the ANN model selected 96 out of the 150 seedlings (64%) and from the USDA population, 53 out of the 272 seedlings (19%). To compare the two methods at identical selection rates, the visual selection rates within each population were used as the standard for the ANN models. The number of ANN model-selected seedlings was adjusted to equal that of the visual method after ranking the probability values. The means of the highest 38% for the LSU AgCenter population and 17% for the USDA population were used for the comparison (Table 9). The seedlings selected by the ANN model produced 16% (in the LSU AgCenter population) and 8% (in the USDA population) more cane yield than those selected by the visual method. The seedlings selected by the ANN model produced 8% more stalks that were thicker than those selected by the visual method. The selection rates achieved in this study (38% for the LSU AgCenter and 17% for the USDA populations) are atypical of the 5% (at the LSU AgCenter) and 10% (at the USDA) rates practiced in the regular breeding programs. In this study seedlings were selected for cane yield components alone and not for cane yield components and Brix as is usually the case during routine seedling selection.

DISCUSSION

The ANN model was superior to visual selection in identifying seedlings with high cane yield potential, as evidenced by several comparisons between the two selection methods. For example, the proportion of high-yielding seedlings selected by the ANN model was greater than that selected by the visual method. This proportion increased when similar selection rates were used for both methods. Generally, seedlings selected by the ANN model produced more stalks that were thicker and longer than those selected by the visual method. The visual method rejected a greater proportion of seedlings that produced estimated cane yields higher than the population mean compared with the ANN model. A good number of these seedlings rejected by the visual method were selected by the ANN model. Conversely, the ANN model rejected low-yielding seedlings that were selected by the visual method. Because only a limited number of seedlings can be advanced to the next stage, the low efficiency of the visual method would greatly reduce the overall efficiency of a selection program. The ANN uses fast and automated computations and was superior to the visual method even for
Table 9. Means for stalk number, stalk height, stalk diameter, and estimated seedling cane yield of seedlings selected by the artificial neural network (ANN) models and the visual method (Visual), and of seedlings selected by the ANN method expressed as a percent of seedlings selected by the visual method (ANN % Visual), for the Louisiana State University Agricultural Center (LSU AgCenter) (38% selection rate) and USDA (17% selection rate) populations.

                   LSU AgCenter                       USDA
Trait              Visual    ANN  ANN % Visual        Visual    ANN  ANN % Visual
Stalk number        15.58  16.77           108         11.89  12.87           108
Height (cm)           228    225            98           222    228           102
Diameter (cm)        2.12   2.24           106          2.13   2.19           103
Cane yield (kg)     12.62  14.65           116          9.37  10.45           108
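The comparison in Table 9 rests on two steps: ranking seedlings by their ANN probability and keeping as many as the visual method selected, then expressing the ANN mean as a percent of the visual mean. The sketch below illustrates both steps under stated assumptions. The six seedlings, their probabilities, and their yields are hypothetical and are not data from the study; only the final check uses means from Table 9. Function names are likewise illustrative.

```python
def select_top_ranked(probabilities, n_select):
    """Indices of the n_select seedlings with the highest selection probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return sorted(ranked[:n_select])


def mean(values):
    return sum(values) / len(values)


def ann_percent_of_visual(ann_mean, visual_mean):
    """The 'ANN % Visual' column of Table 9."""
    return ann_mean / visual_mean * 100.0


# Hypothetical data for six seedlings (illustration only, not from the study).
probabilities = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
cane_yield = [14.2, 6.1, 12.8, 9.0, 11.5, 5.3]  # kg
visual_selected = [0, 3]                         # indices chosen visually

# Match the ANN selection count to the visual selection count.
ann_selected = select_top_ranked(probabilities, len(visual_selected))
ann_mean = mean([cane_yield[i] for i in ann_selected])
visual_mean = mean([cane_yield[i] for i in visual_selected])
print(ann_selected, round(ann_percent_of_visual(ann_mean, visual_mean)))

# LSU AgCenter cane yield means from Table 9 (ANN 14.65 kg, Visual 12.62 kg).
print(round(ann_percent_of_visual(14.65, 12.62)))
```

Fixing the number selected, rather than applying the 0.5 threshold, is what keeps the two methods at the same selection rate.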
a variable dataset with poor model fit, as was the case with the USDA population. A good aspect of the ANN model is that as the breeder gains experience, he/she will be in a better position to recognize data with a poor model fit and adjust the probability threshold accordingly.

The ANN model selected seedlings based on those traits that exhibited the largest variability within the population. Conversely, traits with low variability would be less associated with the estimated seedling-cane yield and have little influence in determining the probability value assigned to each seedling. Therefore, the ability of the ANN model to use the most genetically variable traits during seedling selection leads to higher selection efficiency than was the case with the visual method. In this study, the ANN