

Sensitivity Analysis Applied to Artificial Neural Networks: What has my
neural network actually learned?

Peter de B. Harrington and Chuanhao Wan
Clippinger Laboratories, Ohio University Center for Intelligent Chemical Instrumentation
Ohio University, Athens, Ohio 45701-2979
Fax: (740) 593-0148, Phone: (740) 593-2099
Email: harringp@ohio.edu

Abstract
An effective method for extracting characteristic features from neural network models has been
devised. Sensitivity analysis was used for selecting salient features of the input space. Because neural
network models are fundamentally nonlinear, the sensitivity measurement depends on the input values.
Two criteria are used for measuring the sensitivity of the neural network model. The first method
measures the changes of neural network output with respect to the perturbation magnitude of individual
input variables of the mean object for each class of training data. The second method measures the
average sensitivity of the class objects. The sensitivity analysis was evaluated with two complementary
types of neural networks using three sets of synthetic spectra and real sets of ion mobility spectra. The
neural network models were built using radial basis function neural networks (RBFNs) and temperature-constrained
cascade correlation neural networks (TC-CCNs). A method for implementing weight decay
with conjugate gradient training has been devised that yields more sensitive neural network models
without temperature constraints. Temperature constraints have also been demonstrated to furnish more
sensitive network models. By comparing the sensitivity of the class mean with the mean sensitivity of the class, the
individual input variables may be assessed. If these two sensitivities for an input variable differ by a
constant factor, then that variable is modeled by a simple linear relationship. If the two sensitivities vary
by a non-constant scale factor, then the variable is modeled by higher order functions in the network.

Corresponding author

Introduction
As an important pattern recognition technique, artificial neural networks (ANNs) have found a
wide range of uses that include chemical structure classifications [1], pharmaceutical fingerprinting [2],
and quantitative structure-activity relationships (QSAR) [3]. Despite the increased applications, practical
use of ANNs is still limited by the difficulty of interpreting and assessing the network models that many
view as black boxes. Feature extraction methods can be applied to neural network models to cull
characteristic features from the data. Furthermore, these culled features may be used to compress the
data. Neural network models constructed from compressed data can be more robust. These characteristic
features may afford an avenue to assess the causal relationships among the data, which is an important
benefit of feature selection. This benefit is especially important for neural network models that may be
overfitting the data or modeling artifacts that are correlated with the class descriptors. When the training
set is underdetermined (i.e., more variables than objects), insignificant variables will be correlated with
the properties. If these features that have no chemical or physical meaning can be removed from the
training set, a more general neural network may be obtained. Other advantages of selecting
pertinent features and retraining the network model are that training and prediction incur a smaller
computational cost and that the potential for overfitting the training data is reduced.
Feature extraction from ANN models can be accomplished with sensitivity analysis. Some
studies [4] have proposed methods for the quantitative measurement of sensitivities. Zurada et al. [5, 6]
proposed three comparable methods to calculate the so-called mean square sensitivity, absolute value average
sensitivity, and maximum sensitivity. Similarly, Howes et al. [7] studied three types of input
influence, i.e., general influence (GI), specific influence (SI), and potential influence (PI), on the network
output. Most of these studies considered the effects of the weight matrix in the multi-layer perceptron

1. Brown, C. W. Chemical Information Based on Neural Network Processing of Near-IR Spectra.
Anal. Chem. 1998, 70, 2983-2990.
2. Collantes, E. R.; Duta, R.; Welsh, W. J. Preprocessing of HPLC Trace Impurity Patterns by
Wavelet Packets for Pharmaceutical Fingerprinting Using Artificial Neural Networks. Anal.
Chem. 1997, 69, 1392-1397.
3. Sutter, J. M.; Dixon, S. L.; Jurs, P. C. Automated Descriptor Selection for Quantitative structure-
Activity Relationships Using Generalized Simulated Annealing. J. Chem. Inf. Comput. Sci.,
1995, 35, 77-84.
4. Kowalski, B. R.; Faber, K. F. Comment on a Recent Sensitivity Analysis of Radial Base
Function and Multi-Layer Feed-Forward Neural Network Models. Chemom. Intell. Lab. Syst., 1996,
34, 293-297.
5. Zurada, J. M.; Malinowski, A.; Cloete, I. Sensitivity Analysis for Minimization of Input Data
Dimension for Feedforward Neural Networks. IEEE Int. Symp. on Circuits and Systems, London,
May 30-June 3, 1994.
6. Eberhart, R.C.; Cloete, I.; Zurada, J. M. Determining the Significance of Input Parameters Using
Sensitivity Analysis, Lecture Notes Computer Science, 1995, 930, 382-388.
7. Howes, P.; Crook, N. Using Input Parameter Influences to Support the Decision of Feedforward
Neural Networks, Neurocomputing, 1999, 24, 191-206.

(MLP). Choi et al. [8] defined the sensitivity of input perturbations as the ratio of output error to the
standard deviation of each input perturbation, which involves complex weight calculations. The weight-
based methods such as Hinton diagrams [9] may not be practical for assessing the importance of a large
number of features or complex neural network models. Kovalishyn et al. [10] have proposed several
sensitivity measurement methods to be used with cascade-correlation networks (CCNs) for variable
selection. In their work, the sensitivity was measured by the connection weights or by the second derivative of
the error function with respect to the neuron weight. It was shown in that work that the sensitivities
measured from their definition were not stable under the dynamic growth of the network and were sensitive to
the addition of noise. In Ikonomopoulos's study [11] of the importance of input variables with a wavelet
adaptive neural network, the sensitivity of the input variables was estimated by the ratio of the standard
deviations of the prediction and the altered input. It was found that with their method, sensitivity
measurements were highly correlated with the input perturbation. Sung [12] derived the sensitivity
coefficient matrix for backpropagation neural networks with two hidden layers. Other sensitivity
analysis methods [13, 14] that are based on input magnitude and functional measurements have also been
proposed.
Because neural network models are fundamentally nonlinear, the sensitivity will depend on the
input values from which it is calculated. The sensitivity measurement reported in this paper
calculates the sensitivities of the mean input and the mean sensitivity for each class. Weight vectors were
not directly involved in the sensitivity calculation. The sensitivity for each input is estimated by the
change of the output for each class with respect to the input feature perturbation. Thus, the sensitivity of
an input variable is the partial derivative of the network model output with respect to the input variable.
The sensitivity of the class mean is compared to the mean sensitivity of the class to furnish a measure of
the modeling of that variable by the network. This comparison also allows the assessment of whether the
variables are modeled by simple linear functions or more complex higher order functions in the neural
network model. This test can be used to justify the use of neural networks instead of simpler methods
such as linear discriminant analysis.

8. Choi, J. Y.; Choi, C. H. Sensitivity Analysis of Multilayer Perceptron with Differentiable
Activation Functions. IEEE Transactions on Neural Networks. 1992, 3, 101-107.
9. Masters, T. Practical Neural Network Recipes in C++. Academic Press, New York, 1993.
10. Kovalishyn, V. V.; Tetko, I. V.; Luik, A. I.; Kholodovych, V. V.; Villa, A. E. P.; Livingstone, D.
J. Neural Network Studies. 3. Variable Selection in the Cascade-Correlation Learning
Architecture. J. Chem. Inf. Comput. Sci. 1998, 38, 651-659.
11. Ikonomopoulos, A. Wavelet Decomposition and Radial Basis Function Networks for System
Monitoring. IEEE Transactions on Nuclear Science, 1998, 45, 2293-2301.
12. Sung, A. H. Ranking Importance of Input Parameters of Neural Networks. Expert Systems with
Applications, 1998, 15, 405-411.
13. Gedeon, T. D. Data Mining of Inputs: Analyzing Magnitude and Functional Measures,
International Journal of Neural Systems. 1997, 8, 209-218.
14. Koda, M. Stochastic Sensitivity Analysis Method for Neural-Network Learning. International
Journal of Systems Science, 1995, 26, 703-711.

In this paper, the sensitivity for a specific feature was estimated by perturbing the individual
feature rather than removing it completely. The sensitivity computation is used to evaluate a neural
network model after it has been trained, and can be accomplished with a single training data set. The
method was evaluated with synthetic data and experimental IMS data.
Theory of RBFN and TC-CCN
Two types of ANNs, the RBFN [15] and the TC-CCN [16], were used in the study and have been
reported in detail elsewhere. These two networks were chosen to evaluate the sensitivity method, because
they are complementary. RBF networks rely on distances in the data space from cluster centers for
constructing models. These distances are then modeled by a scale parameter. Alternatively, the TC-CCN
is a class of multilayer feedforward (MLF) network. These networks project the inputs onto weight
vectors and correct the projection by measuring the distance from a bias value. Superficially, the RBF
networks apply a vector subtraction followed by a scalar multiplication, and the MLF networks apply a
vector multiplication followed by a scalar subtraction.
With the RBFN, the number of hidden units and the radii and centroids of the hidden units are
configured with the K-means enhanced linear-averaging algorithm [15]. This stage of configuration
occurs before the actual training begins. After the network architecture is configured, the input features
are transformed by the Gaussian transfer functions of the hidden neurons. The output from the RBFN is a linear
transformation of the hidden layer output. With TC-CCN, the network starts from a minimal
configuration, i.e., input units and output units. The hidden layers are configured by adaptively installing
hidden units one at a time until a satisfactory training error is obtained. Both types of networks construct
minimal size network models that achieve satisfactory training errors. The robustness of the sensitivity
procedure was evaluated with the two different network methods.
The networks were all trained to a 5% relative error. The relative error was a relative
root mean square error of calibration (RRMSEC), defined as

$$\mathrm{RRMSEC} = \sqrt{\frac{\sum_{i=1}^{N}\sum_{j=1}^{P}\left(y_{ij}-\hat{y}_{ij}\right)^{2}}{\sum_{i=1}^{N}\sum_{j=1}^{P}\left(y_{ij}-\bar{y}_{j}\right)^{2}}} \qquad (1)$$

for which $N$ is the number of training objects, $P$ is the number of network outputs, $y_{ij}$ is the target
output for the $i$th object and $j$th output, $\hat{y}_{ij}$ is the corresponding network estimate of the target
value, and $\bar{y}_{j}$ is the average target value for the $j$th output.
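
For reference, Eq. (1) reduces to a few lines of code. The following C++ sketch is illustrative rather than a reproduction of the authors' programs; the function name and the nested-vector layout are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RRMSEC of Eq. (1): the residual sum of squares over the sum of squared
// deviations of the targets from their column means, square-rooted.
// y and yhat are both N x P (objects x outputs).
double rrmsec(const std::vector<std::vector<double>>& y,
              const std::vector<std::vector<double>>& yhat)
{
    const std::size_t N = y.size(), P = y.front().size();
    std::vector<double> mean(P, 0.0);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < P; ++j)
            mean[j] += y[i][j] / static_cast<double>(N);
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < P; ++j) {
            num += (y[i][j] - yhat[i][j]) * (y[i][j] - yhat[i][j]);
            den += (y[i][j] - mean[j]) * (y[i][j] - mean[j]);
        }
    return std::sqrt(num / den);
}
```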
Cascade correlation neural networks
Compared to RBFNs, cascade correlation networks are advantageous because they train rapidly
and configure their own architecture (i.e., number of processing units and layers). These networks train

15. Wan, C.; Harrington, P. B. Self-Configuring Radial Basis Function Neural Networks for
Chemical Pattern Recognition. J. Chem. Inf. Comput. Sci., in press.
16. Harrington, P. B. Temperature Constrained Cascade Correlation Neural Networks. Anal. Chem.,
1998, 70, 1297-1306.
17. Cai, C.; Harrington, P. B. Wavelet Transform Preprocessing for Temperature Constrained
Cascade Correlation Neural Networks. J. Chem. Inf. Comput. Sci., in press.

rapidly because only a single unit is adjusted at a time. The hidden units are trained by maximizing the
covariance between the hidden unit outputs and the corresponding residual errors. These units train by
using a conjugate gradient optimization.
For non-constrained neural network processing units, weight decay is important because large
weights can cause input values for the nonlinear function (e.g., sigmoid) to be large in magnitude. For
virtually every nonlinear function, extreme input values result in outputs that furnish first derivatives of
zero. Because network training relies on propagation of the error backwards through the nonlinear
function, the derivative of this function must be nonzero so that the weight vector may be trained.
Because the sensitivity calculation also relies on derivatives, it is important that the weights remain as
small as possible without influencing the network outputs. In addition, if the weights are initialized
randomly and a given variable is uncorrelated with the properties, the corresponding weight should ideally
be driven to zero. However, gradient training alone will never adjust that weight, precisely because the
variable is not correlated with the properties.
Decay introduces an implicit property that is correlated with all the variables, because all weight values
contribute to the weight vector length.
With the conjugate gradient method, weight decay is implemented by subtracting a small
proportion ($10^{-6}$) of the weight vector length from the covariance value. For the output units, a conjugate
gradient optimization is used to minimize the output error [17].
For calculation of the weight update vector,
the gradient is projected onto the inverted Jacobian matrix. Singular value decomposition is used to
invert the Jacobian matrix. This procedure helped improve training of the output units that are usually
overdetermined and well conditioned, but did not improve the training of the hidden units, which are
usually underdetermined. The inputs into the output layer are only the hidden unit outputs. Because the
hidden units are trained to maximize the covariance with respect to the residual error, the output values
among the hidden units have a large degree of independence. For training the output units that train by
error minimization, the same small proportion of the weight vector length is added to the total error, so
that weight decay may be implemented.
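
The decay term can be sketched as follows. This is a minimal illustration under the stated $10^{-6}$ proportion; the function and variable names are assumptions, not the authors' implementation.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// The hidden-unit objective (covariance between the unit output and the
// residual error) is reduced by a small proportion of the weight vector
// length, so weights that do not help the covariance are pulled toward zero.
double decayed_covariance(double covariance, const std::vector<double>& w,
                          double lambda = 1.0e-6)
{
    const double length =
        std::sqrt(std::inner_product(w.begin(), w.end(), w.begin(), 0.0));
    return covariance - lambda * length;  // maximized during hidden-unit training
}
```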
For temperature constrained networks, the weights are updated and then the weight vector is
normalized to unit length. Therefore, weight values that do not correlate with a property will be reduced
to zero as the weights that are correlated increase in magnitude. Furthermore, the temperature parameter
is optimized so that for each unit the first derivative is maximized with respect to magnitude. This
optimization has the added benefit that each processing unit is configured to have maximum sensitivity.
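
A minimal sketch of the unit-length constraint, assuming the weights are stored in a plain vector:

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// After each update the weight vector is rescaled to unit length, so weights
// uncorrelated with the property shrink as the correlated weights grow.
void normalize_to_unit_length(std::vector<double>& w)
{
    const double length =
        std::sqrt(std::inner_product(w.begin(), w.end(), w.begin(), 0.0));
    if (length > 0.0)
        for (double& wi : w) wi /= length;
}
```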
Sensitivity Analysis
The importance of the input features with respect to the neural network model can be assessed
with sensitivity analysis. The sensitivity procedure is applied to a neural network model that is
represented as a function F.
$$\mathbf{y}_i = F(\mathbf{x}_i) \qquad (2)$$
The input vector for the neural network is $\mathbf{x}_i$, which may be an ion mobility spectrum. The
network model estimates the class designations as a binary encoded vector $\mathbf{y}_i$, for which the correct class is
assigned a value of 1 if that compound is present and a value of 0 if the compound is absent. For a two-class
problem, a blank spectrum would be encoded with two zero values and a mixture of the two
compounds with two ones. Each compound by itself would have a value of 1 and 0, with the position
indicating the class membership. The input object $\mathbf{x}_i$ has $v$ variables, and the function generates an
estimate of the output $\hat{\mathbf{y}}_i$ that has $P$ components (i.e., one for each class or output unit). The first step in this
procedure calculates the class mean input vector for class $j$.
$$\bar{\mathbf{x}}_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\mathbf{x}_i \qquad (3)$$
for which $N_j$ is the number of objects in class $j$. The class is designated as a target output of unity. In this
case, the training objects are used, but prediction data sets can be used as well to calculate the class mean
$\bar{\mathbf{x}}_j$. A perturbation function $H(j,k)$ is defined that gives a vector with the same number of components as
the mean vector, or the number of inputs for the neural network model. All components except the $k$th are
zero, and the $k$th component is a fraction of the maximum intensity (e.g., 10%-0.1%) of $\bar{\mathbf{x}}_j$. The sensitivity
$s_{j,k}$ for class $j$ and input variable $k$ is defined as
$$s_{j,k} = \frac{F\left(\bar{\mathbf{x}}_j + H(j,k)\right) - F\left(\bar{\mathbf{x}}_j - H(j,k)\right)}{2h_j} = \frac{\partial F(\bar{\mathbf{x}}_j)}{\partial x_k} \qquad (4)$$
for which $h_j$ is the perturbation, or fraction of the maximum intensity of the mean vector from class $j$. The
perturbation method allows the partial derivative of the network model to be calculated. The sensitivity
$s_{j,k}$ is the partial derivative of the network model with respect to each individual input variable, while
holding all the other input variables constant. The sensitivities are calculated for all the variables of the
model and are collectively referred to as a sensitivity spectrum $\mathbf{s}_j$. $F$ is the neural network model that may
have been obtained from the RBF or TC-CCN algorithms. The operation of calculating a sensitivity for an
input object is denoted as a function $S(\mathbf{x})$.
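
A sketch of Eq. (4) in code, assuming the trained network is exposed as a callable that maps an input spectrum to its vector of class outputs; the type aliases and names are assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// F maps an input spectrum to the vector of class outputs (one per class).
using Model = std::function<Vec(const Vec&)>;

// Central-difference sensitivity spectrum for class output j (Eq. 4). Each
// input variable k is perturbed by h, a fraction (e.g., 1%) of the largest
// intensity in x, while all other variables are held constant.
Vec sensitivity_spectrum(const Model& F, const Vec& x, std::size_t j,
                         double fraction = 0.01)
{
    const double h = fraction * *std::max_element(x.begin(), x.end());
    Vec s(x.size());
    for (std::size_t k = 0; k < x.size(); ++k) {
        Vec plus = x, minus = x;
        plus[k] += h;
        minus[k] -= h;
        s[k] = (F(plus)[j] - F(minus)[j]) / (2.0 * h);
    }
    return s;
}
```

Applying this function to the class mean of Eq. (3) gives the sensitivity spectrum for the class mean; applying it to every object and averaging gives Eq. (5) below.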
A more time-consuming but more robust measurement of sensitivity is the mean sensitivity of
the individual class objects. The mean sensitivity is calculated as

$$\bar{S}(\mathbf{X}_j) = \frac{1}{N_j}\sum_{i=1}^{N_j} S(\mathbf{x}_i) \qquad (5)$$

for which the sensitivity is calculated for each input object of class $j$, and these sensitivities are averaged
across all the input objects in the class $\mathbf{X}_j$.
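
Continuing the sketch above, Eq. (5) simply averages the per-object sensitivity spectra:

```cpp
// Mean sensitivity of class j (Eq. 5): average the sensitivity spectra
// computed at every object in the class, instead of evaluating Eq. (4) once
// at the class mean. Reuses Vec, Model, and sensitivity_spectrum from the
// previous sketch.
Vec mean_sensitivity(const Model& F, const std::vector<Vec>& classObjects,
                     std::size_t j)
{
    Vec avg(classObjects.front().size(), 0.0);
    for (const Vec& x : classObjects) {
        const Vec s = sensitivity_spectrum(F, x, j);
        for (std::size_t k = 0; k < avg.size(); ++k)
            avg[k] += s[k] / static_cast<double>(classObjects.size());
    }
    return avg;
}
```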

Experimental Section
Two sets of synthetic data were constructed that each had a single feature specific to each of
three classes. These features were variables 49, 149, and 249 for classes A, B, and C, respectively. Each
of the three feature peaks had an intensity of unity. A nonspecific feature of unity that was common to all
classes was placed at variable 199. Normal random deviates with a mean of zero were added to all the objects in the
data set to simulate noise. The deviates were generated with a standard deviation of 0.1 to furnish
synthetic data objects with a signal-to-noise ratio (SNR) of 10. A similar data set with an SNR of 1 was
created using a standard deviation of unity. Each class comprised fifty of the one hundred and
fifty synthetic objects in the training data. The number of features was three hundred for each object.
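
The construction of the SNR-10 set can be sketched in a few lines. This is a hedged reconstruction rather than the authors' program; the seed and the 0-based variable indexing are assumptions, and a standard deviation of unity in place of 0.1 yields the SNR-1 set.

```cpp
#include <random>
#include <vector>

// Synthetic set: 150 objects x 300 features; classes A, B, C (50 objects
// each) carry unit peaks at variables 49, 149, and 249, every object carries
// the common unit peak at variable 199, and N(0, noiseSd) deviates are added
// to every variable to simulate noise.
std::vector<std::vector<double>> make_synthetic(double noiseSd = 0.1,
                                                unsigned seed = 1)
{
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, noiseSd);
    std::vector<std::vector<double>> data(150, std::vector<double>(300, 0.0));
    const int classPeak[3] = {49, 149, 249};
    for (int i = 0; i < 150; ++i) {
        data[i][classPeak[i / 50]] = 1.0;   // class-specific feature
        data[i][199] = 1.0;                 // nonspecific common feature
        for (double& v : data[i]) v += noise(rng);
    }
    return data;
}
```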
A hand-held chemical agent monitor (CAM) was used to collect ion mobility spectra of diethyl
methylphosphonate (DEMP) and dicyclohexylamine (DCHA) vapors in positive ion mode. The CAM
was modified by removing the acetone cartridge, so that the charge transfer was accomplished by water.
This modification increases the sensitivity and decreases the selectivity of the instrument. Three hundred
spectra were collected for each compound as cap sniffs of vials containing the pure compound. The vapor
phase concentration of the compounds varied during the experiment. In addition, 50 blank spectra were
collected for which the CAM sampled laboratory air. Each spectrum had 900 data points that were
collected at an 80 kHz data acquisition rate. The data were preprocessed by selecting the data points in the
range of 3.0-12.3 ms, which was the range of pertinent chemical information for this experiment. The
networks had two outputs and three classes that corresponded to DCHA, DEMP, and blank spectra. The
blank spectra were coded as zero values for each of the two class outputs and contained only the reactant
ion peak.
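
The windowing step can be sketched as below, assuming a uniform time axis at the 80 kHz rate; the delay of the first recorded point is not stated in the text, so it is left as a hypothetical parameter t0_ms.

```cpp
#include <cstddef>
#include <vector>

// Retain the points whose drift times fall in the 3.0-12.3 ms window,
// assuming an 80 kHz clock (0.0125 ms per point) and a time axis that
// starts at t0_ms.
std::vector<double> select_window(const std::vector<double>& spectrum,
                                  double t0_ms)
{
    const double dt_ms = 1.0 / 80.0;  // 80 kHz acquisition rate
    std::vector<double> out;
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        const double t = t0_ms + static_cast<double>(i) * dt_ms;
        if (t >= 3.0 && t <= 12.3) out.push_back(spectrum[i]);
    }
    return out;
}
```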
All computations were performed under the Windows NT 4.0 (SP4) operating system. The RBF
computations were performed on a workstation with 64 MB of RAM and an Intel Pentium Pro 200 MHz
processor. The TC-CCN computations were performed on an Intel Pentium II 450 MHz processor that
was equipped with 128 MB of RAM. Computer programs were written in C++ and compiled with
Borland C++ 5.02.
Results and Discussion
Sensitivity analysis with synthetic data sets
Examples of synthetic spectra with a SNR of 10 for the three classes are given in Fig. 1A - 1C.
To calculate the sensitivity spectra, each variable was perturbed by 0.01 or 1% of the largest intensity in
the spectrum. The resultant sensitivity spectra from the RBFN for the three classes are given in Fig. 2.
By comparing the synthetic spectra and the corresponding sensitivity spectra in Fig. 2, the feature for each
class was extracted, which is the peak with a positive sensitivity value. The features for the other classes
always had negative sensitivities, because they are features that characterize counter-examples. The
common feature was not salient in the sensitivity spectra. Another interesting point was that the SNR was
improved in the sensitivity spectra. A wide range of perturbation magnitudes (10%-0.1%) can be used
without causing much difference in the sensitivity measurement, indicating that the sensitivity
measurement method is relatively robust.
If the perturbation is too large, the sensitivity spectra may appear clipped and contain a pattern
that resembles white noise. If the perturbation is too small, the sensitivity spectra may have no noise and
sometimes no signal, so that the sensitivity spectrum is all zero values. If the network model was over-trained,
the model may not be sensitive and this method may not work.
The effectiveness of the feature extraction method is demonstrated with synthetic spectra of SNR
1. In this set of synthetic spectra, the feature peaks are difficult to discern; however, the feature peaks are
obvious in the sensitivity spectra (see Figs. 3A and 3B).
The sensitivity spectra in Fig. 4 were obtained from TC-CCN models that were trained to a 5%
RRMSEC for the synthetic data. Fig. 4A gives the spectrum obtained from the model built with the SNR
of 10.0 data set, and Fig. 4B gives the spectrum obtained from the model built with the SNR of 1.0 data
set. Note that the absolute sensitivity scale has decreased for the TC-CCN network model; however, the
resolution between the pertinent and random features in the data has improved for the TC-CCN network.
The TC-CCN networks had a better ability to resist modeling noisy features than the RBFNs.
Evaluation of the Sensitivity Measurement with Experimental IMS Data
The CAM was used to sample vapors from diethyl methylphosphonate (DEMP) and
dicyclohexylamine (DCHA). Three hundred spectra for each compound were collected. Examples of the
spectra are given in Fig. 5. These spectra were used for building the TC-CCN and RBF neural network
models that were evaluated by sensitivity analysis. The networks were trained so that the RRMSEC was
less than 5%. The sensitivity spectra obtained from the RBF network were nearly identical to those
obtained from the TC-CCN models, which are given in Fig 6A and 6B. Note that the reactant ion peak,
which is the dominant peak at low concentration, disappeared from the sensitivity spectrum for DCHA,
while its intensity is much smaller for DEMP. All the characteristic peaks for the two compounds were
correctly identified and enhanced relative to the reactant ion peak. The negative peaks correspond to
counter-examples. Because there were no mixture spectra present, the neural network logic indicated that
if peaks occur in the positions for DCHA (e.g., 8.5 ms) then DEMP could not be the correct classification.
The sensitivities are negative, because as the intensities at the DCHA drift times increase, the output for
the DEMP class will decrease.
Assessment of Nonlinear Features
Neural networks are frequently used to model nonlinear data. Thus, the assessment of the nonlinear
behavior of the neural network models is important. Because the sensitivity varies with the input and
the neural network models are intrinsically nonlinear, a novel method to evaluate the nonlinear
changes with respect to the input variables is proposed.
For linear models, the outputs should be an affine transformation of the inputs. The nonlinearity
for a variable can be assessed by the difference between the sensitivity calculated at the class mean and the
mean of the sensitivities of the individual inputs that comprise a class. For linear systems, the identity
below will hold with a scale factor ($\beta$) of unity.

$$\overline{F(\mathbf{x})} = \beta\, F(\bar{\mathbf{x}}) \qquad (6)$$

The nonlinear transfer functions in the neural network models will result in an inequality if the
scale factor is fixed at unity. However, if the network model is characterizing linear trends in the data via intrinsically
nonlinear transfer functions, then $\beta$ will have a value other than unity. For trends that are of higher order,
constructed to include a linear and nonlinear variable for classifying the data into two classes. The set
had to be carefully constructed. If only one feature is sufficient for classifying the data then the network
may not model other features that have higher order relationships. This case especially holds for networks
that are constrained by decay or temperature.
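
The diagnostic can be assembled from the earlier sketches; this is again an illustrative sketch under the same assumptions, not the authors' code. The returned points can be plotted as in Figs. 7B and 8A, where a common line through the origin (constant $\beta$ in Eq. 6) indicates linearly modeled variables and points off that line flag higher-order modeling.

```cpp
#include <utility>
// Reuses Vec, Model, sensitivity_spectrum, and mean_sensitivity from the
// sketches above. For one class, pair each variable's sensitivity computed
// at the class mean with the mean of its per-object sensitivities.
std::vector<std::pair<double, double>>
nonlinearity_points(const Model& F, const std::vector<Vec>& classObjects,
                    std::size_t j)
{
    // class mean input vector, Eq. (3)
    Vec mean(classObjects.front().size(), 0.0);
    for (const Vec& x : classObjects)
        for (std::size_t k = 0; k < mean.size(); ++k)
            mean[k] += x[k] / static_cast<double>(classObjects.size());
    const Vec sOfMean = sensitivity_spectrum(F, mean, j);   // S(x-bar)
    const Vec meanS = mean_sensitivity(F, classObjects, j); // mean of S(x)
    std::vector<std::pair<double, double>> points(mean.size());
    for (std::size_t k = 0; k < mean.size(); ++k)
        points[k] = {sOfMean[k], meanS[k]};
    return points;
}
```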
For this data set, which is given in Fig. 7A, there are 200 objects with 150 variables. The first
100 objects belong to class A and the second 100 objects belong to class B. Random normal deviates
with a mean of 0 and a standard deviation of 0.1 were added to all the data objects to simulate noise. At
position 100, all the objects had a constant signal of unity for both classes. The linear feature was at
position 50. This feature is important for separating the first 50 objects in class B from class A. The
nonlinear feature is at position 1. The objects 175-200 in class B can be resolved from class A, but
these objects are not linearly separable. This type of data structure is common for spectra obtained from
diverse chemical structures that are modeled for common chemical properties.
IJIMS 5(2002)2,1-18, p. 9

The features plotted with respect to the mean sensitivity and the sensitivity of the mean are given
in Fig. 7B. Note that the important features, as indicated by the sensitivities, will occur as extreme points
along the lines or as outlier points. In this figure, point 50 is very important and is separated from the
other points. It also is collinear with the other points and thus indicates that the network models this variable
with a simple linear relation. Note that the other important feature was 1, and it also is an outlier that is
nonlinear, which is expected. This variable is nonlinearly separable because the class A values (0 and 2)
surround the class B values (i.e., 1) in this synthetic data set.
The same treatment was applied to the IMS data set. The features for the two sets of sensitivities
are given in Fig. 8A. It was discovered that the neural network models of the IMS data did not contain
any nonlinear values and that linear classification models should work well. This figure also indicates
that the DCHA model was more sensitive than the DEMP model. In addition, the peaks or important
features are located at the ends of these lines. This diagnostic indicated that for this IMS data set, the use
of a neural network classification model was unnecessary and that a simpler linear model would be
sufficient. Because only two classes exist, we can model this data effectively with principal components
analysis (PCA). Note that the classes are clearly distinguishable in the object score plot in Fig. 8B and
can be linearly separated.
Conclusion
A new feature extraction method is presented for feature selection from artificial neural network
models. This method furnishes sensitivity spectra for each class. This method was evaluated with
synthetic spectra and ion mobility spectra. Sensitivity analysis determines the features in the data that are
used by the neural network models. In the sensitivity spectrum, positive values indicate relevant features
for the specified class and negative values indicate counter-example features that correspond to other
classes. Characteristic features may be salient in the sensitivity spectra that are not apparent in the raw
data objects. The model may be evaluated to ensure that artifacts in the data are not contributing to the
classification.
A novel method for evaluating the nonlinearity of the individual variables has been devised. This
method allows variables that are modeled by higher order relationships (i.e., of order greater than one) to be
detected. Furthermore, this method indicates whether a neural network was necessary to model the data
or whether other simpler methods would yield satisfactory results. The importance of abandoning the
black-box notion of a neural network cannot be overstated. The neural network model can now
be visualized and assessed so that the causality of the model may be ascertained.
Acknowledgements
Graseby Ionics and the U.S. Army E.R.D.E.C. are thanked for the use and the support of the
Chemical Agent Monitor. Tricia Buxton, Aaron Urbas, and Guoxiang Chen are thanked for their helpful
comments. The members of the International Society for Ion Mobility Spectrometry are thanked for their
continued intellectual assistance and friendship.

Figures
Fig. 1A. A synthetic spectrum with SNR 10.0 for class A; the characteristic feature for the class is designated
with a circle and the non-characteristic feature is designated with a square.

Fig. 1B. A synthetic spectrum with SNR 10.0 for class B; the characteristic feature for the class is designated
with a circle and the non-characteristic feature is designated with a square.


Fig. 1C. A synthetic spectrum with SNR 10.0 for class C; the characteristic feature for the class is designated
with a circle and the non-characteristic feature is designated with a square.


Fig. 2A. The sensitivity spectrum obtained for class A from a trained RBF network using a perturbation
of 1%.


Fig. 2B. The sensitivity spectrum obtained for class B from a trained RBF network using a perturbation
of 1%.

Fig. 2C. The sensitivity spectrum obtained for class C from a trained RBF network using a perturbation
of 1%.


Fig. 3A. A synthetic spectrum with SNR 1 for class A; the characteristic feature for the class is designated
with a circle.

Fig. 3B. A sensitivity spectrum for class A obtained from the trained RBF network with the synthetic
data at an SNR of 1.0. The characteristic feature for the class is designated with a circle.


Fig. 4A. The sensitivity spectrum for class A, obtained from a TC-CCN network that was trained to a 5%
RRMSEC with the synthetic data with an SNR of 10.

Fig. 4B. The sensitivity spectrum for class A, obtained from a TC-CCN network that was trained to a 5%
RRMSEC with the synthetic data with an SNR of 1.
[Plots for Figs. 4A and 4B: sensitivity (∂o/∂i) versus feature number, 0-300, for class A.]

Fig. 5A. A positive ion mobility spectrum for diethyl methyl phosphonate (DEMP) obtained from a
Graseby Chemical Agent Monitor.
Fig. 5B. A positive ion mobility spectrum for dicyclohexylamine (DCHA) obtained from a Graseby
Chemical Agent Monitor.

[Plots for Figs. 5A and 5B: intensity (mV) versus drift time (ms), 3-13 ms.]

Fig. 6A. Sensitivity spectrum for DEMP obtained from a TC-CCN model that was trained to an RRMSEC
of 5%.

Fig. 6B. Sensitivity spectrum for DCHA obtained from a TC-CCN model that was trained to an RRMSEC
of 5%.

[Plots for Figs. 6A and 6B: sensitivity (∂o/∂i) versus drift time (ms), 2-12 ms.]

Fig. 7A. Nonlinear synthetic data set for which the two classes are nonlinearly separable. The first 100
objects comprise class A and the second 100 objects comprise class B. Two features are
characteristic for the classes. For feature 1, half the objects in class A have a value of 2 and half
have a value of zero. For class B, half the objects have a value of 1.0. A characteristic feature at
50 will distinguish half the objects in class B from the class A objects. This feature is a linear
and pertinent feature.

Fig. 7B. This figure gives the features plotted as a function of their sensitivities for a given class. The
linear feature 50 is collinear with the other random features. Feature 1 is not collinear with the other
points; it is a nonlinear but important feature.
[Plot for Fig. 7A: intensity versus object number, with features 1 and 50 traced. Plot for Fig. 7B: mean sensitivity of class B versus sensitivity of the class B mean, with each point labeled by its feature number (1-150).]


Fig. 8A. An assessment of nonlinear features for the TC-CCN model built with the DCHA and DEMP
data sets.

Fig. 8B. PCA observation score plot for the DEMP and DCHA spectra set. The blank spectra form the
tight cluster in the upper left corner of the figure, the DCHA spectra form the line segment at the
lower left, and the DEMP spectra form the line segment on the right side of the plot.

[Plot for Fig. 8A: mean sensitivity versus sensitivity of the mean for the DEMP and DCHA models. Plot for Fig. 8B: principal component score 2 versus principal component score 1 for the training data (diethyl methylphosphonate and dicyclohexylamine); the first two principal components account for 97% of the cumulative variance. Legend: B = blank spectrum, P = DEMP, A = DCHA.]
