Multiple Discriminant Analysis
&
Application of SPSS
Prepared by
Prof Prithvi Yadav
2004
Indian Institute of Management,
Rajendra Nagar, Indore
_______________________________________________________________________________________________________
Notes on Multiple Discriminant Analysis , Indian Institute of Management, Indore. @June, 2004 -1 -
Key Concepts
1.0 Introduction
Discriminant analysis involving more than two groups is known as Multiple Discriminant Analysis
(MDA). It is similar to multiple regression analysis (MRA) except that the dependent variable is
categorical rather than continuous. In regression analysis, we want to predict the value of a
variable of interest from a set of predictor variables. In MDA, we want to predict the class
membership of an individual observation from a set of predictor variables; the technique is
therefore sometimes known as classification analysis. Suppose we have several populations from
which observations may come, and suppose we have a new observation that is known to come
from one of these populations, but we do not know which. The basic objective of MDA is to
produce a rule, or classification scheme, that enables a researcher to predict the population from
which an observation is most likely to have come.
2.0 Example
An example has been taken from a telecom company, in order to discriminate customer
categories using customer particulars, usage patterns, socio-economic status, etc. The data file,
telcom500.sav, has 500 observations and can be downloaded from the institute's server.
Dependent variable
Independent variables
7. Retired (retire)
0 No
1 Yes
8. Gender (gender)
0 Male
1 Female
9. Number of people in household (reside)
(g - 1) functions can be extracted from the data, or k functions if the number of predictor
variables (k) is less than the number of groups (g); in other words, min(g - 1, k) functions.
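This counting rule can be written out directly as a quick check (plain Python; a sketch, not SPSS output):

```python
def n_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Maximum number of discriminant functions: min(g - 1, k)."""
    return min(n_groups - 1, n_predictors)

# Telecom example: g = 4 customer categories, k = 9 predictors
print(n_discriminant_functions(4, 9))  # -> 3
```

Three functions is exactly what the SPSS output for the telecom data reports later in these notes.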
4. Geometry of Two Discriminant Functions
Imagine a problem with two predictor variables and a DV with three groups. Now draw a
scatterplot of the cases in each group across the two predictor variables.
[Figure: scatterplot of the cases on predictors X1 (horizontal axis) and X2 (vertical axis);
Group 1 = *, Group 2 = o, Group 3 = x, forming three distinct clusters]
[Figure: the same scatterplot with two discriminant-function vectors, Z1 and Z2, drawn through
the three clusters]
Two vectors are fit to the data:
• Z1: a reasonably good fit for groups 1 and 3, but a bad fit for group 2 (1st discriminant function)
• Z2: a reasonably good fit for group 2, but a bad fit for groups 1 and 3 (2nd discriminant function)
The two vectors taken together better explain the three groups than either one by itself.
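The fitting of such a vector can be illustrated in miniature. The sketch below (plain Python, hypothetical data with two predictors and only two groups; not the telcom file) computes Fisher's weights w = Sw^-1 (m1 - m2) for a single discriminant function and classifies a new point by the nearer group-centroid score:

```python
def mean(rows):
    """Column means of a list of (x1, x2) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_scatter(groups):
    """Pooled within-group scatter matrix Sw (2 x 2 case)."""
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    return sw

def fisher_weights(g1, g2):
    """w = Sw^-1 (m1 - m2): the single discriminant function for 2 groups."""
    sw = within_scatter([g1, g2])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    m1, m2 = mean(g1), mean(g2)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(sw[1][1] * d[0] - sw[0][1] * d[1]) / det,
            (-sw[1][0] * d[0] + sw[0][0] * d[1]) / det]

def classify(x, w, g1, g2):
    """Assign x to the group whose centroid score is nearer."""
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    z, z1, z2 = score(x), score(mean(g1)), score(mean(g2))
    return 1 if abs(z - z1) < abs(z - z2) else 2

# Hypothetical, well-separated groups
g1 = [(1, 2), (2, 1), (1, 1), (2, 2)]
g2 = [(6, 7), (7, 6), (6, 6), (7, 7)]
w = fisher_weights(g1, g2)
print(classify((1.2, 1.8), w, g1, g2))  # -> 1
print(classify((6.8, 6.2), w, g1, g2))  # -> 2
```

With more than two groups, as in the telecom example, SPSS extracts several such functions rather than one.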
The statistical output with multiple discriminant functions is comparable to that with one
function, except that a separate set of statistics is derived for each discriminant function.
• Must the various discriminant functions be independent of each other, i.e. noncollinear?
No, they may be collinear or noncollinear, whatever best fits the data. Geometrically, the
functions can be other than 90° apart.
• Must the discriminant scores (Z) produced by the various discriminant functions be
independent of each other, i.e. noncollinear?
Yes, the correlations among the discriminant scores produced by the various functions
must all be equal to zero: r(Z1, Z2) = 0.0
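This zero-correlation requirement is easy to check on saved discriminant scores. A minimal sketch (plain Python; the score vectors here are made up, chosen to be exactly uncorrelated):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical discriminant scores Z1, Z2 for four cases
z1 = [1.0, -1.0, 1.0, -1.0]
z2 = [1.0, 1.0, -1.0, -1.0]
print(pearson_r(z1, z2))  # -> 0.0
```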
7. Steps in SPSS Analysis
1. To select cases:
   Data → Select Cases → Use filter variable: filt_500
2. For analysis:
   Analyze → Classify → Discriminant
      Grouping Variable → custcat → Define Range → 1:4
      Independents → age, marital, address, income, ed, employ, retire, gender, reside
      Use stepwise method
      Statistics → Box's M, Separate-groups covariance, Function Coefficients → Unstandardized → Continue
      Display → Casewise results (limit cases to first 20), Summary table
      Plots → Separate groups → Continue
      Save → Predicted group membership, Discriminant scores → Continue
8. SPSS Output of Discriminant Analysis: Telecom Category
Question: Are the variance-covariance matrices of the four groups the same in the population?
8.2 Box's M test for the homogeneity of variance/covariance matrices
Analysis 1
Box's Test of Equality of Covariance Matrices
Log Determinants

CUSTCAT                  Rank   Log Determinant
1                          9        12.564
2                          9        14.495
3                          9        17.535
4                          9        15.297
Pooled within-groups       9        15.964
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results

Box's M       444.952
F  Approx.      3.188
   df1        135
   df2        499807.917
   Sig.         .000
Tests null hypothesis of equal population covariance matrices.
The null hypothesis that the variance/covariance matrices are equal in the population is
rejected.
8.3 What Is the Final Model Estimated by the Discriminant Analysis?

Canonical Discriminant Function Coefficients

                  Function
                1        2        3
AGE          .009    -.020     .017
MARITAL      .249     .506    -.560
ADDRESS      .000     .052    -.048
INCOME      -.002     .001     .008
ED           .862     .093    -.150
EMPLOY       .015     .077    -.042
RETIRE     -1.075     .479    1.267
GENDER       .056    -.153    -.562
RESIDE       .089     .018     .509
(Constant) -3.072   -1.135    -.611

Unstandardized coefficients
For example: given a 22-year-old, married, highly educated female customer who has stayed at
her address for 2 years, has an income of 19,000, has been employed for the last 4 years, is not
yet retired, and has 5 members in the household …
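Plugging such a customer into the 1st function is straightforward arithmetic. In the sketch below the codings are assumptions for illustration, not taken from the codebook: marital = 1 (married), address = 2 (years at address), income = 19 (in thousands), ed = 5 (highest education level), retire = 0, gender = 1 (female), reside = 5.

```python
# Function-1 unstandardized coefficients from the SPSS table above
coef = {"age": .009, "marital": .249, "address": .000, "income": -.002,
        "ed": .862, "employ": .015, "retire": -1.075, "gender": .056,
        "reside": .089}
const = -3.072

# Assumed codings for the customer described in the text (hypothetical)
customer = {"age": 22, "marital": 1, "address": 2, "income": 19,
            "ed": 5, "employ": 4, "retire": 0, "gender": 1, "reside": 5}

z1 = const + sum(coef[v] * customer[v] for v in coef)
print(round(z1, 3))  # -> 2.208, the customer's score on the 1st function
```

Under different coding assumptions the numeric score would of course differ; the mechanics are the point.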
8.4 Are the Discriminant Functions Significant?
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1           .183(a)         67.3             67.3              .393
2           .072(a)         26.5             93.8              .259
3           .017(a)          6.2            100.0              .128
Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 3                .775         122.048    27   .000
2 through 3                .917          41.385    16   .000
3                          .984           7.966     7   .336
1st Function
• Eigenvalue = 0.183
• Of the variance explained by the three functions, the 1st explains 67.3%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 1st function = 0.393
• The chi-square test of the Wilks' Λ is significant (χ2 = 122.048, p < 0.001). The null
hypothesis that in the population BSS = 0 and η = 0 is rejected.
2nd Function
• Eigenvalue = 0.072
• Of the variance explained by the three functions, the 2nd explains 26.5%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 2nd function = 0.259
• The chi-square test of the Wilks' Λ is significant (χ2 = 41.385, p < 0.001). The null
hypothesis that in the population BSS = 0 and η = 0 is rejected.
3rd Function
• Eigenvalue = 0.017
• Of the variance explained by the three functions, the 3rd explains 6.2%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 3rd function = 0.128
• The chi-square test of the Wilks' Λ is not significant (χ2 = 7.966, p = 0.336). The null
hypothesis that in the population BSS = 0 and η = 0 cannot be rejected.
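The numbers in these two tables hang together, which makes a useful check on one's reading of them: each canonical correlation is √(λ/(1+λ)), and the chi-square is Bartlett's approximation -(n - 1 - (p + g)/2)·ln Λ. A sketch (taking n = 487 valid cases, p = 9 predictors, g = 4 groups; the chi-square reproduces the reported 122.048 only approximately because the printed Λ is rounded):

```python
import math

def canonical_corr(eigenvalue):
    """Canonical correlation from an eigenvalue: sqrt(lam / (1 + lam))."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))

def bartlett_chi2(wilks_lambda, n, p, g):
    """Bartlett's chi-square approximation for Wilks' Lambda."""
    return -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)

print(round(canonical_corr(.183), 3))            # -> 0.393, as in the Eigenvalues table
print(round(bartlett_chi2(.775, 487, 9, 4), 1))  # close to the reported 122.048
```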
Decision
Since the third function is not significant, its associated statistics will not be used in
interpreting the effect of the independent variables on classifying customer category.
Standardized Canonical Discriminant Function Coefficients

               Function
             1        2        3
AGE       .114    -.253     .208
MARITAL   .124     .252    -.279
ADDRESS   .002     .515    -.475
INCOME   -.204     .086     .919
ED        .982     .106    -.171
EMPLOY    .145     .753    -.411
RETIRE   -.212     .095     .250
GENDER    .028    -.077    -.282
RESIDE    .131     .026     .751
Z3 = .208 (age) - .279 (marital) - .475 (address) + .919 (income) - .171 (ed)
- .411 (employ) + .250 (retire) - .282 (gender) + .751 (reside)
Conclusions:
Recall that the 3rd function was found not to be significant. Of the nine variables in the
1st function, education (.982) has the greatest impact, followed by retirement status
(-.212). In the 2nd function, years of employment (.753) has the greatest impact,
followed by address (.515).
8.6 What is the Correlation Between Each of the Predictor Variables and
the Discriminant Scores Produced By the Two Functions?
Structure Matrix
Function
1 2 3
ED .942(*) -.069 .035
MARITAL .257(*) .247 .126
EMPLOY -.185 .870(*) .017
ADDRESS -.090 .726(*) -.299
AGE -.157 .626(*) -.122
INCOME .024 .556(*) .522
RETIRE -.270 .277(*) -.033
RESIDE .266 .074 .520(*)
GENDER .026 .024 -.250(*)
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function
Interpretations:
The structure matrix shows the pooled within-groups correlations between each predictor and
the discriminant scores of each function; within each function, variables are ordered by the
absolute size of the correlation, and an asterisk marks the function on which each variable loads
highest. The predictors level of education (.942) and marital status (.257) load highest on the 1st
function; although the correlation for marital status is not large, SPSS still flags it (*) because it
is the largest of that variable's three correlations. The 2nd function appears very useful, since
five variables load highest on it: employ (.870), address (.726), age (.626), income (.556) and
retire (.277). On the 3rd function only two predictors, reside (.520) and gender (-.250), are
important.
8.7 What is the Mean Discriminant Score for Each Customer Category
on Each Discriminant Function?
Recall that these mean discriminant scores are called centroids and that the third discriminant
function is not significant.
Functions at Group Centroids
Function
CUSTCAT 1 2 3
1 -.302 -.431 .001
2 .359 .064 -.207
3 -.506 .306 .022
4 .487 .034 .160
Unstandardized canonical discriminant functions evaluated at group means
Notice how numerically similar the centroids on the 1st function are for groups 1 & 3 and for
groups 2 & 4: the first function does not differentiate well between customers availing Basic
service (group 1) and Plus service (group 3), nor between E-service (group 2) and Total service
(group 4). In other words, it separates the four categories into two pairs rather than four
distinct groups.
The second function discriminates groups 1, 2 & 3 quite efficiently, but it does not produce
good differentiation between the second and fourth groups, i.e. E-service and Total service.
The third function clearly does not differentiate among categories 1, 3 & 4; this is consistent
with its lack of significance.
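The pairing described above can be read straight off the centroid table. A sketch that ranks the group pairs by their gap on the 1st function (centroid values copied from the table):

```python
from itertools import combinations

# Function-1 centroids from the table above
centroid_f1 = {1: -.302, 2: .359, 3: -.506, 4: .487}

gaps = sorted((abs(centroid_f1[a] - centroid_f1[b]), (a, b))
              for a, b in combinations(centroid_f1, 2))
for gap, pair in gaps:
    print(pair, round(gap, 3))
# The two smallest gaps belong to pairs (2, 4) and (1, 3): the 1st function
# separates {1, 3} from {2, 4} but not the groups within each pair.
```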
8.8 Territorial Map

[Territorial map: Canonical Discriminant Function 1 on the horizontal axis and Function 2 on
the vertical axis, both running from -3.0 to 3.0. Numbered boundary lines divide the plane into
four classification regions: region 3 in the upper left, region 1 in the lower left, region 4 on the
right, and region 2 as a narrow wedge between them near the centre. Asterisks mark the group
centroids; the centroids of groups 2 and 4 lie close together near the 2/4 boundary.]
The territorial map is a crude version of the canonical variable plot given below, but now
numbered boundaries mark the regions into which each group is classified. For example, the
first canonical function cannot differentiate category 4 (customers with Total service), as almost
all of its boundary points lie close to the value 1.0. Customers availing Plus service are those
with negative values on the first canonical variable and positive values on the second. The first
category, Basic service, has negative values on both functions. The centroids of the second and
fourth groups lie very close together, near the *24* mark.
[Figure: canonical discriminant function plot: scatter of all cases on Function 1 (horizontal)
and Function 2 (vertical), with cases marked by customer category (Basic service, E-service,
Plus service, Total service) and the four group centroids overlaid]
8.10 How Were the Individual Cases Classified?
                        Highest Group                      Second Highest Group       Discriminant Scores
Case  Actual  Pred   P(D>d|G=g)  df  P(G=g|D=d)  Mahal. Dist.   Gr  P(G=g|D=d)  Mahal. Dist.   Fun1   Fun2   Fun3
(first block: original classification; second block, marked (a): cross-validated, for which
discriminant scores are not shown)
Orig 2 4 4 .065 3 .644 7.213 2 .223 9.333 2.153 .123 2.265
3 3 3 .193 3 .554 4.720 2 .220 6.573 -1.023 1.862 -1.404
4 1 1 .706 3 .468 1.397 3 .239 2.746 -.956 -1.111 -.710
6 3 3 .697 3 .355 1.435 1 .260 2.058 -.744 .298 -1.152
7 2 1(**) .894 3 .343 .611 3 .245 1.283 -.368 -.530 .774
10 2 3(**) .557 3 .555 2.075 2 .176 4.366 -1.115 1.526 -.442
11 1 2(**) .913 3 .324 .526 4 .301 .676 .673 -.570 -.366
13 1 1 .982 3 .296 .172 3 .267 .378 -.356 -.180 -.324
14 4 2(**) .093 3 .490 6.426 4 .329 7.221 1.585 1.964 -1.354
16 2 4(**) .408 3 .529 2.895 2 .320 3.901 2.074 -.112 .757
21 2 1(**) .728 3 .389 1.303 2 .263 2.088 -.077 -1.152 -.855
22 1 3(**) .697 3 .538 1.437 1 .174 3.695 -1.162 1.268 -.263
25 3 3 .622 3 .397 1.766 1 .315 2.225 -1.106 .022 1.173
27 4 1(**) .831 3 .423 .876 3 .245 1.972 -.780 -.891 -.658
30 2 2 .641 3 .402 1.680 4 .309 2.203 1.132 -.567 -1.035
31 2 3(**) .718 3 .388 1.345 1 .332 1.656 -1.256 -.065 -.781
32 4 2(**) .226 3 .436 4.353 4 .431 4.378 2.054 1.237 -.530
33 3 3 .648 3 .389 1.649 2 .262 2.442 -.164 1.535 -.129
34 2 4(**) .559 3 .466 2.067 2 .361 2.575 1.905 -.205 .131
35 4 2(**) .614 3 .318 1.803 4 .295 1.953 .427 1.402 -.118
(a) 2 4 4 .374 9 .632 9.718 2 .232 11.721
3 3 3 .527 9 .542 8.069 2 .227 9.811
4 1 1 .900 9 .460 4.168 3 .242 5.451
6 3 3 .913 9 .348 3.969 1 .263 4.530
7 2 1(**) .695 9 .349 6.439 3 .248 7.117
10 2 3(**) .588 9 .570 7.469 2 .161 9.993
11 1 2(**) .913 9 .328 3.977 4 .304 4.126
13 1 1 .974 9 .291 2.727 3 .269 2.885
14 4 2(**) .576 9 .503 7.592 4 .314 8.536
16 2 4(**) .822 9 .540 5.139 2 .307 6.268
21 2 1(**) .963 9 .394 3.034 2 .256 3.897
22 1 3(**) .775 9 .548 5.639 2 .169 7.996
25 3 3 .392 9 .380 9.504 1 .326 9.814
27 4 1(**) .985 9 .426 2.347 3 .246 3.446
30 2 2 .829 9 .392 5.066 4 .315 5.502
31 2 3(**) .727 9 .395 6.130 1 .337 6.447
32 4 2(**) .475 9 .451 8.599 4 .415 8.766
33 3 3 .810 9 .379 5.272 2 .266 5.981
34 2 4(**) .736 9 .477 6.037 2 .348 6.672
35 4 2(**) .696 9 .323 6.429 4 .284 6.689
8.11 What Was the Hit Ratio of the Discriminant Model ?
Classification Results(b,c)

                                 Predicted Group Membership
                   CUSTCAT     1       2       3       4      Total
Original   Count       1      59      14      23      23       119
                       2      23      35      26      28       112
                       3      37      17      57      19       130
                       4      30      32      18      46       126
           %           1    49.6    11.8    19.3    19.3     100.0
                       2    20.5    31.3    23.2    25.0     100.0
                       3    28.5    13.1    43.8    14.6     100.0
                       4    23.8    25.4    14.3    36.5     100.0
Cross-validated(a)
           Count       1      51      14      30      24       119
                       2      24      26      27      35       112
                       3      38      17      52      23       130
                       4      31      34      19      42       126
           %           1    42.9    11.8    25.2    20.2     100.0
                       2    21.4    23.2    24.1    31.3     100.0
                       3    29.2    13.1    40.0    17.7     100.0
                       4    24.6    27.0    15.1    33.3     100.0
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the
functions derived from all cases other than that case.
b 40.5% of original grouped cases correctly classified.
c 35.1% of cross-validated grouped cases correctly classified.
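The 40.5% figure is just the diagonal of the count table divided by the total number of cases. A sketch (counts copied from the table):

```python
# Original-classification counts from the table above (rows = actual group)
counts = [[59, 14, 23, 23],
          [23, 35, 26, 28],
          [37, 17, 57, 19],
          [30, 32, 18, 46]]

total = sum(sum(row) for row in counts)          # 487 cases in all
hits = sum(counts[i][i] for i in range(4))       # 197 correctly classified
print(round(100 * hits / total, 1))              # -> 40.5 (the hit ratio, %)
```

The cross-validated hit ratio of 35.1% is obtained the same way from the cross-validated counts.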
9. Would the Same Results be Achieved Using Logistic Regression Analysis?
Case Processing Summary

                                       N     Marginal Percentage
Customer category   Basic service    119          24.4%
                    E-service        112          23.0%
                    Plus service     130          26.7%
                    Total service    126          25.9%
Marital status      Unmarried        246          50.5%
                    Married          241          49.5%
Retired             No               467          95.9%
                    Yes               20           4.1%
Gender              Male             237          48.7%
                    Female           250          51.3%
Valid                                487         100.0%
Missing                                0
Total                                487
Subpopulation                        487(a)

a The dependent variable has only one value observed in 487 (100.0%) subpopulations.
Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only       1348.692
Final                1176.162          172.530    30   .000
Goodness-of-Fit

            Chi-Square     df     Sig.
Pearson       1439.812   1428     .408
Deviance      1176.162   1428    1.000
Pseudo R-Square
Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept 1176.162(a) .000 0 .
TENURE 1224.455 48.294 3 .000
AGE 1178.077 1.915 3 .590
ADDRESS 1178.616 2.454 3 .484
INCOME 1184.336 8.174 3 .043
ED 1249.348 73.186 3 .000
EMPLOY 1177.803 1.641 3 .650
RESIDE 1178.793 2.632 3 .452
MARITAL 1176.500 .338 3 .953
RETIRE 1178.360 2.198 3 .532
GENDER 1176.861 .699 3 .873
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The
reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that
effect are 0.
a This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of
freedom.
Parameter Estimates

                                                                              95% CI for Exp(B)
Customer category(a)        B    Std. Error   Wald   df   Sig.   Exp(B)      Lower       Upper
Basic service Intercept 3.356 1.509 4.947 1 .026
TENURE -.039 .009 17.851 1 .000 .961 .944 .979
AGE .011 .019 .339 1 .560 1.011 .974 1.050
ADDRESS .001 .022 .001 1 .976 1.001 .958 1.045
INCOME -.002 .003 .364 1 .546 .998 .993 1.004
ED -.724 .134 29.116 1 .000 .485 .372 .630
EMPLOY -.011 .026 .172 1 .678 .989 .940 1.041
RESIDE -.142 .126 1.261 1 .261 .868 .678 1.111
[MARITAL=0] .113 .385 .087 1 .768 1.120 .527 2.382
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] -.090 1.030 .008 1 .931 .914 .121 6.886
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] -.118 .278 .181 1 .671 .888 .515 1.533
[GENDER=1] 0(b) . . 0 . . . .
E-service Intercept -.059 1.509 .002 1 .969
TENURE .023 .008 7.448 1 .006 1.023 1.006 1.040
AGE -.011 .020 .294 1 .588 .989 .952 1.028
ADDRESS .005 .020 .065 1 .798 1.005 .967 1.045
INCOME -.004 .002 4.047 1 .044 .996 .992 1.000
ED .016 .126 .016 1 .899 1.016 .794 1.301
EMPLOY .006 .024 .057 1 .811 1.006 .960 1.053
RESIDE -.166 .121 1.876 1 .171 .847 .668 1.074
[MARITAL=0] .007 .361 .000 1 .984 1.007 .496 2.044
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] .137 1.020 .018 1 .893 1.147 .155 8.476
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] -.173 .268 .418 1 .518 .841 .498 1.421
[GENDER=1] 0(b) . . 0 . . . .
Plus service Intercept 3.940 1.408 7.825 1 .005
TENURE -.016 .009 3.302 1 .069 .984 .967 1.001
AGE -.011 .019 .340 1 .560 .989 .952 1.027
ADDRESS .026 .020 1.789 1 .181 1.027 .988 1.067
INCOME .001 .001 .476 1 .490 1.001 .998 1.004
ED -.837 .133 39.371 1 .000 .433 .333 .562
EMPLOY .018 .022 .655 1 .419 1.018 .975 1.064
RESIDE -.156 .125 1.553 1 .213 .856 .670 1.093
[MARITAL=0] -.106 .370 .082 1 .775 .900 .436 1.858
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] -.828 .886 .873 1 .350 .437 .077 2.480
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] .016 .272 .003 1 .954 1.016 .596 1.732
[GENDER=1] 0(b) . . 0 . . . .
a The reference category is: Total service.
b This parameter is set to zero because it is redundant.
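Exp(B) is simply e^B, and the confidence limits are e^(B ± 1.96·SE). Checking the ED row of the Basic service block (B = -.724, SE = .134):

```python
import math

b, se = -.724, .134          # ED coefficient and its SE, Basic service block
odds_ratio = math.exp(b)
lower = math.exp(b - 1.96 * se)
upper = math.exp(b + 1.96 * se)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
# close to the table's .485 and CI [.372, .630]; small differences come from
# rounding in the printed B and SE
```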
Classification

                            Predicted
Observed            Basic service   E-service   Plus service   Total service   Percent Correct
Basic service             58             7            31             23              48.7%
E-service                 14            39            24             35              34.8%
Plus service              33            19            58             20              44.6%
Total service             29            35            21             41              32.5%
Overall Percentage      27.5%         20.5%         27.5%          24.4%             40.2%
10. Some Special Models of Multiple Discriminant Analysis
When several variables are being considered for discrimination, you might ask questions such
as: (1) are all the variables really necessary for effective discrimination, and (2) which variables
are the best discriminators? Variable selection procedures have been proposed that can provide
some guidance to researchers wishing to select a subset of the measured variables to use for
discrimination. Most existing variable selection procedures are somewhat similar to those used
for multiple regression problems, e.g. (1) a forward selection procedure, (2) a backward
elimination procedure, and (3) a stepwise procedure that is a combination of 1 and 2.
SPSS allows the use of variable selection procedures. It is suggested that the backward
elimination procedure performs better provided the number of variables is at most 15; when the
number of variables exceeds 15, the stepwise procedure is recommended. Sometimes the
forward selection procedure may produce sets of discriminating variables in which not every
variable is significant.
Caveats
While selection procedures select variables, they usually do not evaluate how well the selected
variables actually discriminate. To see how well they actually discriminate, you will often have
to run a discriminant analysis using the selected variables as discriminators. Contrary to what
might be expected, a subset of well-chosen variables will often do a better job of discriminating
between groups than the full set of available variables.
The idea of canonical discriminant analysis (CDA) was first introduced by Fisher, and many
authors refer to the method as Fisher’s between-within method. CDA creates new variables by
taking special linear combinations of the original variables. The canonical variables are created
so that they contain all the useful information in a set of original variables. In some sense, they
are similar to principal components and factors. However, they are not computed in the same
way. In a few cases, a researcher may be able to interpret the canonical variables, which
increases their usefulness. One advantage the canonical functions have, regardless of whether
they are interpretable, is that they often allow a researcher to visualize the actual distances
between the populations under investigation in a reduced-dimensional space.
Basically, MDA was developed under the assumption that the data have multivariate normal
distributions. Each of the discriminant procedures discussed in the previous sections was
developed under an assumption that the data vectors have multivariate normal distributions.
Quite often these rules are applied to non-normal data. One advantage that researchers have
with discriminant rules is that they can see how well the rules work simply by using them. If
the rules work well in cases when the data are non-normal, then there is no reason to be too
concerned with the fact
that the data are non-normal. In fact, such rules have been used on categorical variables by
introducing dummy variables and using these dummy variables in discriminant programs.
To illustrate, suppose a researcher wishes to use RACE as a variable for discrimination and
suppose that RACE takes on the values BLK, HISP, WHT, and ASIAN. To use RACE (with four
categories) in a discriminant program, the researcher must define three dummy variables (one
less than the number of race categories). These dummy variables, denoted by DUM1, DUM2,
and DUM3, could be defined as follows:

DUM1 = 1 if RACE = BLK, 0 otherwise
DUM2 = 1 if RACE = HISP, 0 otherwise
DUM3 = 1 if RACE = WHT, 0 otherwise
Then the variables DUM1-DUM3 would be included in the variable set when using a statistical
computing package. Note that a fourth dummy variable for the ASIAN category is not needed
because if the first three dummy variables all have values equal to 0, then RACE must be equal
to ASIAN. If we were to include a fourth dummy variable for the last race, the discriminant
programs would not work because the sample variance-covariance matrix would not be
invertible.
Because dummy variables take on only two values, they obviously are not distributed normally.
And while there is no reason to expect that a discriminant rule based on normality assumptions
would work well for these kinds of variables, there is no reason not to use such a rule if it works
well. In a manner similar to that described earlier, additional categorical variables could be
included in the discriminating set of variables as well.
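The dummy coding described above is mechanical; a sketch (RACE categories as given in the text, ASIAN as the reference category):

```python
def race_dummies(race: str):
    """Return (DUM1, DUM2, DUM3) for RACE; ASIAN is the reference category."""
    categories = ["BLK", "HISP", "WHT"]   # one dummy per non-reference level
    return tuple(1 if race == c else 0 for c in categories)

print(race_dummies("BLK"))    # -> (1, 0, 0)
print(race_dummies("ASIAN"))  # -> (0, 0, 0): all zeros identifies the reference
```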
When using dummy variables to replace categorical variables, you must be careful when using
stepwise procedures. Generally, you should not eliminate any single dummy variable
corresponding to a categorical variable unless you can remove the whole set of dummy variables
that correspond to that discrete variable.
Another technique for developing discriminant rules is based on logistic regression. This
technique does not require that the discriminating variables be multivariate normal. This
technique should be given serious consideration when it is known that some of the variables are
non-normal. In particular, logistic regression methods should be considered when one or more of
the discriminating variables is categorical. Logistic regression is similar to multiple regression;
the primary difference is that the dependent variable in logistic regression is usually binary (i.e.
it takes on only two possible values), whereas the dependent variable in multiple regression is
continuous.
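In its binary form, the model predicts a probability through the logistic function P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)). A minimal sketch with made-up coefficients:

```python
import math

def logistic_prob(b0, b1, x):
    """P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1 * x)) for a single predictor."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients: the probability is exactly 0.5 where b0 + b1*x = 0
print(logistic_prob(-2.0, 0.5, 4))  # -> 0.5
```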
The basic idea of nearest neighbor discriminant analysis is as follows : For a new observation
that is to be classified, first find the observation in the calibration data set that is closest to the
new observation (i.e., its Mahalanobis distance is smallest). Then assign the new observation to
the group from which the observation’s nearest neighbor comes.
If there is a tie (i.e., if the distances between the new observation and two or more other
observations are identical), then the procedure looks for its next nearest neighbor unless the tied
observations are from the same group. If the tied observations are from the same group, the new
observation is assigned to that group. If the tied observations are not from the same group and
the next nearest neighbor matches one of these groups, then the new observation is assigned to
that group. If there is no match, then the procedure looks for the next nearest neighbor, etc.
A variation of this process is to look at the k nearest neighbors of a new observation, and assign
each new observation to the group to which a majority of its k nearest neighbors belongs. For
example, suppose k = 5, and suppose that among the five nearest neighbors of a new
observation, three are from group 1 and two are from one or two other groups. Since the
majority of the five nearest neighbors are from group 1, the new observation would be assigned
to group 1.
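A minimal sketch of the k-nearest-neighbor vote (plain Python, with Euclidean distance standing in for the Mahalanobis distance described above, and hypothetical calibration points):

```python
from collections import Counter

def knn_classify(x, calibration, k=5):
    """Assign x to the majority group among its k nearest calibration points.

    calibration: list of ((x1, x2), group) pairs. Euclidean distance is used
    here for simplicity; the text describes Mahalanobis distance.
    """
    by_dist = sorted(calibration,
                     key=lambda pg: (pg[0][0] - x[0]) ** 2 + (pg[0][1] - x[1]) ** 2)
    votes = Counter(group for _, group in by_dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: three of the five nearest neighbors of (1, 1) are group 1
calib = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((5, 5), 2), ((5, 6), 2),
         ((9, 9), 3), ((9, 8), 3)]
print(knn_classify((1, 1), calib, k=5))  # -> 1
```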
11. Readings
-------------------------------------------*****************------------------------------------------------