Multiple Discriminant Analysis
&
Application of SPSS
Prepared by
Prof Prithvi Yadav
2004
Indian Institute of Management,
Rajendra Nagar, Indore
_______________________________________________________________________________________________________
Notes on Multiple Discriminant Analysis , Indian Institute of Management, Indore. @June, 2004 -1 -
Key Concepts
1.0 Introduction
Discriminant analysis involving more than two groups is known as Multiple Discriminant Analysis
(MDA). It is similar to multiple regression analysis (MRA) except that the dependent variable is
categorical rather than continuous. In regression analysis, we want to predict the value of a
variable of interest from a set of predictor variables. In MDA, we want to predict the class
membership of an individual observation from a set of predictor variables; the technique is
therefore sometimes known as classification analysis. Suppose we have several populations from
which observations may come, and suppose we have a new observation that is known to come
from one of these populations, but we do not know which. The basic objective of MDA is to
produce a rule, or classification scheme, that enables a researcher to predict the population from
which an observation is most likely to have come.
2.0 Example
An example has been taken from a telecom company, in order to discriminate customer
categories using customer particulars, usage patterns, socio-economic status, etc. The data file,
telcom500.sav, has 500 observations and can be downloaded from the institute's server.
Dependent variable
Independent variables
7. Retired (retire)
0 No
1 Yes
8. Gender (gender)
0 Male
1 Female
9. Number of people in household (reside)
(g - 1) functions can be extracted from the data, or k functions if the number of predictor
variables (k) is less than the number of groups (g); in other words, min(g - 1, k) functions.
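This counting rule can be written out directly as a quick check (plain Python; a sketch, not SPSS output):

```python
def n_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Maximum number of discriminant functions: min(g - 1, k)."""
    return min(n_groups - 1, n_predictors)

# Telecom example: g = 4 customer categories, k = 9 predictors
print(n_discriminant_functions(4, 9))  # -> 3
```

Three functions is exactly what the SPSS output for the telecom data reports later in these notes.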
4. Geometry of Two Discriminant Functions
Imagine a problem with two predictor variables and a DV with three groups. Now draw a
scatterplot of the cases in each group across the two predictor variables.
[Figure: scatterplot of the cases on predictors X1 (horizontal axis) and X2 (vertical axis);
Group 1 = *, Group 2 = o, Group 3 = x, forming three distinct clusters]
[Figure: the same scatterplot with two discriminant-function vectors, Z1 and Z2, drawn through
the three clusters]
Two vectors are fit to the data:
• Z1: a reasonably good fit for groups 1 and 3, but a bad fit for group 2 (1st discriminant function)
• Z2: a reasonably good fit for group 2, but a bad fit for groups 1 and 3 (2nd discriminant function)
The two vectors taken together better explain the three groups than either one by itself.
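The fitting of such a vector can be illustrated in miniature. The sketch below (plain Python, hypothetical data with two predictors and only two groups; not the telcom file) computes Fisher's weights w = Sw^-1 (m1 - m2) for a single discriminant function and classifies a new point by the nearer group-centroid score:

```python
def mean(rows):
    """Column means of a list of (x1, x2) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_scatter(groups):
    """Pooled within-group scatter matrix Sw (2 x 2 case)."""
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    return sw

def fisher_weights(g1, g2):
    """w = Sw^-1 (m1 - m2): the single discriminant function for 2 groups."""
    sw = within_scatter([g1, g2])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    m1, m2 = mean(g1), mean(g2)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(sw[1][1] * d[0] - sw[0][1] * d[1]) / det,
            (-sw[1][0] * d[0] + sw[0][0] * d[1]) / det]

def classify(x, w, g1, g2):
    """Assign x to the group whose centroid score is nearer."""
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    z, z1, z2 = score(x), score(mean(g1)), score(mean(g2))
    return 1 if abs(z - z1) < abs(z - z2) else 2

# Hypothetical, well-separated groups
g1 = [(1, 2), (2, 1), (1, 1), (2, 2)]
g2 = [(6, 7), (7, 6), (6, 6), (7, 7)]
w = fisher_weights(g1, g2)
print(classify((1.2, 1.8), w, g1, g2))  # -> 1
print(classify((6.8, 6.2), w, g1, g2))  # -> 2
```

With more than two groups, as in the telecom example, SPSS extracts several such functions rather than one.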
The statistical output with multiple discriminant functions is comparable to that with one
function, except that a separate set of statistics is derived for each discriminant function.
• Must the various discriminant functions be independent of each other, i.e. noncollinear?
No, they may be collinear or noncollinear, whatever best fits the data. Geometrically, the
functions can be other than 90° apart.
• Must the discriminant scores (Z) produced by the various discriminant functions be
independent of each other, i.e. noncollinear?
Yes, the correlations among the discriminant scores produced by the various functions
must all be equal to zero: r(Z1, Z2) = 0.0
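This zero-correlation requirement is easy to check on saved discriminant scores. A minimal sketch (plain Python; the score vectors here are made up, chosen to be exactly uncorrelated):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical discriminant scores Z1, Z2 for four cases
z1 = [1.0, -1.0, 1.0, -1.0]
z2 = [1.0, 1.0, -1.0, -1.0]
print(pearson_r(z1, z2))  # -> 0.0
```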
7. Steps in SPSS Analysis
1. To select cases:
   Data → Select Cases → Use filter variable: filt_500
2. For analysis:
   Analyze → Classify → Discriminant
      Grouping Variable → custcat → Define Range → 1:4
      Independents → age, marital, address, income, ed, employ, retire, gender, reside
      Use stepwise method
      Statistics → Box's M, Separate-groups covariance, Function Coefficients → Unstandardized → Continue
      Display → Casewise results (limit cases to first 20), Summary table
      Plots → Separate groups → Continue
      Save → Predicted group membership, Discriminant scores → Continue
8. SPSS Output of Discriminant Analysis: Telecom Category
Question: Are the variance-covariance matrices of the four groups the same in the population?
8.2 Box's M test for the homogeneity of variance/covariance matrices
Analysis 1
Box's Test of Equality of Covariance Matrices
Log Determinants

CUSTCAT                  Rank   Log Determinant
1                          9        12.564
2                          9        14.495
3                          9        17.535
4                          9        15.297
Pooled within-groups       9        15.964
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results

Box's M       444.952
F  Approx.      3.188
   df1        135
   df2        499807.917
   Sig.         .000
Tests null hypothesis of equal population covariance matrices.
The null hypothesis that the variance/covariance matrices are equal in the population is
rejected.
8.3 What Is the Final Model Estimated by the Discriminant Analysis?

Canonical Discriminant Function Coefficients

                  Function
                1        2        3
AGE          .009    -.020     .017
MARITAL      .249     .506    -.560
ADDRESS      .000     .052    -.048
INCOME      -.002     .001     .008
ED           .862     .093    -.150
EMPLOY       .015     .077    -.042
RETIRE     -1.075     .479    1.267
GENDER       .056    -.153    -.562
RESIDE       .089     .018     .509
(Constant) -3.072   -1.135    -.611

Unstandardized coefficients
For example: given a 22-year-old, married, highly educated female customer who has stayed at
her address for 2 years, has an income of 19,000, has been employed for the last 4 years, is not
yet retired, and has 5 members in the household …
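Plugging such a customer into the 1st function is straightforward arithmetic. In the sketch below the codings are assumptions for illustration, not taken from the codebook: marital = 1 (married), address = 2 (years at address), income = 19 (in thousands), ed = 5 (highest education level), retire = 0, gender = 1 (female), reside = 5.

```python
# Function-1 unstandardized coefficients from the SPSS table above
coef = {"age": .009, "marital": .249, "address": .000, "income": -.002,
        "ed": .862, "employ": .015, "retire": -1.075, "gender": .056,
        "reside": .089}
const = -3.072

# Assumed codings for the customer described in the text (hypothetical)
customer = {"age": 22, "marital": 1, "address": 2, "income": 19,
            "ed": 5, "employ": 4, "retire": 0, "gender": 1, "reside": 5}

z1 = const + sum(coef[v] * customer[v] for v in coef)
print(round(z1, 3))  # -> 2.208, the customer's score on the 1st function
```

Under different coding assumptions the numeric score would of course differ; the mechanics are the point.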
8.4 Are the Discriminant Functions Significant?
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1           .183(a)         67.3             67.3              .393
2           .072(a)         26.5             93.8              .259
3           .017(a)          6.2            100.0              .128
Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 3                .775         122.048    27   .000
2 through 3                .917          41.385    16   .000
3                          .984           7.966     7   .336
1st Function
• Eigenvalue = 0.183
• Of the variance explained by the three functions, the 1st explains 67.3%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 1st function = 0.393
• The chi-square test of the Wilks' Λ is significant (χ2 = 122.048, p < 0.001). The null
hypothesis that in the population BSS = 0 and η = 0 is rejected.
2nd Function
• Eigenvalue = 0.072
• Of the variance explained by the three functions, the 2nd explains 26.5%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 2nd function = 0.259
• The chi-square test of the Wilks' Λ is significant (χ2 = 41.385, p < 0.001). The null
hypothesis that in the population BSS = 0 and η = 0 is rejected.
3rd Function
• Eigenvalue = 0.017
• Of the variance explained by the three functions, the 3rd explains 6.2%
• The canonical correlation (η) between the predictor variables and the discriminant
scores produced by the 3rd function = 0.128
• The chi-square test of the Wilks' Λ is not significant (χ2 = 7.966, p = 0.336). The null
hypothesis that in the population BSS = 0 and η = 0 cannot be rejected.
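The numbers in these two tables hang together, which makes a useful check on one's reading of them: each canonical correlation is √(λ/(1+λ)), and the chi-square is Bartlett's approximation -(n - 1 - (p + g)/2)·ln Λ. A sketch (taking n = 487 valid cases, p = 9 predictors, g = 4 groups; the chi-square reproduces the reported 122.048 only approximately because the printed Λ is rounded):

```python
import math

def canonical_corr(eigenvalue):
    """Canonical correlation from an eigenvalue: sqrt(lam / (1 + lam))."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))

def bartlett_chi2(wilks_lambda, n, p, g):
    """Bartlett's chi-square approximation for Wilks' Lambda."""
    return -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)

print(round(canonical_corr(.183), 3))            # -> 0.393, as in the Eigenvalues table
print(round(bartlett_chi2(.775, 487, 9, 4), 1))  # close to the reported 122.048
```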
Decision
Since the third function is not significant, its associated statistics will not be used in
interpreting the effect of the independent variables on classifying customer category.
Standardized Canonical Discriminant Function Coefficients

               Function
             1        2        3
AGE       .114    -.253     .208
MARITAL   .124     .252    -.279
ADDRESS   .002     .515    -.475
INCOME   -.204     .086     .919
ED        .982     .106    -.171
EMPLOY    .145     .753    -.411
RETIRE   -.212     .095     .250
GENDER    .028    -.077    -.282
RESIDE    .131     .026     .751
Z3 = .208 (age) - .279 (marital) - .475 (address) + .919 (income) - .171 (ed)
- .411 (employ) + .250 (retire) - .282 (gender) + .751 (reside)
Conclusions:
Recall that the 3rd function was found not to be significant. Of the nine variables in the
1st function, education (.982) has the greatest impact, followed by retirement status
(-.212). In the 2nd function, years of employment (.753) has the greatest impact,
followed by address (.515).
8.6 What is the Correlation Between Each of the Predictor Variables and
the Discriminant Scores Produced By the Two Functions?
Structure Matrix
Function
1 2 3
ED .942(*) -.069 .035
MARITAL .257(*) .247 .126
EMPLOY -.185 .870(*) .017
ADDRESS -.090 .726(*) -.299
AGE -.157 .626(*) -.122
INCOME .024 .556(*) .522
RETIRE -.270 .277(*) -.033
RESIDE .266 .074 .520(*)
GENDER .026 .024 -.250(*)
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function
Interpretations:
The structure matrix shows the pooled within-groups correlations between each predictor and
the discriminant scores of each function; within each function, variables are ordered by the
absolute size of the correlation, and an asterisk marks the function on which each variable loads
highest. The predictors level of education (.942) and marital status (.257) load highest on the 1st
function; although the correlation for marital status is not large, SPSS still flags it (*) because it
is the largest of that variable's three correlations. The 2nd function appears very useful, since
five variables load highest on it: employ (.870), address (.726), age (.626), income (.556) and
retire (.277). On the 3rd function only two predictors, reside (.520) and gender (-.250), are
important.
8.7 What is the Mean Discriminant Score for Each Customer Category
on Each Discriminant Function?
Recall that these mean discriminant scores are called centroids and that the third discriminant
function is not significant.
Functions at Group Centroids
Function
CUSTCAT 1 2 3
1 -.302 -.431 .001
2 .359 .064 -.207
3 -.506 .306 .022
4 .487 .034 .160
Unstandardized canonical discriminant functions evaluated at group means
Notice how numerically similar the centroids on the 1st function are for groups 1 & 3 and for
groups 2 & 4: the first function does not differentiate well between customers availing Basic
service (group 1) and Plus service (group 3), nor between E-service (group 2) and Total service
(group 4). In other words, it separates the four categories into two pairs rather than four
distinct groups.
The second function discriminates groups 1, 2 & 3 quite efficiently, but it does not produce
good differentiation between the second and fourth groups, i.e. E-service and Total service.
The third function clearly does not differentiate among categories 1, 3 & 4; this is consistent
with its lack of significance.
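The pairing described above can be read straight off the centroid table. A sketch that ranks the group pairs by their gap on the 1st function (centroid values copied from the table):

```python
from itertools import combinations

# Function-1 centroids from the table above
centroid_f1 = {1: -.302, 2: .359, 3: -.506, 4: .487}

gaps = sorted((abs(centroid_f1[a] - centroid_f1[b]), (a, b))
              for a, b in combinations(centroid_f1, 2))
for gap, pair in gaps:
    print(pair, round(gap, 3))
# The two smallest gaps belong to pairs (2, 4) and (1, 3): the 1st function
# separates {1, 3} from {2, 4} but not the groups within each pair.
```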
8.8 Territorial Map

[Territorial map: Canonical Discriminant Function 1 on the horizontal axis and Function 2 on
the vertical axis, both running from -3.0 to 3.0. Numbered boundary lines divide the plane into
four classification regions: region 3 in the upper left, region 1 in the lower left, region 4 on the
right, and region 2 as a narrow wedge between them near the centre. Asterisks mark the group
centroids; the centroids of groups 2 and 4 lie close together near the 2/4 boundary.]
The territorial map is a crude version of the canonical variable plot given below, but now
numbered boundaries mark the regions into which each group is classified. For example, the
first canonical function cannot differentiate category 4 (customers with Total service), as almost
all of its boundary points lie close to the value 1.0. Customers availing Plus service are those
with negative values on the first canonical variable and positive values on the second. The first
category, Basic service, has negative values on both functions. The centroids of the second and
fourth groups lie very close together, near the *24* mark.
[Figure: canonical discriminant function plot: scatter of all cases on Function 1 (horizontal)
and Function 2 (vertical), with cases marked by customer category (Basic service, E-service,
Plus service, Total service) and the four group centroids overlaid]
8.10 How Were the Individual Cases Classified?
                        Highest Group                      Second Highest Group       Discriminant Scores
Case  Actual  Pred   P(D>d|G=g)  df  P(G=g|D=d)  Mahal. Dist.   Gr  P(G=g|D=d)  Mahal. Dist.   Fun1   Fun2   Fun3
(first block: original classification; second block, marked (a): cross-validated, for which
discriminant scores are not shown)
Orig 2 4 4 .065 3 .644 7.213 2 .223 9.333 2.153 .123 2.265
3 3 3 .193 3 .554 4.720 2 .220 6.573 -1.023 1.862 -1.404
4 1 1 .706 3 .468 1.397 3 .239 2.746 -.956 -1.111 -.710
6 3 3 .697 3 .355 1.435 1 .260 2.058 -.744 .298 -1.152
7 2 1(**) .894 3 .343 .611 3 .245 1.283 -.368 -.530 .774
10 2 3(**) .557 3 .555 2.075 2 .176 4.366 -1.115 1.526 -.442
11 1 2(**) .913 3 .324 .526 4 .301 .676 .673 -.570 -.366
13 1 1 .982 3 .296 .172 3 .267 .378 -.356 -.180 -.324
14 4 2(**) .093 3 .490 6.426 4 .329 7.221 1.585 1.964 -1.354
16 2 4(**) .408 3 .529 2.895 2 .320 3.901 2.074 -.112 .757
21 2 1(**) .728 3 .389 1.303 2 .263 2.088 -.077 -1.152 -.855
22 1 3(**) .697 3 .538 1.437 1 .174 3.695 -1.162 1.268 -.263
25 3 3 .622 3 .397 1.766 1 .315 2.225 -1.106 .022 1.173
27 4 1(**) .831 3 .423 .876 3 .245 1.972 -.780 -.891 -.658
30 2 2 .641 3 .402 1.680 4 .309 2.203 1.132 -.567 -1.035
31 2 3(**) .718 3 .388 1.345 1 .332 1.656 -1.256 -.065 -.781
32 4 2(**) .226 3 .436 4.353 4 .431 4.378 2.054 1.237 -.530
33 3 3 .648 3 .389 1.649 2 .262 2.442 -.164 1.535 -.129
34 2 4(**) .559 3 .466 2.067 2 .361 2.575 1.905 -.205 .131
35 4 2(**) .614 3 .318 1.803 4 .295 1.953 .427 1.402 -.118
(a) 2 4 4 .374 9 .632 9.718 2 .232 11.721
3 3 3 .527 9 .542 8.069 2 .227 9.811
4 1 1 .900 9 .460 4.168 3 .242 5.451
6 3 3 .913 9 .348 3.969 1 .263 4.530
7 2 1(**) .695 9 .349 6.439 3 .248 7.117
10 2 3(**) .588 9 .570 7.469 2 .161 9.993
11 1 2(**) .913 9 .328 3.977 4 .304 4.126
13 1 1 .974 9 .291 2.727 3 .269 2.885
14 4 2(**) .576 9 .503 7.592 4 .314 8.536
16 2 4(**) .822 9 .540 5.139 2 .307 6.268
21 2 1(**) .963 9 .394 3.034 2 .256 3.897
22 1 3(**) .775 9 .548 5.639 2 .169 7.996
25 3 3 .392 9 .380 9.504 1 .326 9.814
27 4 1(**) .985 9 .426 2.347 3 .246 3.446
30 2 2 .829 9 .392 5.066 4 .315 5.502
31 2 3(**) .727 9 .395 6.130 1 .337 6.447
32 4 2(**) .475 9 .451 8.599 4 .415 8.766
33 3 3 .810 9 .379 5.272 2 .266 5.981
34 2 4(**) .736 9 .477 6.037 2 .348 6.672
35 4 2(**) .696 9 .323 6.429 4 .284 6.689
8.11 What Was the Hit Ratio of the Discriminant Model ?
Classification Results(b,c)

                                 Predicted Group Membership
                   CUSTCAT     1       2       3       4      Total
Original   Count       1      59      14      23      23       119
                       2      23      35      26      28       112
                       3      37      17      57      19       130
                       4      30      32      18      46       126
           %           1    49.6    11.8    19.3    19.3     100.0
                       2    20.5    31.3    23.2    25.0     100.0
                       3    28.5    13.1    43.8    14.6     100.0
                       4    23.8    25.4    14.3    36.5     100.0
Cross-validated(a)
           Count       1      51      14      30      24       119
                       2      24      26      27      35       112
                       3      38      17      52      23       130
                       4      31      34      19      42       126
           %           1    42.9    11.8    25.2    20.2     100.0
                       2    21.4    23.2    24.1    31.3     100.0
                       3    29.2    13.1    40.0    17.7     100.0
                       4    24.6    27.0    15.1    33.3     100.0
a Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the
functions derived from all cases other than that case.
b 40.5% of original grouped cases correctly classified.
c 35.1% of cross-validated grouped cases correctly classified.
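The 40.5% figure is just the diagonal of the count table divided by the total number of cases. A sketch (counts copied from the table):

```python
# Original-classification counts from the table above (rows = actual group)
counts = [[59, 14, 23, 23],
          [23, 35, 26, 28],
          [37, 17, 57, 19],
          [30, 32, 18, 46]]

total = sum(sum(row) for row in counts)          # 487 cases in all
hits = sum(counts[i][i] for i in range(4))       # 197 correctly classified
print(round(100 * hits / total, 1))              # -> 40.5 (the hit ratio, %)
```

The cross-validated hit ratio of 35.1% is obtained the same way from the cross-validated counts.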
9. Would the Same Results be Achieved Using Logistic Regression Analysis?
Case Processing Summary

                                       N     Marginal Percentage
Customer category   Basic service    119          24.4%
                    E-service        112          23.0%
                    Plus service     130          26.7%
                    Total service    126          25.9%
Marital status      Unmarried        246          50.5%
                    Married          241          49.5%
Retired             No               467          95.9%
                    Yes               20           4.1%
Gender              Male             237          48.7%
                    Female           250          51.3%
Valid                                487         100.0%
Missing                                0
Total                                487
Subpopulation                        487(a)

a The dependent variable has only one value observed in 487 (100.0%) subpopulations.
Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only       1348.692
Final                1176.162          172.530    30   .000
Goodness-of-Fit

            Chi-Square     df     Sig.
Pearson       1439.812   1428     .408
Deviance      1176.162   1428    1.000
Pseudo R-Square
Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept 1176.162(a) .000 0 .
TENURE 1224.455 48.294 3 .000
AGE 1178.077 1.915 3 .590
ADDRESS 1178.616 2.454 3 .484
INCOME 1184.336 8.174 3 .043
ED 1249.348 73.186 3 .000
EMPLOY 1177.803 1.641 3 .650
RESIDE 1178.793 2.632 3 .452
MARITAL 1176.500 .338 3 .953
RETIRE 1178.360 2.198 3 .532
GENDER 1176.861 .699 3 .873
The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The
reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that
effect are 0.
a This reduced model is equivalent to the final model because omitting the effect does not increase the degrees of
freedom.
Parameter Estimates

                                                                              95% CI for Exp(B)
Customer category(a)        B    Std. Error   Wald   df   Sig.   Exp(B)      Lower       Upper
Basic service Intercept 3.356 1.509 4.947 1 .026
TENURE -.039 .009 17.851 1 .000 .961 .944 .979
AGE .011 .019 .339 1 .560 1.011 .974 1.050
ADDRESS .001 .022 .001 1 .976 1.001 .958 1.045
INCOME -.002 .003 .364 1 .546 .998 .993 1.004
ED -.724 .134 29.116 1 .000 .485 .372 .630
EMPLOY -.011 .026 .172 1 .678 .989 .940 1.041
RESIDE -.142 .126 1.261 1 .261 .868 .678 1.111
[MARITAL=0] .113 .385 .087 1 .768 1.120 .527 2.382
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] -.090 1.030 .008 1 .931 .914 .121 6.886
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] -.118 .278 .181 1 .671 .888 .515 1.533
[GENDER=1] 0(b) . . 0 . . . .
E-service Intercept -.059 1.509 .002 1 .969
TENURE .023 .008 7.448 1 .006 1.023 1.006 1.040
AGE -.011 .020 .294 1 .588 .989 .952 1.028
ADDRESS .005 .020 .065 1 .798 1.005 .967 1.045
INCOME -.004 .002 4.047 1 .044 .996 .992 1.000
ED .016 .126 .016 1 .899 1.016 .794 1.301
EMPLOY .006 .024 .057 1 .811 1.006 .960 1.053
RESIDE -.166 .121 1.876 1 .171 .847 .668 1.074
[MARITAL=0] .007 .361 .000 1 .984 1.007 .496 2.044
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] .137 1.020 .018 1 .893 1.147 .155 8.476
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] -.173 .268 .418 1 .518 .841 .498 1.421
[GENDER=1] 0(b) . . 0 . . . .
Plus service Intercept 3.940 1.408 7.825 1 .005
TENURE -.016 .009 3.302 1 .069 .984 .967 1.001
AGE -.011 .019 .340 1 .560 .989 .952 1.027
ADDRESS .026 .020 1.789 1 .181 1.027 .988 1.067
INCOME .001 .001 .476 1 .490 1.001 .998 1.004
ED -.837 .133 39.371 1 .000 .433 .333 .562
EMPLOY .018 .022 .655 1 .419 1.018 .975 1.064
RESIDE -.156 .125 1.553 1 .213 .856 .670 1.093
[MARITAL=0] -.106 .370 .082 1 .775 .900 .436 1.858
[MARITAL=1] 0(b) . . 0 . . . .
[RETIRE=.00] -.828 .886 .873 1 .350 .437 .077 2.480
[RETIRE=1.00] 0(b) . . 0 . . . .
[GENDER=0] .016 .272 .003 1 .954 1.016 .596 1.732
[GENDER=1] 0(b) . . 0 . . . .
a The reference category is: Total service.
b This parameter is set to zero because it is redundant.
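Exp(B) is simply e^B, and the confidence limits are e^(B ± 1.96·SE). Checking the ED row of the Basic service block (B = -.724, SE = .134):

```python
import math

b, se = -.724, .134          # ED coefficient and its SE, Basic service block
odds_ratio = math.exp(b)
lower = math.exp(b - 1.96 * se)
upper = math.exp(b + 1.96 * se)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
# close to the table's .485 and CI [.372, .630]; small differences come from
# rounding in the printed B and SE
```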
Classification

                            Predicted
Observed            Basic service   E-service   Plus service   Total service   Percent Correct
Basic service             58             7            31             23              48.7%
E-service                 14            39            24             35              34.8%
Plus service              33            19            58             20              44.6%
Total service             29            35            21             41              32.5%
Overall Percentage      27.5%         20.5%         27.5%          24.4%             40.2%
10. Some Special Models of Multiple Discriminant Analysis
When several variables are being considered for discrimination, you might ask questions such
as: (1) are all the variables really necessary for effective discrimination, and (2) which variables
are the best discriminators? Variable selection procedures have been proposed that can provide
some guidance to researchers wishing to select a subset of the measured variables to use for
discrimination. Most existing variable selection procedures are somewhat similar to those used
for multiple regression problems, e.g. (1) a forward selection procedure, (2) a backward
elimination procedure, and (3) a stepwise procedure that is a combination of 1 and 2.
SPSS allows the use of variable selection procedures. It is suggested that the backward
elimination procedure performs better provided the number of variables is at most 15; when the
number of variables exceeds 15, the stepwise procedure is recommended. Sometimes the
forward selection procedure may produce sets of discriminating variables in which not every
variable is significant.
Caveats
While selection procedures select variables, they usually do not evaluate how well the selected
variables actually discriminate. To see how well they actually discriminate, you will often have
to run a discriminant analysis using the selected variables as discriminators. Contrary to what
might be expected, a subset of well-chosen variables will often do a better job of discriminating
between groups than the full set of available variables.
The idea of canonical discriminant analysis (CDA) was first introduced by Fisher, and many
authors refer to the method as Fisher’s between-within method. CDA creates new variables by
taking special linear combinations of the original variables. The canonical variables are created
so that they contain all the useful information in a set of original variables. In some sense, they
are similar to principal components and factors. However, they are not computed in the same
way. In a few cases, a researcher may be able to interpret the canonical variables, which
increases their usefulness. One advantage the canonical functions have, regardless of whether
they are interpretable, is that they often allow a researcher to visualize the actual distances
between the populations under investigation in a reduced-dimensional space.
Basically, MDA was developed under the assumption that the data have multivariate normal
distributions. Each of the discriminant procedures discussed in the previous sections was
developed under an assumption that the data vectors have multivariate normal distributions.
Quite often these rules are applied to non-normal data. One advantage that researchers have
with discriminant rules is that they can see how well the rules work simply by using them. If
the rules work well in cases when the data are non-normal, then there is no reason to be too
concerned with the fact
that the data are non-normal. In fact, such rules have been used on categorical variables by
introducing dummy variables and using these dummy variables in discriminant programs.
To illustrate, suppose a researcher wishes to use RACE as a variable for discrimination and
suppose that RACE takes on the values BLK, HISP, WHT, and ASIAN. To use RACE (with four
categories) in a discriminant program, the researcher must define three dummy variables (one
less than the number of race categories). These dummy variables, denoted by DUM1, DUM2,
and DUM3, could be defined as follows:

DUM1 = 1 if RACE = BLK, 0 otherwise
DUM2 = 1 if RACE = HISP, 0 otherwise
DUM3 = 1 if RACE = WHT, 0 otherwise
Then the variables DUM1-DUM3 would be included in the variable set when using a statistical
computing package. Note that a fourth dummy variable for the ASIAN category is not needed
because if the first three dummy variables all have values equal to 0, then RACE must be equal
to ASIAN. If we were to include a fourth dummy variable for the last race, the discriminant
programs would not work because the sample variance-covariance matrix would not be
invertible.
Because dummy variables take on only two values, they obviously are not distributed normally.
And while there is no reason to expect that a discriminant rule based on normality assumptions
would work well for these kinds of variables, there is no reason not to use such a rule if it works
well. In a manner similar to that described earlier, additional categorical variables could be
included in the discriminating set of variables as well.
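The dummy coding described above is mechanical; a sketch (RACE categories as given in the text, ASIAN as the reference category):

```python
def race_dummies(race: str):
    """Return (DUM1, DUM2, DUM3) for RACE; ASIAN is the reference category."""
    categories = ["BLK", "HISP", "WHT"]   # one dummy per non-reference level
    return tuple(1 if race == c else 0 for c in categories)

print(race_dummies("BLK"))    # -> (1, 0, 0)
print(race_dummies("ASIAN"))  # -> (0, 0, 0): all zeros identifies the reference
```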
When using dummy variables to replace categorical variables, you must be careful when using
stepwise procedures. Generally, you should not eliminate any single dummy variable
corresponding to a categorical variable unless you can remove the whole set of dummy variables
that correspond to that discrete variable.
Another technique for developing discriminant rules is based on logistic regression. This
technique does not require that the discriminating variables be multivariate normal. This
technique should be given serious consideration when it is known that some of the variables are
non-normal. In particular, logistic regression methods should be considered when one or more of
the discriminating variables is categorical. Logistic regression is similar to multiple regression;
the primary difference is that the dependent variable in logistic regression is usually binary (i.e.
it takes on only two possible values), whereas the dependent variable in multiple regression is
continuous.
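In its binary form, the model predicts a probability through the logistic function P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)). A minimal sketch with made-up coefficients:

```python
import math

def logistic_prob(b0, b1, x):
    """P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1 * x)) for a single predictor."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients: the probability is exactly 0.5 where b0 + b1*x = 0
print(logistic_prob(-2.0, 0.5, 4))  # -> 0.5
```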
The basic idea of nearest neighbor discriminant analysis is as follows : For a new observation
that is to be classified, first find the observation in the calibration data set that is closest to the
new observation (i.e., its Mahalanobis distance is smallest). Then assign the new observation to
the group from which the observation’s nearest neighbor comes.
If there is a tie (i.e., if the distances between the new observation and two or more other
observations are identical), then the procedure looks for its next nearest neighbor unless the tied
observations are from the same group. If the tied observations are from the same group, the new
observation is assigned to that group. If the tied observations are not from the same group and
the next nearest neighbor matches one of these groups, then the new observation is assigned to
that group. If there is no match, then the procedure looks for the next nearest neighbor, etc.
A variation of this process is to look at the k nearest neighbors of a new observation, and assign
each new observation to the group to which a majority of its k nearest neighbors belongs. For
example, suppose k = 5, and suppose that among the five nearest neighbors of a new
observation, three are from group 1 and two are from one or two other groups. Since the
majority of the five nearest neighbors are from group 1, the new observation would be assigned
to group 1.
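A minimal sketch of the k-nearest-neighbor vote (plain Python, with Euclidean distance standing in for the Mahalanobis distance described above, and hypothetical calibration points):

```python
from collections import Counter

def knn_classify(x, calibration, k=5):
    """Assign x to the majority group among its k nearest calibration points.

    calibration: list of ((x1, x2), group) pairs. Euclidean distance is used
    here for simplicity; the text describes Mahalanobis distance.
    """
    by_dist = sorted(calibration,
                     key=lambda pg: (pg[0][0] - x[0]) ** 2 + (pg[0][1] - x[1]) ** 2)
    votes = Counter(group for _, group in by_dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: three of the five nearest neighbors of (1, 1) are group 1
calib = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((5, 5), 2), ((5, 6), 2),
         ((9, 9), 3), ((9, 8), 3)]
print(knn_classify((1, 1), calib, k=5))  # -> 1
```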
11. Readings
-------------------------------------------*****************------------------------------------------------