
Welcome to the PowerPoint slides for

Chapter 11

Discriminant Analysis for Classification and Prediction


Marketing Research: Text and Cases by Rajendra Nargundkar

Slide 1

Application Areas
1. The major application area for this technique is where we want to be able to distinguish between two or three sets of objects or people, based on the knowledge of some of their characteristics.
2. Examples include the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.
3. Discriminant analysis can be, and in fact is, used by credit rating agencies to rate individuals and classify them into good lending risks or bad lending risks. The detailed example discussed later shows how to do that.
4. To summarise, we can use linear discriminant analysis when we have to classify objects into two or more groups based on the knowledge of some variables (characteristics) related to them. Typically, these groups would be users versus non-users, potentially successful versus potentially unsuccessful salesmen, high-risk versus low-risk consumers, or groups on similar lines.

Slide 2

Methods, Data etc.

1. Discriminant analysis is very similar to the multiple regression technique. The form of the equation in a two-variable discriminant analysis is:

   Y = a + k1x1 + k2x2

2. This is called the discriminant function (a minimal code sketch of such a function follows at the end of this slide). As in regression analysis, Y is the dependent variable and x1 and x2 are the independent variables; k1 and k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.
3. Please note that Y in this case is a categorical variable (unlike in regression analysis, where it is continuous), whereas x1 and x2 are continuous (metric) variables. k1 and k2 are determined by appropriate algorithms in the computer package used, but the underlying objective is that these coefficients should maximise the separation, or differences, between the two groups on the Y variable.
4. Y will have 2 possible values in a 2-group discriminant analysis, 3 values in a 3-group discriminant analysis, and so on.

Slide 2 contd...

5. k1 and k2 are also called the unstandardised discriminant function coefficients.
6. As mentioned above, Y is a classification into 2 or more groups and is therefore a grouping variable, in the terminology of discriminant analysis. That is, groups are formed on the basis of existing data, and coded as 1 and 2, similar to dummy-variable coding.
7. The independent (x) variables are continuous-scale variables, used as predictors of the group to which the objects will belong. Therefore, to be able to use discriminant analysis, we need some data on Y and the x variables from experience and/or past records.
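Here is a minimal sketch, in Python, of evaluating such a two-variable discriminant function. All coefficient values are hypothetical placeholders chosen for illustration, not estimates from any real data.

```python
# A two-variable discriminant function of the form Y = a + k1*x1 + k2*x2.
def discriminant_score(x1, x2, a, k1, k2):
    """Return the discriminant score Y for a single case."""
    return a + k1 * x1 + k2 * x2

# Hypothetical coefficients: a = 1.0, k1 = 0.5, k2 = -0.3.
print(discriminant_score(x1=2.0, x2=4.0, a=1.0, k1=0.5, k2=-0.3))  # 0.8
```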

Slide 3

Building a Model for Prediction/Classification

Assuming we have data on both the Y and x variables of interest, we estimate the coefficients of the model, which is a linear equation of the form shown earlier, and use the coefficients to calculate the Y value (discriminant score) for any new data points that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cut-off score, which is usually the midpoint of the mean discriminant scores of the two groups (a sketch of this rule follows below).

Accuracy of Classification: The classification of the existing data points is then done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by this model.
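A minimal sketch of the midpoint cut-off rule, assuming the discriminant scores of the cases in each group have already been computed (the scores below are made up for illustration):

```python
# Hypothetical discriminant scores for the cases in two groups.
group1_scores = [-1.9, -1.2, -1.5]
group2_scores = [0.9, 1.4, 1.6]

mean1 = sum(group1_scores) / len(group1_scores)  # mean score of group 1
mean2 = sum(group2_scores) / len(group2_scores)  # mean score of group 2
cutoff = (mean1 + mean2) / 2                     # midpoint of the two means

def classify(score):
    """Assign the group on whose side of the cut-off the score falls.

    Here group 1 has the lower mean score, so scores below the cut-off
    are assigned to group 1.
    """
    return 1 if score < cutoff else 2

print(classify(-0.8), classify(0.7))  # 1 2
```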

Slide 3 contd...

This percentage is somewhat analogous to R² in regression analysis (the percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model on new cases may be less than the figure obtained by applying it to the data points on which it was based.

Stepwise / Fixed Model: Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or of entering all the variables we plan to use at once. Depending on the correlations between the independent variables and on the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.

Slide 4

Relative Importance of Independent Variables

1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between groups?
2. The coefficients of x1 and x2 are the ones which provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain standardised discriminant coefficients. These are available from the computer output.
3. The higher the standardised discriminant coefficient of a variable (in absolute value), the higher its discriminating power.

Slide 5

A Priori Probability of Classification into Groups

The discriminant analysis algorithm requires us to assign an a priori (before analysis) probability of a given case belonging to one of the groups. There are two ways of doing this (see the sketch after this list):

1. We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to either group.
2. We can formulate some other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data: if two-thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two-thirds).
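As a sketch of how this choice appears in practice, scikit-learn's LinearDiscriminantAnalysis exposes it through the priors argument. (scikit-learn is used here only for illustration; the textbook output shown later in these slides comes from a different package.)

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Equal a priori probabilities for a 2-group analysis.
lda_equal = LinearDiscriminantAnalysis(priors=[0.5, 0.5])

# Priors proportional to group size: scikit-learn infers the class
# proportions from the training data when priors is left as None.
lda_proportional = LinearDiscriminantAnalysis(priors=None)
```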

Slide 6

We will turn now to a complete worked example, which will clarify many of the concepts explained earlier. We will begin with the problem statement and input data.

Problem: Suppose State Bank of Bhubaneswar (SBB) wants to start a credit card division. It wants to use discriminant analysis to set up a system to screen applicants and classify them as either low risk or high risk (risk of default on credit card bill payments), based on information collected from their applications for a credit card. Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be low-risk (no default) and high-risk (defaulting on payments) customers. These data on 18 customers are given in fig. 1.

Slide 7

Fig. 1: Input Data on 18 Customers
(RISK: 1 = low risk, 2 = high risk; AGE = age in years; INC = income in Rs.; YRSM = years married)

No.  RISK  AGE   INC  YRSM
 1     1    35  4000     8
 2     1    33  4500     6
 3     1    29  3600     5
 4     2    22  3200     0
 5     2    26  3000     1
 6     1    28  3500     6
 7     2    30  3100     7
 8     2    23  2700     2
 9     1    32  4800     6
10     2    24  1200     4
11     2    26  1500     3
12     1    38  2500     7
13     1    40  2000     5
14     2    32  1800     4
15     1    36  2400     3
16     2    31  1700     5
17     2    28  1400     3
18     1    33  1800     6
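For the code sketches that follow, here are the fig. 1 data entered as Python lists (one entry per customer, in the order of the table above):

```python
# Fig. 1 data: RISK (1 = low risk, 2 = high risk), AGE (years),
# INC (income, Rs.), YRSM (years married).
risk = [1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
age  = [35, 33, 29, 22, 26, 28, 30, 23, 32, 24, 26, 38, 40, 32, 36, 31, 28, 33]
inc  = [4000, 4500, 3600, 3200, 3000, 3500, 3100, 2700, 4800,
        1200, 1500, 2500, 2000, 1800, 2400, 1700, 1400, 1800]
yrsm = [8, 6, 5, 0, 1, 6, 7, 2, 6, 4, 3, 7, 5, 4, 3, 5, 3, 6]
```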

Slide 8

We will perform a discriminant analysis and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:

1. The percentage of customers that it is able to classify correctly.
2. The statistical significance of the discriminant function.
3. Which variables (age, income, or years of marriage) are relatively better at discriminating between low-risk and high-risk applicants.
4. How to classify a new credit card applicant into one of the two groups (low risk or high risk), by building a decision rule and a cut-off score.

Slide 9

Input data are given in fig. 1.

Interpretation of Computer Output: We will now find answers to the four questions we raised earlier.

Q1. How good is the model? How many of the 18 data points does it classify correctly?

To answer this question, we look at the computer output labelled fig. 3. This is a part of the discriminant analysis output from any computer package such as SPSS, SYSTAT, STATISTICA, or SAS. (There could be minor variations in the exact numbers obtained, and major variations could occur if the options chosen are different. For example, if the a priori probabilities chosen for the classification into the two groups are equal, as we have assumed while generating this output, then you will very likely see similar numbers in your output.)

Fig. 3: Classification Matrix

Group   Percent Correct   G_1   G_2
G_1          100.0000       9     0
G_2           88.8889       1     8
Total         94.4444      10     8

(Rows are observed groups; columns G_1 and G_2 are predicted groups.)

Slide 10

This output (fig. 3) is called the classification matrix (also known as the confusion matrix), and it indicates that the discriminant function we have obtained is able to classify 94.44 percent of the 18 objects correctly. This figure is in the "Percent Correct" column of the classification matrix. More specifically, it also says that of the 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from column G_2, we understand that of the 8 cases predicted to be in group 2, all 8 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, giving us a classification (or prediction) accuracy of (18-1)/18, or 94.44 percent. As mentioned earlier, this level of accuracy may not hold for all future classifications of new cases. But it is still a pointer towards the model being a good one, assuming the input data were relevant and scientifically collected. There are ways of checking the validity of the model, but these will be discussed separately.
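As a sketch, a comparable classification matrix can be produced with scikit-learn, using the fig. 1 data lists defined earlier. Because this is a different package from the one that produced the textbook output, the exact numbers may differ slightly.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, accuracy_score

# Feature matrix and group labels from the fig. 1 lists defined earlier.
X = np.column_stack([age, inc, yrsm])
y = np.array(risk)

# Equal a priori probabilities, as assumed in the textbook output.
lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)
pred = lda.predict(X)

print(confusion_matrix(y, pred))  # rows = observed groups, columns = predicted
print(accuracy_score(y, pred))    # expected to be close to 0.9444
```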

Slide 11

Statistical Significance
Q2. How significant, statistically speaking, is the discriminant function? This question is answered by looking at Wilks' Lambda and the probability value for the F test given in the computer output, as a part of fig. 3 (shown below).
Discriminant Function Analysis Results
Number of variables in the model: 3
Wilks' Lambda: .3188764
Approx. F(3, 14) = 9.968056, p < .00089

The value of Wilks' Lambda is 0.318. This value lies between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being a good one. The probability value of the F test indicates that the discrimination between the two groups is highly significant: since p < .00089, the F test is significant at a confidence level of up to (1 - .00089) x 100, or 99.91 percent.
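For reference, here is a sketch of computing Wilks' Lambda and the exact F test for the 2-group case directly from the data, using NumPy/SciPy and the fig. 1 lists defined earlier. With the data as entered above, the results should come out close to the textbook's figures.

```python
import numpy as np
from scipy import stats

# Feature matrix and group labels from the fig. 1 lists defined earlier.
X = np.column_stack([age, inc, yrsm]).astype(float)
y = np.array(risk)
n, p = X.shape

# Total and pooled within-group sums-of-squares-and-cross-products matrices.
Xc = X - X.mean(axis=0)
T = Xc.T @ Xc
W = sum(
    (X[y == g] - X[y == g].mean(axis=0)).T @ (X[y == g] - X[y == g].mean(axis=0))
    for g in np.unique(y)
)

wilks = np.linalg.det(W) / np.linalg.det(T)  # Wilks' Lambda

# For 2 groups, the F transformation of Lambda is exact, with (p, n-p-1) d.f.
F = (1 - wilks) / wilks * (n - p - 1) / p
p_value = stats.f.sf(F, p, n - p - 1)
print(wilks, F, p_value)  # expected: close to .319, 9.97, .0009
```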

Slide 12

Q3. We have 3 independent (or predictor) variables: Age, Income, and Number of Years Married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?

To answer this question, we look at the standardised coefficients in the output. These are given in fig. 5; the AGE row of the output is reproduced below, and the coefficients of the other two variables are quoted in the text.

Fig. 5: Standardized Discriminant Function Coefficients
Variable     Root 1
AGE         -.923955
Eigenvalue:  2.136012

This output shows that Age is the best predictor, with a coefficient of 0.92 (in absolute value), followed by Income, with a coefficient of 0.77; Years of Marriage is last, with a coefficient of 0.15. Please recall that the absolute value of the standardised coefficient of each variable indicates its relative importance.
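One way to see where the standardised coefficients come from: each raw coefficient is multiplied by the pooled within-group standard deviation of its variable (the convention used by several packages). A sketch, reusing W and n from the Wilks' Lambda block above and the raw coefficients quoted in fig. 4 further below:

```python
import numpy as np

# Pooled within-group standard deviations of AGE, INC, YRSM
# (W and n come from the Wilks' Lambda sketch above; 2 groups).
pooled_sd = np.sqrt(np.diag(W) / (n - 2))

raw = np.array([-0.24560, -0.00008, -0.08465])  # raw coefficients, fig. 4
standardized = raw * pooled_sd
print(standardized)  # absolute values expected to be roughly .92, .77, .15
```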

Slide 13

Q4. How do we classify a new credit card applicant into either the high-risk or the low-risk category, and make a decision on accepting or refusing him a credit card?

This is the most important question to be answered. Please remember why we started out with the discriminant analysis in this problem: State Bank of Bhubaneswar wished to have a decision model for screening credit card applicants. The way to do this is to use the outputs in fig. 4 (raw or unstandardised coefficients of the discriminant function) and fig. 6 (means of canonical variables). Fig. 6, the means of canonical variables, gives us the new means for the transformed group centroids.

Fig. 6: Means of Canonical Variables (Root 1)
G_1: -1.37793
G_2: +1.37793

Slide 13 contd...
Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37793. This means that the midpoint of these two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

   -1.37 ---------------- 0 ---------------- +1.37
Mean of Group 1       Midpoint       Mean of Group 2
  (Low Risk)                           (High Risk)

Slide 14

This also gives us a decision rule for classifying any new case: if the discriminant score of an applicant falls to the right of the midpoint, we classify him as high risk, and if it falls to the left of the midpoint, we classify him as low risk. In this case, the midpoint is 0. Therefore, any positive (greater than 0) discriminant score will lead to classification as high risk, and any negative (less than 0) discriminant score will lead to classification as low risk. But how do we compute the discriminant score of an applicant? We take the applicant's Age, Income, and Years of Marriage (from his application) and plug these into the unstandardised discriminant function. This gives us his discriminant score.

Slide 14 contd...

Fig. 4: Raw (Unstandardised) Discriminant Function Coefficients
Variable     Root 1
AGE         -.24560
INC         -.00008
YRSM        -.08465
Constant    10.00335
Eigenvalue:  2.13601

From fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

Y = 10.0036 - Age (.24560) - Income (.00008) - Yrs. Married (.08465)

where Y gives us the discriminant score of any person whose Age, Income, and Years Married are known.
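The scoring rule can also be written as a short function, a direct transcription of the fig. 4 coefficients as quoted in the text:

```python
def sbb_discriminant_score(age_yrs, income, yrs_married):
    """Discriminant score from the raw coefficients in fig. 4."""
    return 10.0036 - 0.24560 * age_yrs - 0.00008 * income - 0.08465 * yrs_married

def sbb_classify(score, cutoff=0.0):
    """Midpoint decision rule: scores below the cut-off of 0 are low risk."""
    return "low risk" if score < cutoff else "high risk"
```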

Slide 15

Let us take the example of a credit card applicant to SBB who is aged 40, has an income of Rs. 25,000 per month, and has been married for 15 years. Plugging these values into the discriminant function (model) above, we find his discriminant score Y to be

Y = 10.0036 - 40(.24560) - 25000(.00008) - 15(.08465)
  = 10.0036 - 9.824 - 2 - 1.26975
  = -3.09015
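Running the same applicant through the function sketched above reproduces this score:

```python
score = sbb_discriminant_score(age_yrs=40, income=25000, yrs_married=15)
print(round(score, 5))      # -3.09015
print(sbb_classify(score))  # low risk
```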

According to our decision rule, any discriminant score to the left of the midpoint of 0 leads to a classification in the low risk group. Therefore, we should give this person a credit card, as he is a low risk customer. The same process is to be followed for any new applicant. If his discriminant score is to the right of the midpoint of 0, he should be denied a credit card, as he is a high risk customer.
We have completed answering the four questions raised by State Bank of Bhubaneswar.
