
DISCRIMINANT ANALYSIS

Discriminant analysis is a technique for analysing data when the dependent variable (DV) is categorical (a classification) and the predictor, or independent, variables (IVs) are interval or ratio scaled.

IMPORTANT
DV: non-metric (nominal or ordinal scaled) classification/grouping variable
IVs: metric variables (interval or ratio scaled)

Examples:
1. The DV is the choice of a brand of PC (A, B or C) and the IVs are ratings of PC attributes on a 7-point scale.
2. Classification of customers into buyers and non-buyers based on their demographic profiles, such as age, income and sex, and some factors related to shopping habits.
3. Families who do or do not go to holiday resorts for vacations as the criterion variable, with income, household size, attitude towards travel, importance given to family vacations, etc. as predictor variables.

Discriminant Analysis
PURPOSE: to understand segmentation/classification and to predict group membership
INPUT: a dependent variable as an indicator of group membership, and independent variables as classification criteria
KEY OUTPUT: classification matrix

The objectives of this technique are:
1. Develop a discriminant function, a linear combination of the independent variables that best discriminates between the categories of the dependent variable (the groups).
2. Examine whether significant differences exist among the groups in terms of the predictor variables.
3. Determine which predictor variables contribute most to the inter-group differences.
4. Classify cases into one of the groups based on the values of the predictor variables.
5. Evaluate the accuracy of the classification.

The linear discriminant analysis model, known as the discriminant function, is given by

D (or Y) = b0 + b1 X1 + b2 X2 + ... + bk Xk

where
D = discriminant score
b's = discriminant coefficients
X's = independent variables (k independent variables)
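As a minimal sketch of how the discriminant function is evaluated for one case, the snippet below computes D from a constant, a list of coefficients and a list of predictor values. The function name and the numbers are hypothetical and chosen only for illustration; they do not come from any fitted model.

```python
def discriminant_score(b0, coefficients, values):
    """Return D = b0 + b1*X1 + ... + bk*Xk for a single case."""
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per independent variable")
    return b0 + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical constant, coefficients and one case with k = 2 predictors.
b0 = -2.0
b = [0.5, 0.25]
x = [3.0, 4.0]

score = discriminant_score(b0, b, x)
print(score)  # 0.5
```

In practice the coefficients would be estimated from data by a statistics package; here they are simply assumed.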

In discriminant analysis a score is assigned to each individual or object. This score forms the basis for classifying the item into the most likely class.
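One simple way to turn a score into a class is to assign the case to the group whose centroid (mean discriminant score) is nearest. The sketch below assumes equal group sizes and equal prior probabilities. The group names and centroid values are hypothetical.

```python
def classify(score, centroids):
    """Assign a case to the group whose centroid is nearest to its score."""
    return min(centroids, key=lambda group: abs(score - centroids[group]))


# Hypothetical group centroids on the discriminant axis.
centroids = {"buyer": 1.2, "non-buyer": -0.8}

labels = [classify(s, centroids) for s in (0.9, -0.1, 0.3)]
print(labels)  # ['buyer', 'non-buyer', 'buyer']
```

With two groups this rule is the same as cutting at the midpoint between the two centroids. Statistical packages may use a different rule, for example one weighted by prior probabilities.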

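The linear discriminant function in standardised form is given by

D (or Y) = B1 X1 + B2 X2 + ... + Bk Xk

where the B's are standardised coefficients applied to standardised predictors, so there is no constant term.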

If the DV has two groups, a single discriminant equation is needed for categorising.
If the DV has three groups, two discriminant equations are needed for categorising.
If the DV has n groups, (n-1) discriminant equations will be required for categorisation, provided there are at least (n-1) predictor variables.

Examples of applications of DA in business research:
1. How do customers who exhibit store loyalty differ from those who do not, in terms of demographic characteristics?
2. Do market segments differ in their media consumption habits?
3. What are the distinguishing characteristics of consumers who prefer to shop on the net?

Important statistics associated with the analysis:
1. Discriminant scores (DS).
2. Discriminant function coefficients (B's).
3. Canonical correlation: the association between the discriminant scores (DS) and the groups.
4. Centroid: the mean value of DS for a particular group.
5. Classification matrix (confusion matrix or prediction matrix): matrix of correctly classified and misclassified cases.
6. Hit ratio: proportion of cases correctly classified.
7. Eigenvalue: ratio of the between-group to the within-group sum of squares. The larger the eigenvalue, the better the function discriminates. An eigenvalue greater than 1 means the between-group variation exceeds the within-group variation. When there is only one discriminant function, it accounts for 100% of the explained variance. (The square of the canonical correlation gives the proportion of variation in the dependent variable explained by the model.)
8. Wilks' lambda: indicates the significance of the model. A lower value indicates higher significance. (Wilks' lambda is converted to a
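To make the relationships among these statistics concrete, the sketch below computes the centroids, the eigenvalue, the canonical correlation and Wilks' lambda from a set of discriminant scores. It assumes a single discriminant function, in which case Wilks' lambda equals 1 / (1 + eigenvalue) and the squared canonical correlation equals eigenvalue / (1 + eigenvalue). The scores and group names are hypothetical.

```python
from statistics import mean


def discriminant_statistics(scores_by_group):
    """Summarise one discriminant function from its scores, grouped by class."""
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    grand_mean = mean(all_scores)
    # Centroid: mean discriminant score of each group.
    centroids = {g: mean(scores) for g, scores in scores_by_group.items()}
    between = sum(
        len(scores) * (centroids[g] - grand_mean) ** 2
        for g, scores in scores_by_group.items()
    )
    within = sum(
        (s - centroids[g]) ** 2
        for g, scores in scores_by_group.items()
        for s in scores
    )
    eigenvalue = between / within            # between-group SS / within-group SS
    canonical_r = (eigenvalue / (1 + eigenvalue)) ** 0.5
    wilks_lambda = 1 / (1 + eigenvalue)      # valid for a single function only
    return centroids, eigenvalue, canonical_r, wilks_lambda


# Hypothetical discriminant scores for two groups.
scores = {"adopter": [1.0, 2.0, 3.0], "non-adopter": [-1.0, -2.0, -3.0]}
centroids, eigenvalue, canonical_r, wilks_lambda = discriminant_statistics(scores)
print(centroids, eigenvalue, round(canonical_r, 4), round(wilks_lambda, 4))
```

Here the two groups are well separated, so the eigenvalue is large and Wilks' lambda is small.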

Key Output of Discriminant Analysis: Classification Matrix

Classification matrix
                          True group
Assigned group      Good credit    Bad credit
Good credit              40            10
Bad credit               15            35

For the above matrix, the proportion of correct classifications, i.e. the hit rate, is (40 + 35) / (40 + 35 + 10 + 15) = 75/100 = 75%.
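The hit-rate arithmetic can be sketched as a small function that divides the diagonal of the classification matrix by the total number of cases. The function name is hypothetical. The matrix is the credit example, with rows as assigned groups and columns as true groups.

```python
def hit_ratio(matrix):
    """matrix[i][j] = number of cases assigned to group i whose true group is j."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


credit = [
    [40, 10],  # assigned good credit: 40 truly good, 10 truly bad
    [15, 35],  # assigned bad credit: 15 truly good, 35 truly bad
]

rate = hit_ratio(credit)
print(rate)  # 0.75
```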

Examine the Quality of Discriminant Analysis: Which Result is Better?

Result 1
                          True group
Assigned group      Good credit    Bad credit
Good credit              30            10
Bad credit               10            35

Result 2
                          True group
Assigned group      Good credit    Bad credit
Good credit              40            10
Bad credit               10            30
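One way to compare the two results is to look at the overall hit ratio together with the share of each true group that was classified correctly. The sketch below assumes the two tables read as (30, 10 / 10, 35) and (40, 10 / 10, 30), with rows as assigned groups and columns as true groups. That reading of the layout is an assumption, and the function name is hypothetical.

```python
def summarize(matrix, labels):
    """Return the overall hit ratio and the correct share of each true group."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_group = {}
    for j, label in enumerate(labels):
        true_count = sum(row[j] for row in matrix)  # column total = true group size
        per_group[label] = matrix[j][j] / true_count
    return correct / total, per_group


labels = ["good credit", "bad credit"]
result_1 = [[30, 10], [10, 35]]  # assumed reading of the first table
result_2 = [[40, 10], [10, 30]]  # assumed reading of the second table

overall_1, groups_1 = summarize(result_1, labels)
overall_2, groups_2 = summarize(result_2, labels)
print(round(overall_1, 3), groups_1)
print(round(overall_2, 3), groups_2)
```

The overall hit ratio alone can hide differences between groups, so which result is "better" may depend on which kind of misclassification is more costly.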

Application problem
A firm has developed a new industrial process which is a distinct improvement over the existing one. The firm wants to know which industrial units would be interested in buying the process. Units which are early adopters and innovators would go in for the new process. The net profit of industrial units and their membership of trade associations and technical societies are identified as two important determinants. Data on these two variables are available.
Data file: Discrim.Sav

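Logistic vs discriminant

Logistic: used when discriminant analysis would require too many assumptions, which raises the chance of a Type II error (failing to reject a false null hypothesis).
Discriminant: used when all of its assumptions are met and you need to test the hypothesis.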

Regression vs logistic vs discriminant

Regression: the dependent variable is continuous.
Logistic and discriminant: the dependent variable is categorical.