76 pages · Jan 15, 2011 · © Attribution Non-Commercial (BY-NC)

QTIA Presentation

Presented to: Mr. Imtiaz Arif

Presented by: Muhammad Usman Dilshad
usmandilshad@gmail.com
Karachi, Pakistan.


Topics

Factor Analysis
Cluster Analysis
Multiple Regression
Conjoint Analysis
Logistic Regression
Correspondence Analysis
Discriminant Analysis
Structural Equation Modeling
ANOVA / ANCOVA


Factor Analysis

Presented by: Muhammad Usman Dilshad
Reg Id # 3806

Factor Analysis

Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.

Factor analysis identifies a smaller number of factors among a large number of variables; the variables grouped under a factor have high correlations with one another.
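The core idea — extracting a few factors from the correlations among many variables — can be sketched with a principal-component-style extraction. This is a minimal numpy illustration, not the procedure run in the slides: the data, seed, and one-factor loading pattern are all invented.

```python
import numpy as np

# Toy data: 6 observed variables driven by one underlying factor (made up)
rng = np.random.default_rng(0)
g = rng.normal(size=(100, 1))                     # the latent factor
X = g @ np.ones((1, 6)) + rng.normal(scale=0.5, size=(100, 6))

R = np.corrcoef(X, rowvar=False)                  # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings on the first factor (principal-component extraction)
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])
print(eigvals[0])   # a dominant eigenvalue suggests one strong factor
```

Because all six variables share one latent factor, the first eigenvalue absorbs most of the variance, which is exactly the "smaller number of factors" the slide describes.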


Exploratory vs. Confirmatory Factor Analysis

• Exploratory factor analysis: explores and summarizes the underlying correlation structure of a data set.
• Confirmatory factor analysis: tests the correlation structure of a data set against a hypothesized structure and rates the "goodness of fit".


Assumptions

Multicollinearity:
The KMO measure of sampling adequacy can be used to identify which variables to drop from the factor analysis because they lack multicollinearity with the other variables. A measure is computed for each individual variable, and together these yield the overall KMO statistic; low values argue against proceeding with factor analysis.


Points to Remember

Testing Assumptions of Factor Analysis
• A strong conceptual foundation needs to support the assumption that a structure does exist before the factor analysis is performed.
• A statistically significant Bartlett's test of sphericity (sig. < .05) indicates that sufficient correlations exist among the variables to proceed.
• Measure of Sampling Adequacy (MSA) values must exceed .50 for both the overall test and each individual variable. Variables with values less than .50 should be omitted from the factor analysis one at a time, with the smallest one being omitted each time.


Other uses of Factor Analysis

• Calculating Factor Scores
• Creating Summated Scales
• Selecting Substitute Variables


We will now go to SPSS for analysis.

Retrieve hsbdataB.sav
• Next, select the variables item01 through item14.

Syntax

FACTOR
/VARIABLES item01 item02 item03 item04 item05 item06 item07 item08 item09 item10 item11 item12 item13 item14
/MISSING LISTWISE
/ANALYSIS item01 item02 item03 item04 item05 item06 item07 item08 item09 item10 item11 item12 item13 item14
/PRINT INITIAL CORRELATION DET KMO ROTATION
/FORMAT SORT BLANK(.3)
/CRITERIA FACTORS(3) ITERATE(25)
/EXTRACTION PAF
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION.


MULTIPLE REGRESSION ANALYSIS


MULTIPLE REGRESSION

ANALYSIS:

Multiple regression analysis is a statistical technique

that can be used to analyze the relationship

between a single dependent (criterion) variable and

several independent (predictor) variables.


Multiple Regression Function:

Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + ε

Y = dependent variable
β₀ = constant (intercept)
β₁ = coefficient of variable X₁
X₁ = independent variable
ε = error term
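The function above can be estimated by ordinary least squares. A minimal numpy sketch — the coefficients and simulated data are invented purely to show that estimation recovers them:

```python
import numpy as np

# Toy data for Y = b0 + b1*X1 + b2*X2 + error (coefficients are made up)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.1, size=50)

A = np.column_stack([np.ones(len(X)), X])     # add the intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares estimate
print(beta)   # ≈ [2.0, 1.5, -0.7]
```

The estimated vector beta contains the constant β₀ followed by the coefficients β₁, β₂, matching the function above.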


An Example of Multiple

Regression Analysis.


Assumptions of Multiple Regression Analysis

• Linearity.
• Multicollinearity.
• Normality.
• Homoscedasticity.
• Outliers.


Estimation Techniques

• Enter or Direct Method
• Hierarchical Method


We will now go to SPSS for analysis.

Retrieve hsbdataB.sav
Analyze → Regression → Linear
Math achievement = motivation + grades in h.s. + parent's education + gender


Binary Logistic

Regression


Binary Logistic Regression.

Logistic regression is helpful when you want to predict a

categorical variable from a set of predictor variables.

Binary logistic regression is similar to linear regression

except that it is used when the dependent variable is

dichotomous. Multinomial logistic regression is used

when the dependent/outcome variable has more than two

categories. Logistic regression also is useful when some

or all of the independent variables are dichotomous;

others can be continuous.
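A dichotomous outcome is modeled by passing a linear predictor through the logistic (sigmoid) function. Below is a minimal sketch fitted by plain gradient ascent on the log-likelihood — the data and true slope are invented, and a real analysis would use SPSS or a statistics library rather than this hand-rolled loop:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dichotomous outcome driven by one predictor (values made up)
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < sigmoid(2.0 * x)).astype(float)

X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(2000):                 # gradient ascent on the log-likelihood
    p = sigmoid(X @ w)
    w += 0.1 * X.T @ (y - p) / len(y)
print(w)   # slope estimate lands near the generating value of 2.0
```

Unlike linear regression, the fitted model returns a probability between 0 and 1 for each case, which is then thresholded into one of the two categories.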


Assumptions of Binary Logistic Regression Analysis.

There are fewer assumptions for logistic regression than for multiple

regression and discriminant analysis, which is one reason this

technique has become popular, especially in health-related fields.

Binary logistic regression assumes that the dependent or outcome

variable is dichotomous and, like most other statistics, that the

outcomes are independent and mutually exclusive; that is, a single

case can only be represented once and must be in one group or the

other.

There should be a minimum of 20 cases per predictor, with a

minimum of 60 total cases. These requirements need to be satisfied

prior to doing statistical analysis with SPSS.

As with multiple regression, multicollinearity is a potential source of

confusing or misleading results and needs to be assessed.


When and Why Binary Logistic Regression?

• When the dependent variable is nonparametric and we don't have homoscedasticity (the variances of the DV across levels of the IVs are not equal).
• When the dependent variable has only two levels (yes/no, male/female, taken/not taken).
• When multivariate normality is suspect.
• When we don't have linearity.


Who uses it, in plain words

Binary Logistic Regression can be used in the following

situations.

A catalog company wants to increase the proportion of mailings

that result in sales.

A doctor wants to accurately diagnose a possibly cancerous

tumor.

A loan officer wants to know whether the next customer is

likely to default.

Using the Binary Logistic Regression procedure, the catalog

company can send mailings to the people who are most likely

to respond, the doctor can determine whether the tumor is

more likely to be benign or malignant, and the loan officer can

assess the risk of extending credit to a particular customer.


A linear regression model can predict values below 0, between 0 and 1, and above 1, whereas logistic regression yields a probability between 0 and 1, and hence a predicted outcome of either 0 or 1.


We will now go to SPSS for analysis.

Analyze → Regression → Binary Logistic
Algebra 2 = gender + mosaic + visualization test + parent's education


Discriminant

Analysis


What is Discriminant

Analysis?

Discriminant analysis is appropriate when you

want to predict which group participants will

be in. The procedure produces a discriminant

function (or for more than two groups, a set

of discriminant functions) based on linear

combinations of the predictor variables that

provide the best overall discrimination among

the groups.

The grouping or dependent variable can have

more than two values.


Difference b/w Discriminant

Analysis and MANOVA.

In DA, one is trying to devise one or more

predictive equations to maximally

discriminate people in one group from those

in another group; in MANOVA, one is trying to

determine whether group members differ

significantly on a set of several measures.


When to use Discriminant Analysis?

• When the dependent variable is nonparametric and has two or more levels.
• When your predictors have multivariate normality.


Discriminant Function

Discriminant analysis attempts to find linear combinations of

those variables that best separate the groups of cases.

These combinations are called discriminant functions and

have the form displayed in the equation.

dik = b0k + b1k xi1 + ... + bpk xip

where
dik is the value of the kth discriminant function for the ith case
p is the number of predictors
bjk is the value of the jth coefficient of the kth function
xij is the value of the ith case on the jth predictor
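Evaluating the discriminant function in the equation above is a simple weighted sum. A pure-Python sketch with made-up coefficients b and one case x (p = 3 predictors):

```python
# Evaluating d = b0 + b1*x1 + ... + bp*xp for one case.
# Coefficients and predictor values are invented for illustration.
b = [0.5, 1.2, -0.8, 0.3]   # b0 followed by the p = 3 coefficients
x = [2.0, 1.0, 4.0]         # predictor values for one case

d = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
# d = 0.5 + 2.4 - 0.8 + 1.2 = 3.3
```

Each case gets one such score per discriminant function, and cases are classified by comparing these scores across groups.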

Discriminant Function

Contd..

The number of functions equals min(#groups − 1, #predictors).

The procedure automatically chooses a first

function that will separate the groups as

much as possible. It then chooses a second

function that is both uncorrelated with the

first function and provides as much further

separation as possible. The procedure

continues adding functions in this way until

reaching the maximum number of functions

as determined by the number of predictors

and categories in the dependent variable.


Assumptions for Discriminant

Analysis.

The relationships between all pairs of predictors must be

linear, multivariate normality must exist within groups,

and the population covariance matrices for predictor

variables must be equal across groups.

Discriminant analysis is, however, fairly robust to these

assumptions, except violations of multivariate normality in

which case one should use Logistic regression.

Multicollinearity, which causes problems in any regression analysis, should be minimal.

It is also important that the sample size of the smallest group

exceed the number of predictor variables in the model.

The linearity assumption as well as the assumption of

homogeneity of variance-covariance matrices should be

tested by examining a matrix scatterplot. If the spreads of

the scatterplots are roughly equal, then the assumption of

homogeneity of variance-covariance matrices can be

assumed.

It is best not to use a dichotomous independent variable if the dependent variable does not have a roughly 50/50 split.


We will now go to SPSS for analysis.

Analyze → Classify → Discriminant
Algebra 2 in h.s. = gender + parent's education + mosaic + visualization test


Cluster Analysis


What is Cluster Analysis?

It is a descriptive analysis technique which groups

objects (respondents, products, firms, variables, etc.)

so that each object is similar to the other objects in

the cluster and different from objects in all the other

clusters.


When to use cluster

analysis?

The essence of all clustering approaches is the classification of data

as suggested by “natural” groupings of the data themselves.

Simply put, use cluster analysis when you desire the following:
• Taxonomy development (segmentation)
• Data simplification
• Relationship identification

Applications.

It is used to segment markets in marketing; social networking sites use it to form new groups based on user data; and Flickr's photo map and other map sites use clustering to reduce the number of markers on a map.


How does cluster analysis work?

Clusters are made based on the similarities in the objects.

Inter-object similarity is an empirical measure of

correspondence, or resemblance, between objects to be

clustered. It can be measured in a variety of ways, but

three methods dominate the applications of cluster

analysis:

Correlational Measures.

Distance Measures.

Association.


How…. Contd…

Euclidean distance.

Squared (or absolute) Euclidean distance.

City-block (Manhattan) distance.

Chebychev distance.

Mahalanobis distance (D2).

Euclidean distance is the straight-line distance between objects.

Clusters are made by identifying Centroids. Centroids are

points in a cluster from which every object in that cluster

has minimum distance.
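Three of the distance measures listed above have direct formulas. A pure-Python sketch (the two example points are arbitrary):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def city_block(a, b):
    # Manhattan distance: sum of absolute coordinate differences
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def chebychev(a, b):
    # Largest single coordinate difference
    return max(abs(ai - bi) for ai, bi in zip(a, b))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q), city_block(p, q), chebychev(p, q))  # 5.0 7.0 4.0
```

Whichever measure is chosen, clustering then assigns each object to the centroid to which its distance is smallest.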


Assumptions for Cluster

Analysis.

Sufficient size is needed to ensure

representativeness of the population and its

underlying structure, particularly small groups

within the population.

Outliers can severely distort the representativeness of the results if they appear as structures (clusters) that are inconsistent with the research objectives.

Representativeness of the sample. The sample must

represent the research question.

Impact of multicollinearity. Input variables should

be examined for substantial multicollinearity and

if present:

Reduce the variables to equal numbers in each set

of correlated measures.


Two Variable Cluster Analysis

[Scatterplot: frequency (vertical axis, low to high) versus frequency of going to fast food restaurants (horizontal axis, low to high), illustrating clusters on two variables.]

We will now go to SPSS for analysis.

Retrieve judges.sav
Analyze → Classify → Hierarchical Cluster
All variables.


Conjoint

Analysis


Conjoint Analysis

Conjoint analysis . . . is a dependence technique used

to understand how respondents develop preferences

for products or services. The dependent variable is a

measure of respondent preference and can be metric

or non-metric (choice-based conjoint). The

independent variables are dummy variables

representing attributes of multi-attribute products or

services.


Conjoint Analysis

Some web links.

http://www.youtube.com/watch?v=86iiQjPaVSU&p=

What is conjoint analysis?

http://videolectures.net/kdd09_guo_csbdcayfptm/

Yahoo uses conjoint analysis to deliver preferred content to its users, for example showing a top football story to a 20-year-old and a finance story to a 50-year-old. It also uses cluster analysis and logistic regression.

Yahoo using conjoint analysis.ppt


Conjoint …..

Is not a new “technique” but an application of techniques we have covered

already:

Metric conjoint analysis is a regression analysis.

Choice-based conjoint is a discrete regression (e.g., logit).

The researcher first constructs a set of real or hypothetical products by

combining selected levels of each attribute (factor):

In most situations, the researcher will need to create an experimental

design.

Some computer programs will create the design (Sawtooth Software,

SPSS Conjoint).

These combinations or profiles are then presented to respondents, who

provide their overall evaluations.
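As noted above, metric conjoint analysis is just a regression on dummy-coded attribute levels. A minimal numpy sketch — the 2×2 design (brand, price) and the four profile ratings are entirely made up:

```python
import numpy as np

# Ratings of 4 profiles from a hypothetical 2x2 design.
# Columns: intercept, brand = B, price = low (dummy codes)
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
ratings = np.array([3.0, 6.0, 4.0, 7.0])

# Part-worth utilities are the regression coefficients
partworths, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(partworths)   # [3., 1., 3.]: low price adds 3, brand B adds 1
```

Here the respondent's ratings are perfectly additive, so the part-worths reproduce them exactly; with real rankings the regression gives a least-squares fit instead.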


Assumptions of Conjoint

Analysis

Few statistical assumptions needed.

Conceptual assumptions are important (e.g., main effects vs.

interactive).


Process of Conjoint Analysis

• Identifying variables (attributes) and their levels (values).
• Constructing an orthogonal design, i.e. a list of combinations of different values of the variables.
• Printing experiment cards containing the orthogonal design to collect target-audience preference rankings.
• Entering the rankings into SPSS.
• Running the conjoint analysis.


We will now go to SPSS for analysis.

Use carpet_plan.sav and carpet_pref.sav


Correspondence

Analysis


Correspondence Analysis

It is an interdependence technique that has

become increasingly popular for dimensional

reduction and perceptual mapping. It also is

known as optimal scaling or scoring, reciprocal

averaging or homogeneity analysis.

It is a technique that generates graphical representations of the associations between the objects (or "categories") of two categorical variables.


Contd…..

Correspondence analysis is a statistical

visualization method for picturing the

associations between the levels of a two-way

contingency table.

In a two-way contingency table, the observed

association of two traits is summarized by the cell

frequencies, and a typical inferential aspect is the

study of whether certain levels of one

characteristic are associated with some

levels of another. Correspondence analysis is a

geometric technique for displaying the rows and

columns of a two-way contingency table as points

in a low-dimensional space, such that the

positions of the row and column points are

consistent with their associations in the table.

The goal is to have a global view of the data that is useful for interpretation.
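The low-dimensional display described above comes from a singular value decomposition of the table's standardized residuals. A numpy sketch on a made-up 3×3 contingency table (not the smoking data used later):

```python
import numpy as np

# Toy 3x3 contingency table (frequencies invented for illustration)
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
P = N / N.sum()                 # correspondence matrix of proportions
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses

# Standardized residuals from the independence model, then SVD
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S)

# Principal coordinates of the rows for a 2-dimensional map
row_coords = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]
```

Rows (and, symmetrically, columns) are then plotted at these coordinates, so categories that depart from independence in similar ways land near each other on the map.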

When to use it?

When you wish to grasp an overall perception

of the inter objects associations.

It is a descriptive analysis and used according

to similar objectives.

Used to analyze consumer perception, brand

positioning etc.

When you have categorical variables.


Assumptions of

Correspondence Analysis?

Homogeneity: In correspondence analysis, it is assumed

that there is homogeneity between the column variable of

the analysis. If homogeneity is not present in the analysis,

then the result will be misleading.

Distributional assumption: Correspondence analysis is a non-parametric technique; it makes no distributional assumptions.

Category assumption: In correspondence analysis, it is

assumed that the discrete data has many categories.

Negative values: Correspondence analysis cannot handle negative values, since the table entries are frequencies.

Continuous data: In correspondence analysis, discrete data is used. If we are using continuous data, then the data must be categorized into ranges, which leads to a loss of information.

Correspondence analysis is an exploratory technique

not a confirmatory technique.


We will now go to SPSS for analysis.

Retrieve smoking.sav
Analyze → Data Reduction → Correspondence Analysis


Structural Equation Modeling (SEM)


What is SEM?

Structural Equations Modeling is a family of statistical

models that seek to explain the relationships among

multiple variables. It examines the “structure” of

interrelationships expressed in a series of equations,

similar to a series of multiple regression equations.

These equations depict all of the relationships among

constructs (the dependent and independent variables)

involved in the analysis. Constructs are unobservable

or latent factors that are represented by multiple

variables.


SEM…

Among the strengths of SEM is the ability to

construct latent variables: variables which

are not measured directly, but are estimated

in the model from several measured variables

each of which is predicted to 'tap into' the

latent variables. This allows the modeler to

explicitly capture the unreliability of

measurement in the model, which in theory

allows the structural relations between latent

variables to be accurately estimated. Factor

analysis and regression both represent special cases of SEM.


What is different about

SEM?

It's a graphical method with underlying

equation execution.

Estimation of Multiple and Interrelated

Relationships.

Represents unobserved (latent) concepts and

corrects for measurement error.

Defines a model that explains an entire set of

relationships.


Why and when to use

SEM?

SEM may be used as a more powerful

alternative to multiple regression, path

analysis, factor analysis, time series analysis,

and analysis of covariance.

It is a confirmatory technique rather than an exploratory one.


Latent Constructs and

Abbreviations

Exogenous constructs are the latent, multi-item equivalent of

independent variables. They use a variate (linear

combination) of measures to represent the construct, which

acts as an independent variable in the model. (Variables that never appear as dependent in an equation are called exogenous.)

Multiple measured variables (x) represent the exogenous

constructs (ξ).

Endogenous constructs are the latent, multi-item equivalent

to dependent variables. These constructs are theoretically

determined by factors within the model. (Variables that are dependent in at least one equation, even if independent in others, are called endogenous.)

Multiple measured variables (y) represent the endogenous

constructs (η).


Assumptions

• Absence of high multicollinearity.
• Linearity.
• No extreme outliers.
• Sample size should be at least 200.
• Normality of data; dichotomous or ordinal variables should be avoided.
• Avoid using a dichotomous variable as an endogenous variable when its exogenous variables are continuous.


Terms in use.

[Path diagram: exogenous variable x1; endogenous variables y1 and y2; path coefficients γ11 (x1 → y1), β21 (y1 → y2), and γ21 (x1 → y2).]

The direct effect of x1 on y2 is the path coefficient γ21; the indirect effect of x1 on y2 is γ11 times β21.
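The direct/indirect decomposition of path effects reduces to simple arithmetic: the indirect effect is the product of the coefficients along the path. A sketch with made-up coefficient values:

```python
# Path-model effects of x1 on y2 (coefficient values are invented)
gamma_11 = 0.6   # x1 -> y1
beta_21 = 0.5    # y1 -> y2
gamma_21 = 0.2   # x1 -> y2 (direct path)

indirect = gamma_11 * beta_21   # effect routed through y1, ≈ 0.3
total = gamma_21 + indirect     # direct plus indirect, ≈ 0.5
```

A standardized one-unit change in x1 thus shifts y2 by about 0.5 in total, of which 0.3 flows through the mediator y1.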

We will now go to SPSS for analysis.

Retrieve hamilton.sav


Analysis of Variance (ANOVA)


Anova and Ancova

Analysis of variance and

covariance

Factorial ANOVA and ANCOVA tell you whether

considering more than one independent

variable at a time gives you additional

information over and above what you would

get if you did the appropriate basic inferential

statistics for each independent variable

separately.

Both of these inferential statistics have two or

more independent variables and one scale

(normally distributed) dependent variable.


Anova

It is a statistical technique used to determine whether samples from two or more groups come from populations with equal means.

Factorial ANOVA is used when there is a small

number of independent variables (usually two

or three) and each of these variables has a

small number of levels or categories (usually

two to four).


Types of Anova

One way ANOVA is used to examine differences

on a scale dependent variable between two or

more groups comprising the levels of one

independent variable or factor.

It tests whether the groups formed by the categories of the independent variable have similar means, by comparing the between-group estimate of variance with the within-group estimate. If the groups seem different, then it is concluded that the independent variable has an effect on the dependent (e.g., if different treatment groups have different health outcomes).
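The comparison of between-group and within-group variance estimates can be computed by hand. A pure-Python sketch of the one-way ANOVA F statistic for three made-up treatment groups:

```python
# One-way ANOVA by hand: F = (between-group MS) / (within-group MS).
# The three groups below are invented for illustration.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Sum of squares between groups (weighted squared mean differences)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Sum of squares within groups (spread around each group's own mean)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)   # ≈ 7.0 here
```

A large F (relative to the F distribution with these degrees of freedom) says the group means differ by more than within-group noise can explain.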

Types….

Two-way ANOVA analyzes one interval

dependent in terms of the categories (groups)

formed by two independents, one of which

may be conceived as a control variable. Two-way ANOVA tests whether the groups formed by the categories of the independent variables have similar means.


Assumptions of Factorial

ANOVA

Observations are independent.(should be

ensured while designing and entering data

into SPSS)

The variances of the groups are equal

(homogeneity of variances)-Levene Statistic

tests this assumption.

Factorial ANOVA is robust against violations of the assumption of normal distribution of the dependent variable, provided kurtosis falls between −1 and +2.


Example

Are there differences in math achievement for

people varying on math grades and/or

father‘s education revised, and is there a

significant interaction between math grades

and father's education on math achievement?

(Another way to ask this latter question: Do

the "effects" of math grades on math

achievement vary depending on level of

father's education revised?)


Post Hoc Tests

Post hoc tests are used in exploratory research to assess which group means differ from which others.

E.g. Which simple main effects of math grades

(at each level of father's education revised)

are statistically significant?


We will now go to SPSS for analysis.

Retrieve hsbdataB.sav
Analyze → GLM → Univariate
Math achievement = math grades + father's education


Analysis of Covariance (ANCOVA)


ANCOVA

ANCOVA typically is used to adjust or control

for differences between the groups based on

another, typically interval level, variable

called the covariate. ANCOVA can also be

used if one wants to use one or more discrete

or nominal variables and one or two

continuous variables to predict differences in

one dependent variable.


E.g. ….

E.g. imagine that we found that boys and girls

differ on math achievement.

This could be due to the fact that boys take

more math courses in high school. ANCOVA

allows us to adjust the math achievement

scores based on the relationship between

number of math courses taken and math

achievement. We can then determine if boys

and girls still have different math

achievement scores after making the

adjustment.
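The adjustment described above amounts to entering the covariate into a regression alongside the group indicator. A numpy sketch with simulated data in which the group has no true effect once the covariate (courses taken) is controlled — all values are invented:

```python
import numpy as np

# ANCOVA as regression: achievement predicted from a group dummy
# plus a covariate; the generating model gives the group no effect.
rng = np.random.default_rng(3)
courses = rng.integers(1, 6, size=80).astype(float)   # covariate
group = rng.integers(0, 2, size=80).astype(float)     # 0/1 group dummy
score = 10 + 2.0 * courses + rng.normal(scale=0.5, size=80)

A = np.column_stack([np.ones(80), group, courses])
beta, *_ = np.linalg.lstsq(A, score, rcond=None)
# beta[1] is the group difference after adjusting for the covariate (~0 here)
```

If the raw group means had differed only because one group takes more courses, beta[1] would shrink toward zero once the covariate is in the model, which is exactly the adjustment ANCOVA performs.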


Assumptions for ANCOVA

The observations must be independent.

The dependent variable needs to be normally

distributed.

It is important to have homogeneity of

variances, particularly if sample sizes differ

across levels of the independent variable(s).

Homogeneity can be assessed through Box's

Test or Levene's test.

Specific Assumptions

Linearity( between covariates and dependent

variables)

Homogeneity of regression coefficients (if the F test is significant, then this assumption has been violated).

We will now go to SPSS for analysis.

Retrieve hsbdataB.sav
Analyze → GLM → Univariate
Math ach = gender * covariates

