
1

QTIA Presentation
Presented to,
Mr. Imtiaz Arif,

Presented by
Muhammad Usman Dilshad
usmandilshad@gmail.com
Karachi Pakistan.

2
Topics
Factor Analysis
Multiple Regression
Logistic Regression
Discriminant Analysis
ANOVA / ANCOVA
Cluster Analysis
Conjoint Analysis
Correspondence Analysis
Structural Equation Modeling

3
Factor Analysis
Presented by:
Muhammad Usman
Dilshad
Reg Id # 3806

Factor
Analysis
Factor analysis is an interdependence
technique whose primary purpose is to
define the underlying structure among the
variables in the analysis.

Factor analysis identifies a smaller number
of factors among a large number of variables.
Variables grouped under the same factor have
high correlations with one another.

5
Exploratory vs Confirmatory Factor
Analysis

• Exploratory factor analysis
Explores and summarizes the underlying
correlation structure of a data set.

• Confirmatory factor analysis
Tests the correlation structure of a data
set against a hypothesized structure and rates
the "goodness of fit".

6
Assumptions
• Multicollinearity: assessed using the MSA (measure of sampling adequacy).
• KMO can be used to identify which variables to drop
from the factor analysis because they lack
multicollinearity.
• There is a KMO (MSA) statistic for each individual variable,
and pooling the same quantities across all variables gives
the overall KMO statistic.
• KMO varies from 0 to 1.0.
• Overall KMO should be .50 or higher to proceed with
factor analysis.
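As a rough illustration, the KMO/MSA computation can be sketched in Python with numpy. The 3×3 correlation matrix below is made up for the example; the partial (anti-image) correlations are derived from the inverse of R:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R."""
    Rinv = np.linalg.inv(R)
    # Partial correlations: p_ij = -Rinv_ij / sqrt(Rinv_ii * Rinv_jj)
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    P = -Rinv / d
    np.fill_diagonal(P, 0.0)
    R0 = R.copy()
    np.fill_diagonal(R0, 0.0)
    r2, p2 = (R0**2).sum(), (P**2).sum()
    overall = r2 / (r2 + p2)                 # overall KMO pools all variables
    # Per-variable MSA uses the row sums instead of the grand sums
    msa_i = (R0**2).sum(axis=0) / ((R0**2).sum(axis=0) + (P**2).sum(axis=0))
    return overall, msa_i

# Example: three moderately correlated (made-up) variables
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
overall, msa = kmo(R)
```

With correlations this strong, the overall KMO comfortably exceeds the .50 threshold, so factoring these variables would be defensible.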


7
Points to Remember
Testing Assumptions of Factor Analysis

• There must be a strong conceptual foundation to support the


assumption that a structure does exist before the factor
analysis is performed.
• A statistically significant Bartlett’s test of sphericity (sig. < .05)
indicates that sufficient correlations exist among the
variables to proceed.
• Measure of Sampling Adequacy (MSA) values must exceed .50
for both the overall test and each individual variable.
Variables with values less than .50 should be omitted from
the factor analysis one at a time, with the smallest one being
omitted each time.

8
Other uses of Factor Analysis

• Calculating Factor Scores
• Creating Summated Scales
• Selecting Substitute Variables

9
We will now go to SPSS for analysis.
Retrieve hsbdataB.sav

Analyze => Data Reduction => Factor


• Next, select the variables item01 through item14.

Syntax
FACTOR
/VARIABLES item01 item02 item03 item04 item05 item06 item07 item08 item09 item10 item11 item12 item13 item14
/MISSING LISTWISE
/ANALYSIS item01 item02 item03 item04 item05 item06 item07 item08 item09 item10 item11 item12 item13 item14
/PRINT INITIAL CORRELATION DET KMO ROTATION
/FORMAT SORT BLANK(.3)
/CRITERIA FACTORS(3) ITERATE(25)
/EXTRACTION PAF
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION.
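Outside SPSS, the extraction step can be sketched in Python with numpy. This is only an illustration on simulated data standing in for item01–item14: it pulls unrotated loadings out of the correlation matrix by eigendecomposition with the Kaiser eigenvalue-greater-than-1 rule, not SPSS's PAF-plus-varimax procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate 200 cases of 6 items driven by 2 latent factors (toy stand-in data)
F = rng.normal(size=(200, 2))
L_true = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
X = F @ L_true.T + 0.5 * rng.normal(size=(200, 6))

R = np.corrcoef(X, rowvar=False)              # correlation matrix of the items
vals, vecs = np.linalg.eigh(R)                # eigendecomposition (ascending)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
k = int((vals > 1).sum())                     # Kaiser criterion: eigenvalues > 1
loadings = vecs[:, :k] * np.sqrt(vals[:k])    # unrotated loadings
communality = (loadings**2).sum(axis=1)       # item variance explained by the factors
```

The two planted factors surface as the two eigenvalues above 1, matching the smaller-number-of-factors idea on the earlier slide.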

10
MULTIPLE REGRESSION
ANALYSIS
11
MULTIPLE REGRESSION
ANALYSIS:
Multiple regression analysis is a statistical technique
that can be used to analyze the relationship
between a single dependent (criterion) variable and
several independent (predictor) variables.

12
Multiple Regression
Function:

Y = β0 + β1X1 + β2X2 + β3X3 + … + ε

Y = dependent variable
β0 = constant (intercept)
β1 = coefficient of variable X1
X1 = independent variable
ε = error term
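The regression function can be illustrated with a small numpy sketch. The three predictors and coefficient values below are made up; ordinary least squares recovers β from simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 3))                       # three hypothetical predictors X1..X3
beta_true = np.array([2.0, 0.5, -1.0, 0.3])       # β0, β1, β2, β3
y = beta_true[0] + X @ beta_true[1:] + 0.1 * rng.normal(size=n)  # Y = β0 + ΣβiXi + ε

A = np.column_stack([np.ones(n), X])              # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares fit
resid = y - A @ beta_hat
r2 = 1 - resid.var() / y.var()                    # coefficient of determination
```

With little noise in the simulation, the estimated coefficients land very close to the true β and R² is near 1.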
13
An Example of Multiple
Regression Analysis.

From MVA 7th ed Book.

14
Assumptions of Multiple Regression Analysis

• Linearity
• Multicollinearity
• Normality
• Homoscedasticity
• Outliers

15
Estimation Techniques
• Enter or Direct Method
• Hierarchical Method

16
We will now go to SPSS for analysis.
Retrieve hsbdataB.sav
Analyze => Regression => Linear
Math achievement= motivation+ grades in h.s. + parent's education
+ gender

17
Binary Logistic
Regression
Binary Logistic Regression.
Logistic regression is helpful when you want to predict a
categorical variable from a set of predictor variables.
Binary logistic regression is similar to linear regression
except that it is used when the dependent variable is
dichotomous. Multinomial logistic regression is used
when the dependent/outcome variable has more than two
categories. Logistic regression also is useful when some
or all of the independent variables are dichotomous;
others can be continuous.

19
Assumptions of Binary Logistic Regression Analysis.
There are fewer assumptions for logistic regression than for multiple
regression and discriminant analysis, which is one reason this
technique has become popular, especially in health related fields.
Binary logistic regression assumes that the dependent or outcome
variable is dichotomous and, like most other statistics, that the
outcomes are independent and mutually exclusive; that is, a single
case can only be represented once and must be in one group or the
other.
There should be a minimum of 20 cases per predictor, with a
minimum of 60 total cases. These requirements need to be satisfied
prior to doing statistical analysis with SPSS.
As with multiple regression, multicollinearity is a potential source of
confusing or misleading results and needs to be assessed.

20
When and Why Binary
Logistic Regression?
When the dependent variable is nonparametric
and we don't have homoscedasticity (the
variance of the DV is not equal across levels of
the IVs).
Used when the dependent variable has only
two levels (yes/no, male/female, taken/not
taken).
If multivariate normality is violated.
If we don’t have linearity.


21
Who uses it in Plain
words.
Binary Logistic Regression can be used in the following
situations.
 A catalog company wants to increase the proportion of mailings
that result in sales.
 A doctor wants to accurately diagnose a possibly cancerous
tumor.
 A loan officer wants to know whether the next customer is
likely to default.
Using the Binary Logistic Regression procedure, the catalog
company can send mailings to the people who are most likely
to respond, the doctor can determine whether the tumor is
more likely to be benign or malignant, and the loan officer can
assess the risk of extending credit to a particular customer.

22
A linear regression model can predict values
below 0, between 0 and 1, and above 1, whereas
logistic regression yields a probability between 0
and 1 that is used to classify each case as 0 or 1.
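That difference can be seen in a minimal Python sketch with made-up data: the sigmoid keeps every predicted value strictly between 0 and 1, and a 0.5 cut-off turns probabilities into 0/1 classifications. The plain gradient-ascent fit below is only an illustration, not SPSS's estimation algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))                    # two hypothetical predictors
logit = 1.5 * x[:, 0] - 1.0 * x[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)  # dichotomous outcome

X = np.column_stack([np.ones(n), x])           # add intercept column
w = np.zeros(3)
for _ in range(2000):                          # gradient ascent on the log-likelihood
    p = 1 / (1 + np.exp(-X @ w))               # predicted probabilities, always in (0, 1)
    w += 0.1 * X.T @ (y - p) / n

p = 1 / (1 + np.exp(-X @ w))
pred = (p >= 0.5).astype(float)                # classify each case as 0 or 1
accuracy = (pred == y).mean()
```

Note that every fitted probability stays inside (0, 1), which a straight linear fit of a 0/1 outcome would not guarantee.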

23
We will now go to SPSS for analysis.
Analyze => Regression => Binary Logistic
Algebra 2 = gender + mosaic + visualization test + parents'
education

24
Discriminant
Analysis
Work Place Discrimination?

What is Discriminant
Analysis?
Discriminant analysis is appropriate when you
want to predict which group participants will
be in. The procedure produces a discriminant
function (or for more than two groups, a set
of discriminant functions) based on linear
combinations of the predictor variables that
provide the best overall discrimination among
the groups.
The grouping or dependent variable can have
more than two values.

26
Difference between Discriminant
Analysis and MANOVA.
In DA, one is trying to devise one or more
predictive equations to maximally
discriminate people in one group from those
in another group; in MANOVA, one is trying to
determine whether group members differ
significantly on a set of several measures.

27
When to use Discriminant
Analysis?
When the dependent variable is nonparametric
and has two or more levels.
When your predictors have multivariate
normality.

28
Discriminant Function
 Discriminant analysis attempts to find linear combinations of
those variables that best separate the groups of cases.
These combinations are called discriminant functions and
have the form displayed in the equation.

dik = b0k + b1k·xi1 + ... + bpk·xip

 where
dik is the value of the kth discriminant function for the ith
case
p is the number of predictors
bjk is the value of the jth coefficient of the kth function
xij is the value of the jth predictor for the ith case
29
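The equation can be checked numerically with a tiny numpy sketch; all coefficient values below are made up for illustration:

```python
import numpy as np

# Assumed coefficients b_jk for two discriminant functions over p = 3 predictors
b0 = np.array([0.5, -1.2])               # b_0k: one constant per function k
B = np.array([[0.8, 0.1],                # b_jk: rows = predictors j, columns = functions k
              [-0.4, 0.9],
              [0.3, -0.2]])

x_i = np.array([1.0, 2.0, 0.5])          # predictor values for the ith case
d_i = b0 + x_i @ B                       # d_ik = b_0k + Σ_j b_jk * x_ij
```

Each entry of `d_i` is one discriminant score for this case, one per function k.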
Discriminant Function
Contd..
The number of functions equals min(#groups-
1, #predictors).
The procedure automatically chooses a first
function that will separate the groups as
much as possible. It then chooses a second
function that is both uncorrelated with the
first function and provides as much further
separation as possible. The procedure
continues adding functions in this way until
reaching the maximum number of functions
as determined by the number of predictors
and categories in the dependent variable.
 30
Assumptions for Discriminant
Analysis.
 The relationships between all pairs of predictors must be
linear, multivariate normality must exist within groups,
and the population covariance matrices for predictor
variables must be equal across groups.
 Discriminant analysis is, however, fairly robust to these
assumptions, except violations of multivariate normality in
which case one should use Logistic regression.
 Multicollinearity, which causes problems in any regression
analysis, should be minimal.
 It is also important that the sample size of the smallest group
exceed the number of predictor variables in the model.
 The linearity assumption as well as the assumption of
homogeneity of variance-covariance matrices should be
tested by examining a matrix scatterplot. If the spreads of
the scatterplots are roughly equal, then the assumption of
homogeneity of variance-covariance matrices can be
assumed.
 It is best not to use dichotomous independent variables
unless the dependent variable has a roughly 50/50 split.
31
We will now go to SPSS for analysis.
Analyze => Classify => Discriminant
Algebra 2 in hs = gender + parents' education + mosaic +
visualization test

32
Cluster Analysis
What is Cluster Analysis?
It is a descriptive analysis technique which groups
objects (respondents, products, firms, variables, etc.)
so that each object is similar to the other objects in
the cluster and different from objects in all the other
clusters.

34
When to use cluster
analysis?
The essence of all clustering approaches is the classification of data
as suggested by “natural” groupings of the data themselves.
Simply put, use cluster analysis when you desire the
following:
 Taxonomy development (segmentation)
 Data simplification
 Relationship identification
Applications:
It is used to segment markets in marketing; social
networking sites use it to form new groups based on
user data; Flickr's map of photos and other map
sites use clustering to reduce the number of
markers on a map.

 35
How does cluster analysis
work?
Clusters are made based on the similarities in the objects.
Interobject similarity is an empirical measure of
correspondence, or resemblance, between objects to be
clustered. It can be measured in a variety of ways, but
three methods dominate the applications of cluster
analysis:

Correlational measures.
Distance measures.
Association measures.


36
How…. Contd…
Euclidean distance.
Squared (or absolute) Euclidean distance.
City-block (Manhattan) distance.
Chebychev distance.
Mahalanobis distance (D2).

Euclidean distance.
Straight line distance between objects.
Clusters are formed around centroids. A centroid is the
central point of a cluster; each object is assigned to the
cluster whose centroid is nearest to it.
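Nearest-centroid assignment with Euclidean distance can be sketched in a few lines of numpy; the objects and centroids below are made up:

```python
import numpy as np

# Toy objects in two dimensions and two assumed centroids
objects = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.25, 1.5], [8.5, 8.75]])

# Euclidean distance from every object to every centroid
diff = objects[:, None, :] - centroids[None, :, :]
dist = np.sqrt((diff**2).sum(axis=2))
assignment = dist.argmin(axis=1)   # each object joins the cluster with the nearest centroid
```

The first two objects fall into the first cluster and the last two into the second, matching the straight-line-distance idea above.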

37
Assumptions for Cluster
Analysis.
Sufficient size is needed to ensure
representativeness of the population and its
underlying structure, particularly small groups
within the population.
Outliers can severely distort the representativeness
of the results if they appear as structure (clusters)
that are inconsistent with the research objectives
Representativeness of the sample. The sample must
represent the research question.
Impact of multicollinearity. Input variables should
be examined for substantial multicollinearity and
if present:
Reduce the variables to equal numbers in each set
of correlated measures.
 38
Two Variable Cluster Analysis

[Scatterplot: vertical axis — frequency (Low to High); horizontal axis — "Low Frequency of Going to Fast Food Restaurants High".]
39
We will now go to SPSS for analysis.
Retrieve judges.sav
Analyze => Classify => Hierarchical Cluster
All variables.
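Outside SPSS, an equivalent hierarchical cluster analysis can be sketched with SciPy on simulated data standing in for judges.sav (Ward linkage is one of several linkage methods SPSS also offers):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
# Two well-separated blobs of "respondents" measured on three variables
X = np.vstack([rng.normal(0, 0.5, size=(10, 3)),
               rng.normal(5, 0.5, size=(10, 3))])

Z = linkage(X, method='ward')                    # build the agglomerative hierarchy
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into two clusters
```

The linkage matrix `Z` is the same information SPSS displays as a dendrogram; cutting it at two clusters recovers the two simulated groups.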

40
Conjoint
Analysis
Conjoint Analysis
Conjoint analysis . . . is a dependence technique used
to understand how respondents develop preferences
for products or services. The dependent variable is a
measure of respondent preference and can be metric
or non-metric (choice-based conjoint). The
independent variables are dummy variables
representing attributes of multi-attribute products or
services.

42
Conjoint Analysis
Some web links.
http://www.youtube.com/watch?v=86iiQjPaVSU&p=
What is conjoint analysis?
http://videolectures.net/kdd09_guo_csbdcayfptm/
Yahoo uses conjoint analysis to deliver
preferred content to its users, e.g., showing a top
football story to a 20-year-old and a finance
story to a 50-year-old.
It also uses cluster analysis and logistic
regression.
Yahoo using conjoint analysis.ppt
43
Conjoint …..
 Is not a new “technique” but an application of techniques we have covered
already:
 Metric conjoint analysis is a regression analysis.
 Choice-based conjoint is a discrete regression (e.g., logit).
 The researcher first constructs a set of real or hypothetical products by
combining selected levels of each attribute (factor):
 In most situations, the researcher will need to create an experimental
design.
 Some computer programs will create the design (Sawtooth Software,
SPSS Conjoint).
 These combinations or profiles are then presented to respondents, who
provide their overall evaluations.

44
Assumptions of Conjoint
Analysis
 Few statistical assumptions needed.

 Conceptual assumptions are important (e.g., main effects vs.
interactive).

45
Process of Conjoint
Analysis
Identifying the variables (attributes) and their
levels (values).
Constructing an orthogonal design, i.e., a list of
combinations of different levels of the variables.
Printing experiment cards containing the
orthogonal design to collect the target audience's
preference rankings.
Entering the rankings into SPSS.
Running the conjoint analysis.
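Since metric conjoint analysis is just a regression on dummy-coded attributes, the part-worth estimation step can be sketched in numpy; the 2×2 design and the ratings below are hypothetical:

```python
import numpy as np

# Hypothetical 2x2 design: attribute A (levels a0/a1), attribute B (levels b0/b1)
# Dummy-coded profiles; columns: intercept, A=a1, B=b1
profiles = np.array([[1, 0, 0],
                     [1, 0, 1],
                     [1, 1, 0],
                     [1, 1, 1]], dtype=float)
ratings = np.array([2.0, 5.0, 4.0, 7.0])   # one respondent's preference ratings

# Metric conjoint is a regression: the part-worths are the dummy coefficients
partworths, *_ = np.linalg.lstsq(profiles, ratings, rcond=None)
```

Here the respondent's ratings are perfectly additive, so the recovered part-worths (baseline 2, +2 for level a1, +3 for level b1) reproduce every rating exactly; real data would leave residual error.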

46
We will now go to SPSS for analysis.
Use carpet_plan.sav and carpet_pref.sav

47
Correspondence
Analysis
Correspondence Analysis
It is an interdependence technique that has
become increasingly popular for dimensional
reduction and perceptual mapping. It also is
known as optimal scaling or scoring, reciprocal
averaging or homogeneity analysis.
It is a technique that generates graphical
representations of the associations between the
objects (or "categories") of two categorical
variables.

49
Contd…..
Correspondence analysis is a statistical
visualization method for picturing the
associations between the levels of a two-way
contingency table.
In a two-way contingency table, the observed
association of two traits is summarized by the cell
frequencies, and a typical inferential aspect is the
study of whether certain levels of one
characteristic are associated with some
levels of another. Correspondence analysis is a
geometric technique for displaying the rows and
columns of a two-way contingency table as points
in a low-dimensional space, such that the
positions of the row and column points are
consistent with their associations in the table.
The goal is to have a global view of the data that
is useful for interpretation.
50
When to use it?
When you wish to grasp an overall perception
of the inter objects associations.
It is a descriptive analysis and used according
to similar objectives.
Used to analyze consumer perception, brand
positioning etc.
When you have categorical variables.

51
Assumptions of
Correspondence Analysis?
 Homogeneity: In correspondence analysis, it is assumed
that there is homogeneity across the levels of the row and
column variables. If homogeneity is not present, the
results will be misleading.
 Distributional assumption: Correspondence analysis is a
non-parametric technique that makes no distributional
assumptions.
 Category assumption: It is assumed that the discrete data
have many categories.
 Negative values: Negative values are not allowed; cell
entries must be non-negative counts.
 Continuous data: Correspondence analysis uses discrete
data. Continuous data must be categorized into ranges,
which leads to a loss of information.
 Correspondence analysis is an exploratory technique
not a confirmatory technique.
 52
We will now go to SPSS for analysis.
Retrieve smoking.sav
Analyze => Data Reduction => Correspondence Analysis
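The core computation behind correspondence analysis can be sketched in Python: an SVD of the standardized residuals of the contingency table, whose squared singular values sum to the total inertia (χ²/n). The table below is a small example in the style of the smoking data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# A small example contingency table (staff group x smoking category)
N = np.array([[4, 2, 3, 2],
              [4, 3, 7, 4],
              [25, 10, 12, 4],
              [18, 24, 33, 13],
              [10, 6, 7, 2]], dtype=float)

n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)                  # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

inertia = (sv**2).sum()                              # total inertia = chi-square / n
row_coords = (U * sv) / np.sqrt(r)[:, None]          # principal coordinates, row points
chi2 = chi2_contingency(N)[0]                        # cross-check via the chi-square statistic
```

The first two columns of `row_coords` (and the analogous column coordinates) are what the perceptual map plots; the identity inertia × n = χ² confirms the decomposition.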

53
Structural Equation
Modeling (SEM)
What is SEM?
Structural Equations Modeling is a family of statistical
models that seek to explain the relationships among
multiple variables. It examines the “structure” of
interrelationships expressed in a series of equations,
similar to a series of multiple regression equations.
These equations depict all of the relationships among
constructs (the dependent and independent variables)
involved in the analysis. Constructs are unobservable
or latent factors that are represented by multiple
variables.

55
SEM…
Among the strengths of SEM is the ability to
construct latent variables: variables which
are not measured directly, but are estimated
in the model from several measured variables
each of which is predicted to 'tap into' the
latent variables. This allows the modeler to
explicitly capture the unreliability of
measurement in the model, which in theory
allows the structural relations between latent
variables to be accurately estimated. Factor
analysis and regression all represent
special cases of SEM.
56
What is different about
SEM?
It's a graphical method with underlying
equation estimation.
Estimation of Multiple and Interrelated
Relationships.
Represents unobserved (latent) concepts and
corrects for measurement error.
Defines a model that explains an entire set of
relationships.

57
Why and when to use
SEM?
SEM may be used as a more powerful
alternative to multiple regression, path
analysis, factor analysis, time series analysis,
and analysis of covariance.
It is a confirmatory technique rather than an
exploratory one.

58
Latent Constructs and
Abbreviations
 Exogenous constructs are the latent, multi-item equivalent of
independent variables. They use a variate (linear
combination) of measures to represent the construct, which
acts as an independent variable in the model. (Variables
that do not become dependent in any equation are
called exogenous.)
 Multiple measured variables (x) represent the exogenous
constructs (ξ).
 Endogenous constructs are the latent, multi-item equivalent
to dependent variables. These constructs are theoretically
determined by factors within the model. (Variables that
are dependent in one equation but may be independent
in others are called endogenous.)
 Multiple measured variables (y) represent the endogenous
constructs (η).
 59
Assumptions
No high multicollinearity.
Linearity.
No extreme outliers.
Sample size should be at least 200.
Normality of data; dichotomous or
ordinal variables should be avoided.
Avoid using a dichotomous variable as an
endogenous variable when its exogenous
variables are continuous.
60
Terms in use.
Path coefficients: the direct effect of x1 on y2 is the path
coefficient γ21.
x1 is an exogenous variable; y1 and y2 are endogenous
variables.
The indirect effect of x1 on y2 (via x1 → y1 → y2) is
γ11 × β21.
61
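The direct/indirect-effect arithmetic can be illustrated with a simulated recursive path model fitted equation by equation with OLS (the coefficient values are made up; this is path analysis by regression, not a full SEM estimation):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
# Simulated recursive path model: x1 -> y1 -> y2, plus a direct x1 -> y2 path
x1 = rng.normal(size=n)
y1 = 0.6 * x1 + rng.normal(scale=0.5, size=n)             # gamma_11 = 0.6
y2 = 0.3 * x1 + 0.7 * y1 + rng.normal(scale=0.5, size=n)  # gamma_21 = 0.3, beta_21 = 0.7

# Estimate each structural equation by OLS
g11 = np.linalg.lstsq(x1[:, None], y1, rcond=None)[0][0]
A = np.column_stack([x1, y1])
g21, b21 = np.linalg.lstsq(A, y2, rcond=None)[0]

indirect = g11 * b21            # indirect effect of x1 on y2
total = g21 + indirect          # total effect of x1 on y2
```

The estimated indirect effect lands near the planted value 0.6 × 0.7 = 0.42, illustrating the product rule for effects along a path.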
We will now go to SPSS for analysis.
Retrieve hamilton.sav

62
Analysis of
Variance (ANOVA)
Anova and Ancova
Analysis of variance and
covariance
Factorial ANOVA and ANCOVA tell you whether
considering more than one independent
variable at a time gives you additional
information over and above what you would
get if you did the appropriate basic inferential
statistics for each independent variable
separately.
Both of these inferential statistics have two or
more independent variables and one scale
(normally distributed) dependent variable.

64
Anova
It is statistical technique used to determine
that whether samples from two or more
groups come from population with equal
means.
Factorial ANOVA is used when there is a small
number of independent variables (usually two
or three)and each of these variables has a
small number of levels or categories (usually
two to four).

65
Types of Anova
One way ANOVA is used to examine differences
on a scale dependent variable between two or
more groups comprising the levels of one
independent variable or factor.
It tests whether the groups formed by the
categories of the independent variable have
similar means (the F test compares the
between-group variance estimate with the
within-group variance estimate). If the groups
seem different, then it is concluded that the
independent variable has an effect on the
dependent variable (e.g., if different treatment
groups have different health outcomes).
66
Types….
Two-way ANOVA analyzes one interval
dependent in terms of the categories (groups)
formed by two independents, one of which
may be conceived as a control variable. Two-
way ANOVA tests whether the groups formed
by the categories of the independent
variables have similar centroids.

67
Assumptions of Factorial
ANOVA
Observations are independent.(should be
ensured while designing and entering data
into SPSS)
The variances of the groups are equal
(homogeneity of variances)-Levene Statistic
tests this assumption.
Factorial ANOVA is robust against violations of
the assumption of the normal distributions of
the dependent variable. Kurtosis -1 to +2.

68
Example
Are there differences in math achievement for
people varying on math grades and/or
father‘s education revised, and is there a
significant interaction between math grades
and father's education on math achievement?
(Another way to ask this latter question: Do
the "effects" of math grades on math
achievement vary depending on level of
father's education revised?)

69
Post Hoc Test
Are used in exploratory research to assess
which group means differ from which others.
E.g. Which simple main effects of math grades
(at each level of father's education revised)
are statistically significant?

70
We will now go to SPSS for analysis.
Retrieve hsbdatab.sav
Analyze => GLM => Univariate
Math ach = math grades + fathers edu
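The one-way case can be sketched from first principles in Python and cross-checked against SciPy (the three groups below are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Three hypothetical groups with different population means
groups = [rng.normal(10, 2, 30), rng.normal(12, 2, 30), rng.normal(15, 2, 30)]

k = len(groups)
n = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand)**2 for g in groups)   # between-group SS
ss_within = sum(((g - g.mean())**2).sum() for g in groups)         # within-group SS
F = (ss_between / (k - 1)) / (ss_within / (n - k))                 # F ratio

F_scipy, p = stats.f_oneway(*groups)   # cross-check with SciPy
```

The hand-computed F matches SciPy's, and with group means this far apart the p-value is far below .05, so equal population means would be rejected.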

71
Analysis of
Covariance (ANCOVA)
ANCOVA
ANCOVA typically is used to adjust or control
for differences between the groups based on
another, typically interval level, variable
called the covariate. ANCOVA can also be
used if one wants to use one or more discrete
or nominal variables and one or two
continuous variables to predict differences in
one dependent variable.

73
E.g. ….
E.g. imagine that we found that boys and girls
differ on math achievement.
This could be due to the fact that boys take
more math courses in high school. ANCOVA
allows us to adjust the math achievement
scores based on the relationship between
number of math courses taken and math
achievement. We can then determine if boys
and girls still have different math
achievement scores after making the
adjustment.
74
Assumptions for ANCOVA
The observations must be independent.
The dependent variable needs to be normally
distributed.
It is important to have homogeneity of
variances, particularly if sample sizes differ
across levels of the independent variable(s).
Homogeneity can be assessed through Box's
Test or Levene's test.
Specific Assumptions
Linearity( between covariates and dependent
variables)
Homogeneity of regression coefficients (slopes): if the F
test for the factor-by-covariate interaction is significant,
then this assumption has been violated.
75
We will now go to SPSS for analysis.
Retrieve hsbdatab.sav
Analyze => GLM => Univariate
Math ach = gender * covariates
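The covariate-adjustment idea can be illustrated in numpy with simulated data mirroring the boys/girls math-courses example: the group coefficient in the combined regression is the adjusted group difference (all parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
group = rng.integers(0, 2, n)                   # e.g. gender, coded 0/1
courses = rng.normal(4, 1, n) + 1.0 * group     # covariate: group 1 takes more math courses
math = 2.0 * courses + rng.normal(0, 0.5, n)    # achievement driven by courses, not group

# Raw group means differ only because the covariate differs between groups
raw_diff = math[group == 1].mean() - math[group == 0].mean()

# ANCOVA-style adjustment: fit math ~ group + covariate, read off the group effect
X = np.column_stack([np.ones(n), group, courses])
b = np.linalg.lstsq(X, math, rcond=None)[0]
adjusted_diff = b[1]   # group difference after controlling for courses taken
```

The raw difference is sizable, but once the covariate is controlled the adjusted group difference shrinks toward zero, exactly the adjustment described on the previous slide.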

76