
SW388R7
Data Analysis & Computers II

Slide 1
Hierarchical Multiple Regression

Differences between hierarchical and standard multiple regression
Sample problem
Steps in hierarchical multiple regression

Slide 2
Differences between standard and hierarchical multiple regression

Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.

For example, a research hypothesis might state that there are differences between the average salary for male employees and female employees, even after we take into account differences between education levels and prior work experience.

In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
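
As a point of reference, the two-block entry just described can also be written in SPSS syntax. The sketch below uses hypothetical variable names (salary, educ, workexp, sex) for the salary example; the CHANGE keyword requests the R² change statistic discussed on the next slide.

  * Hypothetical two-block hierarchical regression for the salary example.
  REGRESSION
    /STATISTICS COEFF R ANOVA CHANGE
    /DEPENDENT salary
    /METHOD=ENTER educ workexp
    /METHOD=ENTER sex.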

Slide 3
Differences in statistical results

SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: the change in R² for each additional block of variables.

The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (the contribution to the explanation of the variance in the dependent variable) is zero.

If the null hypothesis is rejected, then our interpretation indicates that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.
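
For reference, the test of the change in R² is an incremental F test. With k_1 control variables in block 1, k_2 predictors added in block 2, and n cases, a standard form is:

  F_{change} = \frac{(R^2_2 - R^2_1) / k_2}{(1 - R^2_2) / (n - k_1 - k_2 - 1)}

with degrees of freedom (k_2, n - k_1 - k_2 - 1). In the sample problem that follows (n = 136, two controls, one predictor), this yields the F(1, 132) test reported in the Model Summary.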

Slide 4
Variations in hierarchical regression - 1

A hierarchical regression can have as many blocks as there are independent variables, i.e. the analyst can specify a hypothesis that specifies an exact order of entry for variables.

A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

Control variables are often demographics which are thought to make a difference in scores on the dependent variable. Predictors are the variables whose effect our research question is really interested in, but whose effect we want to separate out from the control variables.

Slide 5
Variations in hierarchical regression - 2

Support for a hierarchical hypothesis would be expected to require statistical significance for the addition of each block of variables.

However, many times we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis is interested in obtaining the best indicator of the effect of the predictor variables. The statistical significance of previously entered variables is not interpreted.

The latter strategy is the one that we will employ in our problems.

Slide 6
Differences in solving hierarchical regression problems

The R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented.

Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.

Slide 7
A hierarchical regression problem

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the phrase "controlling for" indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.

Slide 8
Level of measurement - answer

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.

Slide 9
Sample size - question

The second question asks about the sample size requirements for multiple regression. To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.

Slide 10
The baseline regression - 1

After we check for violations of assumptions and outliers, we will make a decision whether we should interpret the model that includes the transformed variables and omits outliers (the revised model), or whether we will interpret the model that uses the untransformed variables and includes all cases, including the outliers (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (a 2% or greater increase in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear from the Analyze menu.

Slide 11
The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the independent variables to control for, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.

Slide 12
The baseline regression - 3

SPSS identifies that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics button to specify the statistics options that we want.

Slide 13
The baseline regression - 4

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.

Slide 14
The baseline regression - 5

Click on the OK button to request the regression output.
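
For readers who prefer syntax, the dialog selections above correspond roughly to the following sketch (the syntax SPSS pastes from the dialogs may differ slightly in ordering and keywords):

  * Baseline two-block regression with descriptives, R-squared change,
  * collinearity diagnostics, and the Durbin-Watson statistic.
  REGRESSION
    /DESCRIPTIVES MEAN STDDEV CORR SIG N
    /STATISTICS COEFF R ANOVA CHANGE COLLIN TOL
    /DEPENDENT spdeg
    /METHOD=ENTER age sex
    /METHOD=ENTER degree
    /RESIDUALS DURBIN.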

Slide 15
R² for the baseline model

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. The relationship is statistically significant, though we would not stop if it were not significant, because the lack of significance may be a consequence of violation of assumptions or the inclusion of outliers.

Slide 16
Sample size evidence and answer

Descriptive Statistics
                          Mean    Std. Deviation   N
SPOUSES HIGHEST DEGREE    1.78    1.281            136
AGE OF RESPONDENT         45.80   14.534           136
RESPONDENTS SEX           1.60    .491             136
RS HIGHEST DEGREE         1.65    1.220            136

Hierarchical multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to number of independent variables (3) was 45.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.

Slide 17
Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity. First, we will evaluate the assumption of normality for the dependent variable.

Slide 18
Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.
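
The script automates this testing; without it, the same skewness and kurtosis statistics can be obtained with standard SPSS syntax, for example:

  * Descriptives (including skewness and kurtosis) for the metric variables.
  EXAMINE VARIABLES=spdeg age degree
    /STATISTICS DESCRIPTIVES
    /PLOT NONE.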

Slide 19
Normality of the dependent variable: spouse's highest degree

Descriptives: SPOUSES HIGHEST DEGREE
                                        Statistic   Std. Error
Mean                                    1.78        .110
95% Confidence Interval   Lower Bound   1.56
for Mean                  Upper Bound   2.00
5% Trimmed Mean                         1.75
Median                                  1.00
Variance                                1.640
Std. Deviation                          1.281
Minimum                                 0
Maximum                                 4
Range                                   4
Interquartile Range                     2.00
Skewness                                .573        .208
Kurtosis                                -1.051      .413

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0.

The answer to the question is false.

Slide 20
Normality of the transformed dependent variable: spouse's highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
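
The transformation itself is a single COMPUTE statement. The constant 1 is added before taking the log because the degree codes start at 0, and the log of 0 is undefined:

  * Logarithmic transformation of the dependent variable.
  COMPUTE lgspdeg = LG10(1 + spdeg).
  EXECUTE.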

Slide 21
Normality of the control variable: age

Next, we will evaluate the assumption of normality for the control variable, age.

Slide 22
Normality of the control variable: age

Descriptives: AGE OF RESPONDENT
                                        Statistic   Std. Error
Mean                                    45.99       1.023
95% Confidence Interval   Lower Bound   43.98
for Mean                  Upper Bound   48.00
5% Trimmed Mean                         45.31
Median                                  43.50
Variance                                282.465
Std. Deviation                          16.807
Minimum                                 19
Maximum                                 89
Range                                   70
Interquartile Range                     24.00
Skewness                                .595        .148
Kurtosis                                -.351       .295

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.

Slide 23
Normality of the predictor variable: highest academic degree

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.

Slide 24
Normality of the predictor variable: respondent's highest academic degree

Descriptives: RS HIGHEST DEGREE
                                        Statistic   Std. Error
Mean                                    1.41        .071
95% Confidence Interval   Lower Bound   1.27
for Mean                  Upper Bound   1.55
5% Trimmed Mean                         1.35
Median                                  1.00
Variance                                1.341
Std. Deviation                          1.158
Minimum                                 0
Maximum                                 4
Range                                   4
Interquartile Range                     1.00
Skewness                                .948        .149
Kurtosis                                -.051       .297

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.

Slide 25
Assumption of linearity for spouse's degree and respondent's degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not. However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis. The script for linearity will support our using the transformed dependent variable without having to add it to the data set.

Slide 26
Run the script to test linearity

When the linearity option is selected, a default set of transformations to test is marked.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 27
Linearity test: spouse's highest degree and respondent's highest academic degree

The correlation between "highest academic degree" and the logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.

Slide 28
Linearity test: spouse's highest degree and respondent's age

The assessment of the linear relationship between the logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between the logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations of age were statistically significant.

The correlation between "age" and the logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations of age were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).

Slide 29
Assumption of homogeneity of variance - question

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.

Slide 30
Run the script to test homogeneity of variance

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 31
Assumption of homogeneity of variance evidence and answer

Based on the Levene Test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
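
The Levene statistic reported by the script can also be produced directly. One standard route is the ONEWAY procedure's homogeneity option (a sketch, assuming lgspdeg has been computed as shown earlier):

  * Levene test of equal variances of lgspdeg across the sex groups.
  ONEWAY lgspdeg BY sex
    /STATISTICS HOMOGENEITY.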

Slide 32
Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse's highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set, so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set. We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.

Slide 33
Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.

Slide 34
Including the transformed variable in the data set - 3

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.

Slide 35
Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.

Slide 36
Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.

Slide 37
Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again, incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual larger than +/-3.0 will be treated as outliers.
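
In syntax terms, the outlier check adds casewise diagnostics and saved standardized residuals to the revised regression. A sketch using the transformed dependent variable:

  * Revised regression: flag cases with |standardized residual| > 3
  * and save the standardized residuals to the data editor.
  REGRESSION
    /STATISTICS COEFF R ANOVA CHANGE COLLIN TOL
    /DEPENDENT lgspdeg
    /METHOD=ENTER age sex
    /METHOD=ENTER degree
    /RESIDUALS DURBIN
    /CASEWISE PLOT(ZRESID) OUTLIERS(3)
    /SAVE ZRESID.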

Slide 38
The revised regression using transformations

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 39
The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box and substitute the log of the variable, LGSPDEG.

Click on the Statistics button to select the statistics we will need for the analysis.

Slide 40
The revised regression: selecting statistics

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for the Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.

Slide 41
The revised regression: saving standardized residuals

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.

Slide 42
The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.

Slide 43
Outliers in the analysis

If cases have a standardized residual larger than +/-3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists the cases and the values that result in their being outliers. If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals were less than +/-3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

Slide 44
Selecting the model to interpret - question

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret. If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.

Slide 45
Selecting the model to interpret evidence and answer

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. After substituting transformed variables, the proportion of variance in the dependent variable explained by the independent variables (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than explained by the baseline regression analysis, the baseline regression model with all cases and the original form of all variables should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions violated.

False is the correct answer to the question.

Slide 46
Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 47
Re-running the baseline regression - 2

Remove the transformed variable lgspdeg from the dependent variable textbox and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.

Slide 48
Re-running the baseline regression - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.

Slide 49
Re-running the baseline regression - 4

Click on the OK button to request the regression output.

Slide 50
Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.

Slide 51
Assumption of independence of errors: evidence and answer

Having selected a regression model for interpretation, we can now examine the final assumption, independence of errors.

Model Summary(c)
                                                Change Statistics
Model  R      R       Adjusted  Std. Error  R Square  F        df1  df2  Sig. F   Durbin-
              Square  R Square  of the      Change    Change              Change   Watson
                                Estimate
1      .014a  .000    -.015     1.290       .000      .013     2    133  .987
2      .531b  .281    .265      1.098       .281      51.670   1    132  .000     1.754

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE
c. Dependent Variable: SPOUSES HIGHEST DEGREE

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754, which falls within the acceptable range. If the Durbin-Watson statistic was not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions. The answer to the question is true.
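
For reference, the Durbin-Watson statistic is computed from the successive residuals e_i; values near 2 indicate no first-order serial correlation:

  d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}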

Slide 52
Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.

Slide 53
Multicollinearity evidence and answer

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954) and "sex" [sex] (.947). Multicollinearity is not a problem in this regression analysis. True is the correct answer to the question.
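
As background for the 0.10 rule of thumb: the tolerance for each independent variable is defined from the R² obtained when that variable is regressed on all of the other independent variables, and VIF is its reciprocal:

  \text{Tolerance}_j = 1 - R_j^2, \qquad \text{VIF}_j = \frac{1}{1 - R_j^2}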

Slide 54
Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.

Slide 55
Overall relationship between dependent variable and independent variables evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree] after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation of the overall relationship focuses on the change in R². If the change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.

Slide 56
Overall relationship between dependent variable and independent variables evidence and answer

Based on model 2 in the Model Summary table, where the predictors were added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.

Slide 57
Overall relationship between dependent variable and independent variables evidence and answer

The increase in R² from including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241. Using a proportional reduction in error interpretation for R², the information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.

Slide 58
Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.

Slide 59
Relationship of the predictor variable and the dependent variable evidence and answer

Coefficients(a)
                          Unstandardized      Standardized
                          Coefficients        Coefficients                    Collinearity Statistics
Model                     B       Std. Error  Beta     t       Sig.    Tolerance   VIF
1  (Constant)             1.781   .577                 3.085   .002
   AGE OF RESPONDENT      .001    .008        .009     .100    .920    .956        1.046
   RESPONDENTS SEX        -.023   .231        -.009    -.100   .920    .956        1.046
2  (Constant)             .525    .521                 1.007   .316
   AGE OF RESPONDENT      .003    .007        .037     .495    .622    .954        1.049
   RESPONDENTS SEX        .114    .198        .044     .575    .566    .947        1.056
   RS HIGHEST DEGREE      .559    .078        .533     7.188   .000    .990        1.010

a. Dependent Variable: SPOUSES HIGHEST DEGREE

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.

Slide 60
Relationship of the predictor variable and the dependent variable evidence and answer

(The Coefficients table is the same as on the previous slide.)

The b coefficient for the relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the independent variable "highest academic degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest academic degree" [spdeg].

The statement in the problem that "survey respondents who had higher academic degrees had spouses with higher academic degrees" is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because of an ordinal variable treated as metric, and the violation of the assumption of normality.

Slide 61
Validation analysis - question

The problem states the random number seed to use in the validation analysis.

Slide 62
Validation analysis: set the random number seed

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

To set the random number seed, select the Random Number Seed command from the Transform menu.

Slide 63
Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.
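
The equivalent syntax is a single command (like the dialog, it produces no visible output):

  SET SEED=998794.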

Slide 64
Validation analysis: compute the split variable

To enter the formula for the variable that will split the sample in two parts, click on the Compute command.

Slide 65
The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.75. If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent of false.

Third, click on the OK button to complete the dialog box.
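
In syntax form, the split variable is one COMPUTE statement; the comparison itself evaluates to 1 (true) or 0 (false):

  * Roughly 75% of cases get split = 1 (training sample).
  COMPUTE split = UNIFORM(1) <= 0.75.
  EXECUTE.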

Slide 66
The split variable in the data editor

In the data editor, the split variable shows a random pattern of zeros and ones. To select the cases for the training sample, we select the cases where split = 1.

Slide 67
Repeat the regression for the validation

To run the regression for the validation training sample, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 68
Using "split" as the selection variable

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Slide 69
Setting the value of split to select cases

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Rule button to enter a value for split.

Slide 70
Completing the value selection

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.

Slide 71
Requesting output for the validation analysis

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Click on the OK button to request the output.
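
The selection corresponds to the REGRESSION /SELECT subcommand. A sketch (SPSS estimates the model on the selected cases and also reports residual statistics for the unselected validation cases):

  * Hierarchical regression on the 75% training sample only.
  REGRESSION
    /SELECT=split EQ 1
    /STATISTICS COEFF R ANOVA CHANGE
    /DEPENDENT spdeg
    /METHOD=ENTER age sex
    /METHOD=ENTER degree.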

Slide 72
Validation analysis - 1

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(3, 103) = 11.569, p<0.001, as was the overall relationship in the analysis of the full data set, F(3, 132) = 17.235, p<0.001.

Slide 73
Validation analysis - 2

The validation of a hierarchical regression model also requires that the change in R² demonstrate statistical significance in the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement (F change(1, 103) = 34.319, p<0.001).

Slide 74
Validation analysis - 3

The pattern of significance for the individual relationships between the dependent variable and the predictor variable was the same for the analysis using the full data set and the 75% training sample.

The relationship between highest academic degree and spouse's highest academic degree was statistically significant in both the analysis using the full data set (t=7.188, p<0.001) and the analysis using the 75% training sample (t=5.484, p<0.001). The pattern of statistical significance of the independent variables for the analysis using the 75% training sample matched the pattern identified in the analysis of the full data set.

Slide 75
Validation analysis - 4

The total proportion of variance explained in the model using the training sample was 25.2% (R = .502), compared to 40.6% (R = .637) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set. The answer to the question is true.

Slide 76
Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect application of a statistic) represents the answers to each specific question.

Many of the steps in hierarchical regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.

Slide 77
Complete hierarchical multiple regression analysis: level of measurement

Question: do variables included in the analysis satisfy the level of measurement requirements?

Is the dependent variable metric and the independent variables metric or dichotomous? (Examine all independent variables, controls as well as predictors.)
  No: Incorrect application of a statistic
  Yes: Ordinal variables included in the relationship?
    No: True
    Yes: True with caution

Slide 78
Complete hierarchical multiple regression analysis: sample size

Question: number of variables and cases satisfy sample size requirements?

Compute the baseline regression in SPSS.

Ratio of cases to independent variables at least 5 to 1? (Include both controls and predictors in the count of independent variables.)
  No: Inappropriate application of a statistic
  Yes: Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
    Yes: True
    No: True with caution

Slide 79
Complete hierarchical multiple regression analysis: assumption of normality

Question: each metric variable satisfies the assumption of normality?

Test the dependent variable and both control and predictor independent variables.

The variable satisfies criteria for a normal distribution?
  Yes: True
  No: False. Log, square root, or inverse transformation satisfies normality?
    Yes: Use transformation in revised model, no caution needed. (If more than one transformation satisfies normality, use the one with the smallest skew.)
    No: Use untransformed variable in analysis, add caution to interpretation for violation of normality.

Slide 80
Complete hierarchical multiple regression analysis: assumption of linearity

Question: relationship between dependent variable and metric independent variable satisfies assumption of linearity?

If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity. If an independent variable was transformed to satisfy normality, skip the check for linearity. Test both control and predictor independent variables.

Probability of Pearson correlation (r) <= level of significance?
  Yes: True
  No: Probability of correlation (r) for relationship with any transformation of the IV <= level of significance?
    Yes: Use transformation in revised model. (If more than one transformation satisfies linearity, use the one with the largest r.)
    No: Weak relationship. No caution needed.

Slide 81
Complete hierarchical multiple regression analysis: assumption of homogeneity of variance

Question: variance in dependent variable is uniform across the categories of a dichotomous independent variable?

If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance. Test both control and predictor independent variables.

Probability of Levene statistic <= level of significance?
  No: True
  Yes: False. Do not test transformations of the dependent variable; add caution to interpretation for violation of homoscedasticity.

Slide 82
Complete hierarchical multiple regression analysis: detecting outliers

Question: after incorporating any transformations, no outliers were detected in the regression analysis.

If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.

Is the standardized residual for any case greater than +/-3.00?
  Yes: False. Remove outliers and run the revised regression again.
  No: True

Slide 83
Complete hierarchical multiple regression analysis: picking regression model for interpretation

Question: interpretation based on model that includes transformation of variables and removes outliers?

R² for revised regression greater than R² for baseline regression by 2% or more?
  Yes: Pick revised regression with transformations and omitting outliers for interpretation. True
  No: Pick baseline regression with untransformed variables and all cases for interpretation. False

Slide 84
Complete hierarchical multiple regression analysis: assumption of independence of errors

Question: serial correlation of errors is not a problem in this regression analysis?

Residuals are independent, Durbin-Watson between 1.5 and 2.5?
  Yes: True
  No: False. NOTE: add a caution for violation of the assumption of independence of errors.

Slide 85
Complete hierarchical multiple regression analysis: multicollinearity

Question: multicollinearity is not a problem in this regression analysis?

Tolerance for all IVs greater than 0.10, indicating no multicollinearity?
  Yes: True
  No: False. NOTE: halt the analysis until the problem is diagnosed.

Slide 86
Complete hierarchical multiple regression analysis: overall relationship

Question: finding about overall relationship between dependent variable and independent variables.

Probability of F test of R² change less than or equal to level of significance?
  No: False
  Yes: Strength of R² change for predictor variables interpreted correctly?
    No: False
    Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
      No: True
      Yes: True with caution

Slide 87
Complete hierarchical multiple regression analysis: individual relationships

Question: finding about individual relationship between independent variable and dependent variable.

Probability of t test between predictors and DV <= level of significance?
  No: False
  Yes: Direction of relationship between predictors and DV interpreted correctly?
    No: False
    Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
      No: True
      Yes: True with caution

Slide 88
Complete hierarchical multiple regression analysis: individual relationships

Question: finding about independent variable with largest impact on dependent variable.

Does the stated variable have the largest beta coefficient (ignoring sign) among predictors?
  No: False
  Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
    No: True
    Yes: True with caution

Slide 89
Complete hierarchical multiple regression analysis: validation analysis - 1

Question: the validation analysis supports the generalizability of the findings?

Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.

Probability of ANOVA test for training sample <= level of significance?
  No: False
  Yes: Probability of F for R² change for training sample <= level of significance?
    No: False
    Yes: continue on the next slide

Slide 90
Complete hierarchical multiple regression analysis: validation analysis - 2

Pattern of significance for predictor variables in training sample matches pattern for full data set?
  No: False
  Yes: Shrinkage in R² (R² for training sample - R² for validation sample) < 2%?
    Yes: True
    No: False
