
SPSS TUTORIALS

Overview
Proficiency with statistical software packages is indispensable today for serious research in the social
sciences. SPSS is one of the most widely used and powerful statistical software packages. It covers a
broad range of statistical procedures that allow one to summarize data (e.g., compute means and
standard deviations), determine whether there are significant differences between groups (e.g., t tests,
analysis of variance), examine relationships among variables (e.g., correlation, multiple regression),
and graph results (e.g., bar charts, line graphs). These tutorials show screenshots of SPSS 15, the
newest version at the time the tutorials were written. If you are using a different version of SPSS, your
screens may not look exactly like those presented in the tutorials, but the basic functionality should be
the same or very similar. These pages are based on a series of SPSS tutorials originally written by Dr.
Gil Einstein and Dr. Ken Abernethy of Furman University.
The lessons presented here give you an introduction to SPSS. They are not designed to teach you
statistics, but are intended for individuals who already have some background in statistics and want to
learn SPSS, or to be used as a supplement to a statistics course and text. When you are comfortable
with SPSS, we encourage you to explore the SPSS menus and options, because the package is very
powerful and there are usually multiple ways to accomplish your statistical goals. Unless you are
already familiar with SPSS, you should start with Lesson 1, which presents a brief overview of the
different types of windows and files available with SPSS. Lesson 2 describes how to enter and label
your data, transform data, select cases, and sort cases. Lesson 3 shows you how to generate various
descriptive statistics and some simple graphical representations of your data.
Once you understand how to enter and manipulate data and to generate and report statistical results,
you can then go on to any of the other lessons. Each of these lessons includes a research problem with
a hypothetical set of data, and step-by-step directions for how to perform the specified analyses. An
additional example for further practice is also included for many of the lessons. Lessons 4 - 9 describe
specific statistical procedures used to compare the means of two or more groups (t-tests and analysis
of variance). Lesson 10 covers correlation and Lesson 11 covers linear regression for the bivariate
case (one independent variable and one dependent variable). Lesson 12 covers multiple regression
(one dependent variable and two or more independent variables). In Lesson 13 you will learn how to
conduct and interpret chi-square analyses for categorical data arranged in one-way tables (goodness-of-fit tests) and two-way tables (tests of independence). Lesson 14 introduces analysis of covariance
(ANCOVA), a technique combining regression and analysis of variance.
An Important Point to Remember
Please note that these tutorials cover only a few of the most basic statistical procedures available with
SPSS. After you have worked through these tutorials, you will have familiarity with SPSS. With this
familiarity and an understanding of the statistical test that you wish to use, we are confident that you
will be able to figure out other procedures on your own. You may want to bookmark this site, as new
material is being added on a regular basis. Your feedback is also very welcome and appreciated. You
may provide feedback, make suggestions for additional tutorials, or report errors by clicking on the
Provide Feedback link in the navigation menu, or by e-mailing the site's webmaster.


Lesson 1: SPSS Windows and Files


Objectives
1. Launch SPSS for Windows.
2. Examine SPSS windows and file types.
Overview
In a typical SPSS session, you are likely to work with two or more SPSS windows and to save the
contents of one or more windows to separate files. The window containing your data is the SPSS Data
Editor. If you plan to use the data file again, you may click on File, Save from within the Data Editor
and give the file a descriptive name. SPSS will supply the .sav extension, indicating that the saved
information is in the form of a data file. An SPSS data file includes both the data records and their
structure. The window containing the results of the SPSS procedures you have performed is the SPSS
Viewer. You may find it convenient to save this as an output file. It is okay to use the same name you
used for your data because SPSS will supply the .spo extension to indicate that the saved file is an
output file. As you run various procedures, you may also choose to show the SPSS syntax for these
commands in a syntax window, and save the syntax in a separate .sps file. It is possible to run SPSS
commands directly from syntax, though in this series of tutorials we will focus our attention on SPSS
data and output files and use the point-and-click method to enter the necessary commands.
Launching SPSS
SPSS for Windows is launched from the Windows desktop. There are several ways to access the
program, and the one you use will be based on the way your particular computer is configured. There
may be an SPSS for Windows shortcut on the desktop or in your Start menu. Or you may have to
click Start, All Programs to find the SPSS for Windows folder. In that folder, you will find the SPSS
for Windows program icon.
Once you have located it, click on the SPSS for Windows icon with the left mouse button to launch
SPSS. When you start the program, you will be given a blank dataset and a set of options for running
the SPSS tutorial, typing in data, running queries, creating queries, or opening existing data sources
(see Figure 1-1). For now, just click on Cancel to reveal the blank dataset in the Data Editor screen.


Figure 1-1 SPSS opening screen

The SPSS Data Editor


Examine the SPSS Data Editor's Data View shown in Figure 1-2 below. You will learn in Lesson 2
how to create an effective data structure within the Variable View and how to enter and manipulate data
using the Data Editor. As indicated above, if you click File, Save while in the Data Editor view, you
can save the data along with their structure as a separate file with the .sav extension. The Data Editor
provides the Data View as shown below, and also a separate Variable View. You can switch between
these views by clicking on the tabs at the bottom of the worksheet-like interface.


Figure 1-2 SPSS Data Editor (Data View)

The SPSS Viewer


The SPSS Viewer is opened automatically to show the output when you run SPSS commands.
Assume for example that you wanted to find the average age of 20 students in a class. We will
examine the commands needed to calculate descriptive statistics in Lesson 3, but for now, simply
examine the SPSS Viewer window (see Figure 1-3). When you click File, Save in this view, you can
save the output to a file with the .spo extension.


Figure 1-3 SPSS Viewer


Syntax Editor Window
Finally, you can view and save SPSS syntax commands from the Syntax Editor window. When you
are selecting commands, you will see a Paste button. Clicking that button pastes the syntax for the
commands you have chosen into the Syntax Editor. For example, the syntax to calculate the mean age
mentioned above appears in Figure 1-4:

Figure 1-4 SPSS Syntax Editor
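Although the screenshot is not reproduced here, pasted syntax is ordinary text. As a rough sketch (the exact command and subcommands depend on the options you choose in the dialogs), the syntax for calculating the mean age might look like this:

* Descriptive statistics for the Age variable.
DESCRIPTIVES VARIABLES=Age
  /STATISTICS=MEAN.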


Though we will not address SPSS syntax except in passing in these tutorials, you should note that you
can run commands directly from the Syntax Editor and save your syntax (.sps) files for future


reference. Unlike earlier versions of SPSS, version 15 (the version illustrated in these tutorials)
automatically displays in the SPSS Viewer the syntax for the commands you issue by pointing and
clicking in the Data Editor or the SPSS Viewer (see Figure 1-3 for an example).
Now that you know the kinds of windows and files involved in an SPSS session, you are ready to
learn how to enter, structure, and manipulate data. Those are the subjects of Lesson 2.


Lesson 2: Entering and Working with Data


Objectives
1. Create a data file and data structure.
2. Compute a new variable.
3. Select cases.
4. Sort cases.
5. Split a file.

Overview
Data can be entered directly into the SPSS Data Editor or imported from a variety of file types. It is
always important to check data entries carefully and ensure that the data are accurate. In this lesson
you will learn how to build an SPSS data file from scratch, how to calculate a new variable, how to
select and sort cases, and how to split a file into separate layers.
Creating a Data File
A common first step in working with SPSS is to create or open a data file. We will assume in this
lesson that you will type data directly into the SPSS Data Editor to create a new data file. You should
realize that you can also read data from many other programs, or copy and paste data from worksheets
and tables to create new data files.
Launch SPSS. You will be given various options, as we discussed in Lesson 1. Select Type in Data
or Cancel . You should now see a screen similar to the following, which is a blank dataset in the Data
View of the SPSS Data Editor (see Figure 2-1):

Figure 2-1 SPSS Data Editor - Data View


Key Point: One Row Per Participant, One Column per Variable
It is important to note that each row in the SPSS data table should be assigned to a single participant,
subject, or case, and that no case's data should appear on different rows. When there are multiple
measures for a case, each measure should appear in a separate column (called a "variable" by SPSS).
If you use a coding variable to indicate which group or condition was assigned to a case, that variable


should also appear in a separate column. So if you were looking at the scores for five quizzes for each
of 20 students, the data for each student would occupy a single row (line) in the data table, and the
score for each quiz would occupy a separate column.
Although SPSS automatically numbers the rows of the data table, it is a very good habit to provide a
separate participant (or subject) number column so that records can be easily sorted, filtered, or
selected. Best practice also requires setting up the data structure for the data. For this purpose, we will
switch to the Variable View of the Data Editor by clicking on the Variable View tab at the bottom of
the Data Editor window. See Figure 2-2.

Figure 2-2 SPSS Data Editor - Variable View


Example Data
Let us establish the data structure for our example of five quizzes and 20 students. We will assume
that we also know the age and the sex of each student. Although we could enter "F" for female and
"M" for male, most statistical procedures are easier to perform if a number is used to code such
categorical variables. Let us assign the number "1" to females and the number "0" to males. The
hypothetical data are shown below:
Student   Sex   Age   Quiz1   Quiz2   Quiz3   Quiz4   Quiz5
1               18    83      87      81      80      69
2               19    76      89      61      85      75
3               17    85      86      65      64      81
4               20    92      73      76      88      64
5               23    82      75      96      87      78
6               18    88      73      76      91      81
7               21    89      71      61      70      75
8               20    89      70      87      76      88
9               23    92      85      95      89      62
10              21    86      83      77      64      63
11              23    90      71      91      86      87
12              18    84      71      67      62      70
13              21    83      80      89      60      60
14              17    79      77      82      63      74
15              19    89      80      64      94      78
16              20    76      85      65      92      82
17              19    92      76      76      74      91
18              22    75      90      78      70      76
19              22    87      87      63      73      64
20              20    75      74      63      91      87


Specifying the Data Structure


Switch to the Variable View by clicking on the Variable View tab (see Figure 2-2 above). The
numbers at the left of the window now refer to variables rather than participants. Note that you can
specify the variable Name, the Type of variable, the variable Width (in total characters or digits), the
number of Decimals , a descriptive Label, labels for different Values, how to deal with Missing
Values, the display Column width, how to Align the variable in the display, and whether the
Measure is nominal, ordinal, or scale (interval and ratio). In many cases you can simply accept the
defaults by leaving the entries blank. But you will definitely want to enter a variable Name and Label,
and also specify Value labels for the levels of categorical or grouping variables such as sex or the
levels of an independent variable. The variable names should be short and should not contain spaces
or special characters other than perhaps underscores. Variable labels, on the other hand, can be longer
and can contain spaces and special characters.
Let us specify the structure of our dataset by naming the variables as follows. We will also provide
information concerning the width, number of decimals, and type of measure, along with a descriptive
label:
1. Student
2. Sex
3. Age
4. Quiz1
5. Quiz2
6. Quiz3
7. Quiz4
8. Quiz5

No decimals appear in our raw data, so we will set the number of decimals to zero. After we enter the
desired information, the completed data structure might appear as follows:

Figure 2-3 SPSS data structure (Variable View)


Notice that we provided value labels for Sex, so we won't confuse our 1's and 0's later. To do this,
click on Values in the Sex variable row and enter the appropriate labels for males and females (see
Figure 2-4).

Figure 2-4 Adding value labels


After entering the value and label for one sex, click on Add and then repeat the process for the other
sex. Click on Add after entering this information and then click OK.
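If you prefer syntax, the same value labels can be attached with a single command. This is a sketch assuming the coding described above (1 for females, 0 for males):

* Attach descriptive labels to the numeric codes for Sex.
VALUE LABELS Sex 0 'Male' 1 'Female'.
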
Entering the Data
Now return to the data view (click on the Data View tab), and type in the data. If you prefer, you may
retrieve a copy of the data file by clicking here. Save the data file with a name that will help you
remember it. In this case, we used lesson_2.sav as the file name. Remember that SPSS will provide
the .sav extension for a data file. The data should appear as follows:


Figure 2-5 Completed data entry


Computing a New Variable
Now we will compute a new variable by averaging the five quiz scores for each student. When we
compute this new variable, it will be added to our variable list, and a new column will be created for
it. Let us call the new variable Quiz_Avg and use SPSS's built-in function called MEAN to compute
it. Select Transform, then Compute. The Compute Variable dialog box appears. You may type in the
new variable name, specify the type and provide a label, and enter the formula for computing the new
variable. In this case, we will use the formula:
Quiz_Avg = MEAN (Quiz1, Quiz2, Quiz3, Quiz4, Quiz5)
You can enter the formula by selecting MEAN from the Functions window and then clicking on the
variable names, or you can simply type in the formula, separating the variable names by commas.
The initial Compute Variable dialog box with the target variable named Quiz_Avg and the MEAN
function selected is below. The question marks indicate that you must supply expressions for the
computation.


Figure 2-6 Compute Variable screen


The appropriate formula is as follows:


Figure 2-7 Completed expression

When you click OK, the new variable appears in both the data and variable views (see below). As
discussed earlier, you can change the number of decimals (numerical variables default to two
decimals) and add a descriptive label for the new variable.


Figure 2-8 New variable appears in Data View


Figure 2-9 New variable appears in Variable View


Selecting Cases
You may want to select only certain cases, such as the data for females or for individuals with ages
lower than 20 years. SPSS allows you to select cases either by filtering (which keeps all the cases but
limits further analyses to the selected cases) or by removing the cases that do not meet your criteria.
Usually, you will want to filter cases, but sometimes, you may want to create separate files for
additional analyses by deleting records that do not match your selection criteria. We will select
records for females and filter those records so that the records for males remain but will be excluded
from analyses until we select them again.
From either the variable view or the data view, click on Data, and then click on Select Cases. The
resulting dialog box allows you to select the desired cases for further analysis, or to re-select all cases
if data were previously filtered. Let us choose "If condition is satisfied," and specify that we want to
select only records for which the sex of the participant is female. See the dialog box in the following
figure.


Figure 2-10 Select Cases dialog


Click the "If..." button and enter the condition for selection. In this case we will enter the expression
Sex = 1. You can type this in directly, or you can point and click on the entries in the dialog box.

Figure 2-11 Select Cases expression


Click Continue, then Click OK, and then examine the data view (see Figure 2-12). Records for males
will now have a diagonal line through the row number label, indicating that though still present, these
records are excluded from further analyses.


Figure 2-12 Selected and filtered data


Also notice that a new variable called Filter_$ has been automatically added to your data file. If you
return to the Data menu and select all the cases again, you can use this filter variable to select females
instead of having to re-enter the selection formula. If you do not want to keep this new variable, you
can right-click on its column label and select Clear.
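For reference, the syntax SPSS generates for this filtering operation is approximately the following (a sketch; the pasted version also adds label commands for the filter variable):

* Keep all cases, flag those with Sex = 1, and restrict analyses to them.
USE ALL.
COMPUTE filter_$=(Sex = 1).
FILTER BY filter_$.
EXECUTE.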


Figure 2-13 Filter variable added by SPSS


Sorting Cases
Next you will learn to sort cases. Let's return to the Data, Select Cases menu and choose "Select all
cases" in order to re-select the records for males.
We can sort on one or more variables. For example, we may want to sort the records in our dataset by
age and sex. Select Data, Sort Cases:


Figure 2-14 Sort Cases option

Move Sex and Age to the "Sort by" window (see Figure 2-15) and then click OK.

Figure 2-15 Sort Cases dialog


Return to the Data View and confirm that the data are sorted by sex and by age within sex (see Figure
2-16).
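The equivalent pasted syntax would look something like this (the A requests ascending order):

* Sort by Sex, and by Age within Sex.
SORT CASES BY Sex (A) Age (A).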


Figure 2-16 Cases sorted by Sex and Age


Splitting a File
The last subject we will cover in this tutorial is splitting a file. Instead of filtering cases, splitting a file
creates separate "layers" for the grouping variables. For example, instead of selecting only one sex at
a time, you may want to run several analyses separately for males and females. One convenient way
to accomplish that is to split the file so that every procedure you run will be automatically conducted
and reported for the two groups separately. To split a file, select Data, Split File. The data in a group
need to be consecutive cases in the dataset, so the records must be sorted by groups. However, if your
data are not already sorted, SPSS can do that for you at the same time the file is split (see Figure 2-17).
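As a sketch, the syntax behind a layered split by sex, including the sort, is approximately:

* Sort by the grouping variable, then report every analysis by its levels.
SORT CASES BY Sex.
SPLIT FILE LAYERED BY Sex.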


Figure 2-17 Split File menu


Now, when you run a command, such as a table command to summarize average quiz scores, the
command will be performed for each group separately and those results will be reported in the same
output (see Figure 2-18).

Figure 2-18 Split file results in separate analysis for each group


Lesson 3: Descriptive Statistics and Graphs


Objectives
1. Compute descriptive statistics.
2. Compare means for different groups.
3. Display frequency distributions and histograms.
4. Display boxplots.

Overview
In this lesson, you will learn how to produce various descriptive statistics, simple frequency
distribution tables, and frequency histograms. You will also learn how to explore your data and create
boxplots.
Example
Let us return to our example of 20 students and five quizzes. We would like to calculate the average
score (mean) and standard deviation for each quiz. We will also look at the mean scores for men and
women on each quiz. Open the SPSS data file you saved in Lesson 2, or click here for lesson_3.sav.
Remember that we previously calculated the average quiz score for each person and included that as a
new variable in our data file.
To calculate the means and standard deviations for age, all quizzes, and the average quiz score, select
Analyze, then Descriptive Statistics, and then Descriptives as shown in the following screenshot
(see Figure 3-1).


Figure 3-1 Accessing the Descriptives Procedure


Move the desired variables into the variables window (see Figure 3-2) and then click Options.

Figure 3-2 Move the desired variables into the variables window.
In the resulting dialog box, make sure you check (at a minimum) the boxes in front of Mean and Std.
deviation, and then click Continue and OK:


Figure 3-3 Descriptive options


The resulting output table showing the means and standard deviations of the variables is opened in the
SPSS Viewer (see Figure 3-4).

Figure 3-4 Output from Descriptive Procedure
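The corresponding syntax, should you paste rather than run the command, is roughly:

* Means and standard deviations for age, the five quizzes, and the quiz average.
DESCRIPTIVES VARIABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg
  /STATISTICS=MEAN STDDEV.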


Exploring Means for Different Groups


When you have two or more groups, you may want to examine the means for each group as well as
the overall mean. The SPSS Compare Means procedure provides this functionality and much more,
including various hypothesis tests. Assume that you want to compare the means of men and women
on age, the five quizzes, and the average quiz score. Select Analyze, Compare Means, Means (see
Figure 3-5):

Figure 3-5 Selecting Means Procedure


In the resulting dialog box, move the variables you are interested in summarizing
into the Dependent List. At this point, do not worry whether your variables are actual "dependent
variables" or not. Move Sex to the Independent List (see Figure 3-6). Click on Options to see the
many summary statistics available. In the current case, make sure that Mean, Number of Cases, and
Standard Deviation are selected.


Figure 3-6 Means dialog box


When you click OK, the report table appears in the SPSS Viewer with the separate means for the two
sexes along with the overall data, as shown in the following figure.

Figure 3-7 Report from Means procedure
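A sketch of the equivalent Means syntax is shown below:

* Group means, counts, and standard deviations by sex.
MEANS TABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg BY Sex
  /CELLS MEAN COUNT STDDEV.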


As this lesson makes clear, there are several ways to produce summary statistics such as means and
standard deviations in SPSS. From Lesson 2 you may recall that splitting the file would allow you to
calculate the descriptive statistics separately for males and females. The way to find the procedure
that works best in a given situation is to try different ones, and always to explore the options presented
in the SPSS menus and dialog boxes. The extensive SPSS help files and tutorials are also very useful.
Frequency Distributions and Histograms
SPSS provides several different ways to explore, summarize, and present data in graphic form. For
many procedures, graphs and plots are available as output options. SPSS also has an extensive
interactive chart gallery and a chart builder that can be accessed through the Graphs menu. We will
look at only a few of these features, and the interested reader is encouraged to explore the many
additional charting and graphing features of SPSS.
One very useful feature of the Frequencies procedure in SPSS is that it can produce simple frequency
tables and histograms. You may optionally choose to have the normal curve superimposed on the


histogram for a visual check as to how the data are distributed. Let us examine the distribution of ages
of our 20 hypothetical students. Select Analyze, Descriptive Statistics, Frequencies (see Figure 3-8).

Figure 3-8 Selecting Frequencies procedure

In the Frequencies dialog, move Age to the variables window, and then click on Charts. Select
Histograms and check the box in front of With normal curve (see Figure 3-9).


Figure 3-9 Frequencies: Charts dialog


Click Continue and OK. In the resulting output, SPSS displays the simple frequency table for age
and the frequency histogram with the normal curve (see Figures 3-10 and 3-11).
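The pasted syntax for this Frequencies request would look approximately like this:

* Frequency table for Age plus a histogram with a superimposed normal curve.
FREQUENCIES VARIABLES=Age
  /HISTOGRAM NORMAL.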

Figure 3-10 Simple frequency table


Figure 3-11 Frequency histogram with normal curve

Exploratory Data Analysis


In addition to the standard descriptive statistics and frequency distributions and graphs, SPSS also
provides many graphical and semi-graphical techniques collectively referred to as exploratory data
analysis (EDA). EDA is useful for describing the characteristics of a dataset, identifying outliers, and
providing summary descriptions. Some of the most widely-used EDA techniques are boxplots and
stem-and-leaf displays. You can access these techniques by selecting
Analyze, Descriptive Statistics, Explore. As with the Compare Means procedure, groups can be
separated if desired. For example, a side-by-side boxplot comparing the average quiz grades of men
and women is shown in Figure 3-12.
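A rough sketch of the Explore syntax behind such a plot, assuming the Quiz_Avg variable created in Lesson 2:

* Boxplots and stem-and-leaf displays of quiz averages, grouped by sex.
EXAMINE VARIABLES=Quiz_Avg BY Sex
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUPS.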


Figure 3-12 Boxplots


Lesson 4: Independent-Samples t Test


Objectives
1. Conduct an independent-samples t test.
2. Interpret the output of the t test.
Overview
The independent-samples or between-groups t test is used to examine the effects of one independent
variable on one dependent variable and is restricted to comparisons of two conditions or groups (two
levels of the independent variable). In this lesson, we will describe how to analyze the results of a
between-groups design. Lesson 5 covers the paired-samples or within-subjects t test. The reader
should note that SPSS incorrectly labels this test a "T test" rather than a t test, but is inconsistent in
that labelling, as some of the SPSS output also refers to t-test results.
A between-groups design is one in which participants have been randomly assigned to the two levels
of the independent variable. In this design, each participant is assigned to only one group, and
consequently, the two groups are independent of one another. For example, suppose that you are
interested in studying the effects of caffeine consumption on task performance. If you randomly assign
some participants to the caffeine group and other participants to the no-caffeine group, then you are
using a between-groups design. In a within-subjects design, by contrast, all participants would be
tested once with caffeine and once without caffeine.
An Example: Parental Involvement Experiment
Assume that you studied the effects of parental involvement (independent variable) on students'
grades (dependent variable). Half of the students in a third grade class were randomly assigned to the
parental involvement group. The teacher contacted the parents of these children throughout the year
and told them about the educational objectives of the class. Further, the teacher gave the parents
specific methods for encouraging their children's educational activities. The other half of the students
in the class were assigned to the no-parental involvement group. The scores on the first test were
tabulated for all of the children, and these are presented below:
Student   Involve   Test1        Student   Involve   Test1
1         1         78.6         9         0         81.0
2         1         64.9         10        0         69.5
3         1         100.0        11        0         73.8
4         1         83.7         12        0         66.7
5         1         94.0         13        0         54.8
6         1         78.2         14        0         69.3
7         1         76.9         15        0         73.5
8         1         82.0         16        0         79.4

Creating Your Data File: Key Point


When creating a data file for an independent-samples t test in SPSS, you must also create a separate
column for the grouping variable that shows to which condition or group a particular participant
belongs. In this case, that is the parental involvement condition, so you should create a numeric code
that allows SPSS to identify the parental involvement condition for that particular score. If this
concept is difficult to grasp, you may want to revisit Lesson 2, in which a grouping variable is created
for male and female students.
So, the variable view of your SPSS data file should look like the one below, with three variables--one
for student number, one for parental involvement condition (using for example a code of "1" for
involvement and "0" for no involvement), and one column for the score on Test 1. When creating the
data file, it is a good idea to create a variable Label for each variable and a Value label for the
grouping variable(s). These labels make it easier to interpret the output of your statistical procedures.
The variable view of the data file might look similar to the one below.

Figure 4-1 Variable View


The data view of the file should look like the following:


Figure 4-2 Data View


Note that in this particular case the two groups are separated in the data file, with the first half of the
data corresponding to the parental involvement condition and the second half corresponding to the no-involvement condition. Although this makes for an orderly data table, such ordering is NOT required
in SPSS for the independent-samples t test. When performing the test, whether or not the data are
sorted by the independent variable, you must specify which condition a participant is in by use of a
grouping variable as indicated above.
Performing the t test for the Parental Involvement Experiment
You should enter the data as described above. Or you may access the SPSS data file for the parental
involvement experiment by clicking here. To perform the t test, complete the following steps in order.


Click on Analyze, then Compare Means, then Independent Samples T Test.

Figure 4-3 Select Analyze, Compare Means, Independent-Samples T Test


Now, move the dependent variable (in this case, labelled "Score on Test 1 [Test1]") into the Test
Variable window. Then move your independent variable (in this case, "Parental Involvement
[Involve]") into the Grouping Variable window. Remember that Grouping Variable stands for the
levels of the independent variable.


Figure 4-4 Independent-Samples T Test dialog box


You will notice that there are question marks in the parentheses following your independent variable
in the Grouping Variable field. This is because you need to define the particular groups that you
want to compare. To do so, click on Define Groups, and indicate the numeric values that each group
represents. In this case, you will want to put a "0" in the field labelled Group 1 and a "1" in the field
labelled Group 2. Once you have done this, click on Continue.
Now click on OK to run the t test. You may also want to click on Paste in order to save the SPSS
syntax of what you have done (see Figure 4-5) in case you desire to run the same kind of test from
SPSS syntax.

Figure 4-5 Syntax for the independent-samples t test
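The pasted syntax should read approximately as follows:

* Independent-samples t test comparing the two involvement groups.
T-TEST GROUPS=Involve(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=Test1
  /CRITERIA=CI(.95).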


Output from the t test Procedure


As you can see below, the output from an independent-samples t test procedure is relatively
straightforward.

Figure 4-6 Independent-samples t test output


Interpreting the Output
In the SPSS output, the first table lists the number of participants (N), mean, standard deviation, and
standard error of the mean for both of your groups. Notice that the value labels are printed as well as
the variable labels for your variables, making it easier to interpret the output.
The second table (see Figure 4-6) presents you with an F test (Levene's test for equality of variances)
that evaluates the basic assumption of the t test that the variances of the two groups are approximately
equal (homogeneity of variance or homoscedasticity). If the F value reported here is very high and the
significance level is very low (usually lower than .05 or .01), then the assumption of homogeneity of
variance has been violated. In that case, you should use the t test in the lower half of the table,
whereas if you have not violated the homogeneity assumption, you should use the t test in the upper
half of the table. The t-test formula for unequal variances makes an adjustment to the degrees of
freedom, so this value is often fractional, as seen above.
In this particular case, you can see that we have not violated the homogeneity assumption, and we
should report the value of t as 2.356, degrees of freedom of 14, and the significance level of .034.
Thus, our data show that parental involvement has a significant effect on grades, t(14) = 2.356, p =
.034.


Lesson 5: Paired-Samples t Test


Objectives
1. Conduct a paired-samples t test.
2. Interpret the output of the paired-samples t test.
Overview
The paired-samples or dependent t test is used for within-subjects or matched-pairs designs in which
observations in the groups are linked. The linkage could be based on repeated measures, natural
pairings such as mothers and daughters, or pairings created by the experimenter. In any of these cases,
the analysis is the same. The dependency between the two observations is taken into account, and
each set of observations serves as its own control, making this a generally more powerful test than the
independent-samples t test. Because of the dependency, the degrees of freedom for the paired-samples
t test are based on the number of pairs rather than the number of observations.
Example
Imagine that you conducted an experiment to test the effects of the presence of others
(independent variable) on problem-solving performance (dependent variable). Assume further that
you used a within-subjects design; that is, each participant was tested alone and in the presence of
others on different days using comparable tasks. Higher scores indicate better problem-solving
performance. The data appear below:
Participant   Alone   Others
1             12      10
2             12      10
3             11      10
4             12      11
5             12

The following figure shows the variable view of the structure of the dataset:


Figure 5-1 Dataset variable view


Entering Data for a Within-Subjects Design: Key Point
When you enter data for a within-subjects design, there must be a separate column for each condition.
This tells SPSS that the two data points are linked for a given participant. Unlike the independent-samples t test where a grouping variable is required, there is no additional grouping variable in the
paired-samples t test. The properly configured data are shown in the following screenshot of the SPSS
Data Editor Data View:

Figure 5-2 Dataset data view


Performing the Paired-Samples t test Step-by-Step


The SPSS data file for this example can be found here. After you have entered or opened the dataset,
you should follow these steps in order.
Click on Analyze, Compare Means, and then Paired-Samples T test.

Figure 5-3 Select Paired-Samples T Test


In the resulting dialog box, click on the label for Alone and then press <Shift> and click on the label
for Others. Click on the arrow to move this pair of variables to the Paired Variables window.


Figure 5-4 Identify paired variables
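If you click Paste instead of OK, the generated syntax should look roughly like this:

* Paired-samples t test on the Alone and Others scores.
T-TEST PAIRS=Alone WITH Others (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.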


Interpreting the Paired-Samples t Test Output
Click OK and the following output appears in the SPSS Output Viewer Window (see Figure 5-5).
Note that the correlation between the two observations is reported along with its p level, and that the
value of t, the degrees of freedom (df), and the p level of the calculated t are reported as well.

Figure 5-5 Paired-Samples T Test output


Lesson 6: One-Way ANOVA


Objectives
1. Conduct a one-way ANOVA.
2. Perform post hoc comparisons among means.
3. Interpret the ANOVA and post hoc comparison output.
Overview
The one-way ANOVA compares the means of three or more independent groups. Each group
represents a different level of a single independent variable. It is useful at least conceptually to think
of the one-way ANOVA as an extension of the independent-samples t test. The null hypothesis in the
ANOVA is that the several populations being sampled all have the same mean. Because the variance
is based on deviations from the mean, the "analysis of variance" can be used to test hypotheses about
means. The test statistic in the ANOVA is an F ratio, which is a ratio of two variances. When an
ANOVA leads to the conclusion that the sample means differ by more than a chance level, it is
usually instructive to perform post hoc (a posteriori) analyses to determine which of the sample
means are different. It is also helpful to determine and report effect size when performing ANOVA.
Example Problem
In a class of 30 students, ten students each were randomly assigned to three different methods of
memorizing word lists. In the first method, the student was instructed to repeat the word silently when
it was presented. In the second method, the student was instructed to spell the word backward,
visualize the backward spelling, and pronounce it silently. The third method required the student to
associate each word with a strong memory. Each student saw the same 10 words flashed on a
computer screen for five seconds each. The list was repeated in random order until each word had
been presented a total of five times. A week later, students were asked to write down as many of the
words as they could recall. For each of the three groups, the number of correctly-recalled words is
shown in the following table:
Method1 Method2 Method3


Entering the Data in SPSS


Recall our previous lessons on data entry. These 30 scores represent 30 different individuals, and each
participant's data should take up one line of the data file. The group membership should be coded as a
separate variable. The correctly-entered data would take the following form (see Figure 6-1). Note
that although we used 1, 2, and 3 to code group membership, we could just as easily have used 0, 1,
and 2.

Figure 6-1 Data for one-way ANOVA


Conducting the One-Way ANOVA
To perform the one-way ANOVA in SPSS, click on Analyze, Compare Means, One-Way ANOVA
(see Figure 6-2).


Figure 6-2 Select Analyze, Compare Means, One-Way ANOVA


In the resulting dialog box, move Recall to the Dependent List and Method to the Factor field. Select
Post Hoc and then check the box in front of Tukey for the Tukey HSD test (see Figure 6-3), which is
one of the most frequently used post hoc procedures. Note also the many other post hoc comparison
tests available.
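The equivalent syntax, sketched here for reference, uses the ONEWAY command:

* One-way ANOVA on Recall by Method, with Tukey HSD post hoc tests.
ONEWAY Recall BY Method
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).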


Figure 6-3 One-Way ANOVA dialog with Tukey HSD test selected
The ANOVA summary table and the post hoc test results appear in the SPSS Viewer (see Figure 6-4).
Note that the overall (omnibus) F ratio is significant, indicating that the means differ by a larger
amount than would be expected by chance alone if the null hypothesis were true. The post hoc test
results indicate that the mean for Method 1 is significantly lower than the means for Methods 2 and 3,
but that the means for Methods 2 and 3 are not significantly different.


Figure 6-4 ANOVA summary table and post hoc test results
As an aid to understanding the post hoc test results, SPSS also provides a table of homogeneous
subsets (see Figure 6-5). Note that it is not strictly necessary that the sample sizes be equal in the one-way ANOVA, and when they are unequal, the Tukey HSD procedure uses the harmonic mean of the
sample sizes for post hoc comparisons.


Figure 6-5 Table of homogeneous subsets


Missing from the ANOVA results table is any reference to effect size. A common effect size index is
eta squared, which is the between-groups sum of squares divided by the total sum of squares. As such,
this index represents the proportion of variance that can be attributed to between-group differences or
treatment effects. An alternative method of performing the one-way ANOVA provides the effect-size
index, but not the post hoc comparisons discussed earlier. To perform this alternative analysis, select
Analyze, Compare Means, Means (see Figure 6-6). Move Recall to the Dependent List and Method
to the Independent List. Under Options, select Anova Table and eta.
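A sketch of the corresponding syntax (eta squared here is the between-groups sum of squares divided by the total sum of squares, as noted above):

* ANOVA table and eta squared from the Means procedure.
MEANS TABLES=Recall BY Method
  /CELLS MEAN COUNT STDDEV
  /STATISTICS ANOVA.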

Figure 6-6 ANOVA procedure and effect size index available from Means procedure


The ANOVA summary table from the Means procedure appears in Figure 6-7 below. Eta squared is
directly interpretable as an effect size index: 58 percent of the variance in recall can be explained by
the method used for remembering the word list.

Figure 6-7 ANOVA table and effect size from Means procedure


Lesson 7: Repeated-Measures ANOVA


Objectives
1. Conduct the repeated-measures ANOVA.
2. Interpret the output.
3. Construct a profile plot.
Overview
The repeated-measures or within-subjects ANOVA is used when there are multiple measures for each
participant. It is conceptually useful to think of the repeated-measures ANOVA as an extension of the
paired-samples t test. Each set of observations for a subject or case serves as its own control, so this
test is quite powerful. In the repeated-measures ANOVA, the test of interest is the within-subjects
effect of the treatments or repeated measures.
The procedure for performing repeated-measures ANOVA in SPSS is found in the Analyze, General
Linear Model menu.
Example Data
Assume that a statistics professor is interested in the effects of taking a statistics course on
performance on an algebra test. She administers a 20-item college algebra test to ten randomly
selected statistics students at the beginning of the term, at the end of the term, and six months after the
course is finished. The hypothetical test results are as follows.
Student   Before   After   SixMo
1         13       15      17
2         12       15      14
3         12       17      16
4         19       20      20
5         10       15      14
6         10       13      15
7         12       11      14
8         15       13      10
9         11       16
10

Coding Considerations
Data coding considerations in the repeated-measures ANOVA are similar to those in the paired-samples t test. Each participant or subject takes up a single row in the data file, and each observation


requires a separate column. The properly coded SPSS data file with the data entered correctly should
appear as follows (see Figure 7-1). You may also retrieve a copy of the data file if you like.

Figure 7-1 SPSS data file coded for repeated-measures ANOVA


Performing the Repeated-Measures ANOVA
To perform the repeated-measures ANOVA in SPSS, click on Analyze, then General Linear Model,
and then Repeated Measures. See Figure 7-2.


Figure 7-2 Select Analyze, General Linear Model, Repeated Measures


In the resulting Repeated Measures dialog, you must specify the number of factors and the number of
levels for each factor. In this case, the single factor is the time the algebra test was taken, and there are
three levels: at the beginning of the course, immediately after the course, and six months after the
course. You can accept the default label of factor1, or change it to a more descriptive one. We will use
"Time" as the label for our factor, and specify that there are three levels (see Figure 7-3).

Figure 7-3 Specifying factor and levels


After naming the factor and specifying the number of levels, you must add the factor and then define
it. Click on Add and then click on Define. See Figure 7-4.

Figure 7-4 Specifying within-subjects variable levels


Now you can enter the levels one at a time by clicking on a variable name and then clicking on the
right arrow adjacent to the Within-Subjects Variables field. Or you can click on Before in the left
pane of the Repeated Measures dialog, then hold down <Shift> and click on SixMo to select all three
levels at the same time, and then click on the right arrow to move all three levels to the window in one
step (see Figure 7-5).


Figure 7-5 Within-subjects variables appropriately entered


Clicking on Options allows you to specify the calculation of descriptive statistics, effect size, and
contrasts among the means. If you like, you can also click on Plots to include a line graph of the
algebra test mean scores for the three administrations. Figure 7-6 is a screen shot of the Profile Plots
dialog. You should click on Time, then Horizontal Axis, and then click on Add. Click Continue to
return to the Repeated Measures dialog.

Figure 7-6 Profile Plots dialog


Now click on Options and specify descriptive statistics, effect size, and contrasts (see Figure 7-7).
You must move Time to the Display Means window as well as specify a confidence level adjustment
for the main effects contrasts. A Bonferroni correction will adjust the alpha level in the post hoc
comparisons, while the default LSD (Fisher's least significant difference test) will not adjust the alpha
level. We will select the more conservative Bonferroni correction.

Figure 7-7 Specifying descriptive statistics, effect size, and mean contrasts
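
Clicking Paste at this point would generate GLM syntax along these lines (a sketch; your option choices may add or remove subcommands):

* Repeated-measures ANOVA with within-subjects factor Time (3 levels)
* and Bonferroni-adjusted contrasts among the time points.
GLM Before After SixMo
  /WSFACTOR=Time 3 Polynomial
  /PLOT=PROFILE(Time)
  /EMMEANS=TABLES(Time) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=Time.
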
Click on Continue, then OK to run the repeated-measures ANOVA. The SPSS output provides
several tests. When there are multiple dependent variables, the multivariate test is used to determine
whether there is an overall within-subjects effect for the combined dependent variables. As there is
only one within-subjects factor, we can ignore this test in the present case. Sphericity is the assumption
that the variances of the differences between the pairs of measures are equal. The nonsignificant test of
sphericity indicates that this assumption is not violated in the present case, and adjustments to the
degrees of freedom (and thus to the p level) are not required. The test of interest is the Test of Within-Subjects Effects. We can assume sphericity and report the F ratio as 8.149 with 2 and 18 degrees of
freedom and the p level as .003 (see Figure 7-8). Partial eta-squared has an interpretation similar to
that of eta-squared in the one-way ANOVA, and is directly interpretable as an effect-size index: about
48 percent of the within-subjects variation in algebra test performance can be explained by knowledge
of when the test was administered.


Figure 7-8 Test of within-subjects effects


Additional insight is provided by the Bonferroni-corrected pairwise comparisons, which indicate that
the means for Before and After are significantly different, while none of the other comparisons is
significant. The profile plot helps visualize these contrasts. See Figures 7-9 and
7-10. These results indicate an immediate but unsustained improvement in algebra test performance
for students taking a statistics course.

Figure 7-9 Bonferroni-corrected pairwise comparisons


Figure 7-10 Profile plot


Lesson 8: Two-Way ANOVA


Objectives
1. Conduct the two-way ANOVA.
2. Examine and interpret main effects and interaction effect.
3. Produce a plot of cell means.
Overview
We will introduce the two-way ANOVA with the simplest such design, a balanced or completely crossed factorial design. In this case there are two independent variables (factors), each of which has
two or more levels. We can think of this design as a table in which each cell represents a single
independent group. The group represents a combination of levels of the two factors. For simplicity, let
us refer to the factors as A and B and assume that each factor has two levels and each independent
group has the same number of observations. There will be four independent groups. The design can
thus be visualized as follows:

Figure 8-1 Conceptualization of Two-Way ANOVA


The two-way ANOVA is an economical design, because it allows the assessment of the main effects
of each factor as well as their potential interaction.
Example Data and Coding Considerations
Assume that you are studying the effects of observing violent acts on subsequent aggressive behavior.
You are interested in the kind of violence observed: a violent cartoon versus a video of real-action
violence. A second factor is the amount of time one is exposed to violence: ten minutes or 30 minutes.
You randomly assign 8 children to each group. After the child watches the violent cartoon or action
video, the child plays a Tetris-like computer video game for 30 minutes. The game provides options
for either aggressing ("trashing" the other computerized player) or simply playing for points without
interfering with the other player. The program provides 100 opportunities for the player to make an
aggressive choice and records the number of times the child chooses an aggressive action when the
game provides the choice. The hypothetical data are below:


Figure 8-2 Example Data


When coding and entering data for this two-way ANOVA, you should recognize that each of the 32
participants is a unique individual and that there are no repeated measures. Therefore, each participant
takes up a row in the data file, and the data should be coded and entered in such a way that the factors
are identified by two columns with group membership coded as a combination of the levels. For
illustrative purposes we will use 1 and 2 to represent the levels of the factors, though as you learned
earlier, you could just as easily have used 0s and 1s. The data view of the resulting SPSS data file
should appear something like this:


Figure 8-3 SPSS data file data view for two-way ANOVA (partial data)
For ease of interpretation, the variables can be labelled and the values of each specified in the variable
view (see Figure 8-4).

Figure 8-4 Variable view with labels and values identified


If you prefer, you may retrieve a copy of the data file.


Performing the Two-Way ANOVA


To perform the two-way ANOVA, select Analyze, General Linear Model, and then Univariate
because there is only one dependent variable (see Figure 8-5).

Figure 8-5 Select Analyze, General Linear Model, Univariate


In the resulting dialog, you should specify that Aggression is the dependent variable and that both
Time and Type are fixed factors (see Figure 8-6).


Figure 8-6 Specifying the two-way ANOVA


This procedure will test the main effects for Time and Type as well as their possible interaction. It is
helpful to specify profile plots to examine the interaction of the two variables. For that purpose, select
Plots and then move Type to the Horizontal Axis field and Time to the Separate Lines field (see
Figure 8-7).

Figure 8-7 Specifying profile plots


When you click on Add, the Type * Time interaction is added to the Plots window, as shown in
Figure 8-8.


Figure 8-8 Plotting an interaction term


Click Continue, then click Options. Check the boxes in front of Descriptive statistics and Estimates
of effect size (see Figure 8-9). Click Continue, then click OK to run the two-way ANOVA. The table
of interest is the Test of Between-Subjects Effects. Examination of the table reveals significant F
ratios for Time, Type, and the Time * Type interaction (see Figure 8-9).

Figure 8-9 Table of between-subjects effects
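For reference, a sketch of the syntax this point-and-click specification would produce:

* Two-way ANOVA with both main effects and the interaction, plus a profile plot.
UNIANOVA Aggression BY Time Type
  /PLOT=PROFILE(Type*Time)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=Time Type Time*Type.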


As in the repeated-measures ANOVA, a partial eta-squared is calculated as a measure of effect size.
The profile plot (see Figure 8-10) shows that the interaction is ordinal: the differences in the number
of aggressive choices made after observing the two violence conditions increase with the time of
exposure.


Figure 8-10 Interaction plot


Lesson 9: ANOVA for Mixed Factorial Designs


Objectives
1. Conduct a mixed-factorial ANOVA.
2. Test between-groups and within-subjects effects.
3. Construct a profile plot.
Overview
A mixed factorial design involves two or more independent variables, of which at least one is a
within-subjects (repeated measures) factor and at least one is a between-groups factor. In the simplest
case, there will be one between-groups factor and one within-subjects factor. The between-groups
factor would need to be coded in a single column as with the independent-samples t test or the one-way ANOVA, while the repeated-measures variable would comprise as many columns as there are
measures as in the paired-samples t test or the repeated-measures ANOVA.
Example Data
As an example, assume that you conducted an experiment in which you were interested in the extent
to which visual distraction affects younger and older people's learning and remembering. To do this,
you obtained a group of younger adults and a separate group of older adults and had them learn under
three conditions (eyes closed, eyes open looking at a blank field, eyes open looking at a distracting
field of pictures). This is a 2 (age) x 3 (distraction condition) mixed factorial design. The scores on the
data sheet below represent the number of words recalled out of ten under each distraction condition.
Age       Closed Eyes   Simple Distraction   Complex Distraction
Younger
Younger
Younger
Younger
Older
Older
Older
Older

Building the SPSS Data File


Note that there are eight separate participants, so the data file will require eight rows. There will be a
column for the participants' age, which is the between-groups variable, and three columns for the


repeated measures, which are the distraction conditions. As always it is helpful to include a column
for participant (or case) number.
The data appropriately entered in SPSS should look something like the following (see Figure 9-1).
You may optionally download a copy of the data file.

Figure 9-1 SPSS data structure for mixed factorial design


Performing the Mixed Factorial ANOVA
To conduct this analysis, you will use the repeated measures procedure. The initial steps are identical
to those in the within-subjects ANOVA. You must first specify repeated measures to identify the
within-subjects variable(s), and then specify the between-groups factor(s).
Select Analyze, then General Linear Model, then Repeated Measures (see Figure 9-2).


Figure 9-2 Preparing for the Mixed Factorial Analysis


Next, you must define the within-subjects factor(s). This process should be repeated for each factor on
which there are repeated measures. In our present case, there is only one within-subject variable, the
distraction condition. SPSS will give the within-subjects variables the names factor1, factor2, and so
on, but you can provide more descriptive names if you like. In the Repeated Measures dialog box,
type in the label distraction and the number of levels, 3. See Figure 9-3. If you like, you can give this
measure (the three distraction levels) a new name by clicking in the Measure Name field. If you
choose to name this factor, the name must be unique and may not conflict with any other variable
names. If you do not name the measure, the SPSS name for the measure will default to MEASURE_1.
In the present case we will leave the measure name blank and accept the default label.


Figure 9-3 Specifying the within-subjects factor.


We will now specify the within-subjects and between-groups variables. Click on Add and then Define
to specify which variable in the dataset is associated with each level of the within-subjects factor (see
Figure 9-4).

Figure 9-4 Defining the within-subjects variable

Move the Closed, Simple, and Complex variables to levels 1, 2, and 3, respectively, and then move
Age to the Between-Subjects Factor(s) window (see Figure 9-5). You can optionally specify one or
more covariates for analysis of covariance.

Figure 9-5 The complete design specification for the mixed factorial ANOVA
To display a plot of the cell means, click on Plots, move Age to the Horizontal Axis field, and move
distraction to Separate Lines. Next click on Add to specify the plot (see Figure 9-6) and then click
Continue.

Figure 9-6 Specifying plot


We will use the Options menu to display marginal and cell means, to compare main effects, to display
descriptive statistics, and to display measures of effect size. We will select the Bonferroni interval
adjustment to control the level of Type I error. See Figure 9-7.

Figure 9-7 Repeated measures options


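If you prefer working from syntax, the specification described above corresponds roughly to the
commands below, which you could also generate with the Paste button in the dialogs. This is a sketch:
the variable names age, closed, simple, and complex are assumptions based on the data file in Figure
9-1, so substitute the names used in your own file.

* Mixed factorial ANOVA: distraction (within subjects) by age (between groups).
* Variable names are assumed from Figure 9-1.
GLM closed simple complex BY age
  /WSFACTOR=distraction 3 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(age*distraction)
  /EMMEANS=TABLES(age) COMPARE ADJ(BONFERRONI)
  /EMMEANS=TABLES(distraction) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=distraction
  /DESIGN=age.
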
Select Continue to close the options dialog and then OK to run the ANOVA. The resulting SPSS
output is rather daunting, but you should focus on the between- and within-subjects tests. The test of
sphericity is not significant, indicating that this assumption has not been violated. Therefore you
should use the F ratio and degrees of freedom associated with the sphericity assumption (see Figure 9-8).
Specifically, you will want to determine whether there is a main effect for age, an effect for
distraction condition, and a possible interaction of the two. The tables of interest from the SPSS
Viewer are shown in Figures 9-8 and 9-9.

Figure 9-8 Partial SPSS output


The test of within-subjects effects indicates that there is a significant effect of the distraction condition
on word memorization. The lack of an interaction between distraction and age indicates that this
effect is consistent for both younger and older participants. The test of between-subjects effects (see
Figure 9-9) indicates there is a significant effect of the age condition on word memory.

Figure 9-9 Test of between-subjects effects


The remainder of the output assists in the interpretation of the main effects of the within-subjects
(distraction condition) and between-subjects (age condition) factors. Of particular interest is the
profile plot, which clearly displays the main effects and the absence of an interaction (see Figure 9-10). As discussed above, SPSS labels the within-subjects variable MEASURE_1 in the plot.

Figure 9-10 Profile plot

Lesson 10: Correlation and Scatterplots


Objectives
1. Calculate correlation coefficients.
2. Test the significance of correlation coefficients.
3. Construct a scatterplot.
4. Edit features of the scatterplot.

Overview
In correlational research, there is no experimental manipulation. Rather, we measure variables in their
natural state. Instead of independent and dependent variables, it is useful to think of predictors and
criteria. In bivariate (two-variable) correlation, we are assessing the degree of linear relationship
between a predictor, X, and a criterion, Y. In multiple regression, we are assessing the degree of
relationship between a linear combination of two or more predictors, X1, X2, ...Xk, and a criterion, Y.
We will address correlation in the bivariate case in Lesson 10, linear regression in the bivariate case in
Lesson 11, and multiple regression and correlation in Lesson 12.
The Pearson product moment correlation coefficient summarizes and quantifies the relationship
between two variables in a single number. This number can range from -1 representing a perfect
negative or inverse relationship to 0 representing no relationship or complete independence to +1
representing a perfect positive or direct relationship. When we calculate a correlation coefficient from
sample data, we will need to determine whether the obtained correlation is significantly different from
zero. We will also want to produce a scatterplot or scatter diagram to examine the nature of the
relationship. Sometimes the correlation is low not because of a lack of relationship, but because of a
lack of linear relationship. In such cases, examining the scatterplot will assist in determining whether
a relationship may be nonlinear.
Example Data
Suppose that you have collected questionnaire responses to five questions concerning dormitory
conditions from 10 college freshmen. (Normally you would like to have a larger sample, but the small
sample in this case is useful for illustration.) The questionnaire assesses the students' level of
satisfaction with noise, furniture, study area, safety, and privacy. Assume that you have also assessed
the students' family income level, and you would like to test the hypothesis that satisfaction with the
college living environment is related to wealth (family income).
The questionnaire contains five questions about satisfaction with various aspects of the dormitory:
"noise," "furniture," "study area," "safety," and "privacy." These are answered on a 5-point Likert-type
scale (very dissatisfied to very satisfied), coded as 1 to 5. The data sheet for this study
is shown below.

[Data sheet: columns for Student, Income, Noise, Furniture, Study Area, Safety, and Privacy, with one
row for each of the 10 students. The complete values appear in the SPSS data file shown in Figure 10-1.]

Entering the Data in SPSS


The data correctly entered in SPSS would look like the following (see Figure 10-1). Remember not
only to enter the data, but to add appropriate labels in the Variable View to improve the readability of
the output. If you prefer, you can download a copy of the data file.

Figure 10-1 Data entered in SPSS


Calculating and Testing Correlation Coefficients
To calculate and test the significance of correlation coefficients, select Analyze, Correlate, Bivariate
(see Figure 10-2).

Figure 10-2 The bivariate correlation procedure


Move the desired variables to the Variables window, as shown in Figure 10-3.

Figure 10-3 Move desired variables to the Variables window

Under the Options menu, let us select means and standard deviations and then click Continue. The
output contains a table of descriptive statistics (see Figure 10-4) and a table of correlations and related
significance tests (see Figure 10-5).

Figure 10-4 Descriptive statistics

Figure 10-5 Correlation matrix


Note that SPSS flags significant correlations with asterisks. The correlation matrix is symmetrical, so
the above-diagonal entries are the same as the below-diagonal entries. In our survey results we note
strong negative correlations between family income and the various survey items and strong positive
correlations among the various items.
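If you prefer syntax, the analysis above corresponds approximately to the following commands, which
you could also paste from the dialog. The variable names are assumptions based on Figure 10-1, so
adjust them to match your own file.

* Pearson correlations with two-tailed significance tests,
* plus means and standard deviations.
CORRELATIONS
  /VARIABLES=income noise furniture study safety privacy
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=PAIRWISE.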

Constructing a Scatterplot
For purposes of illustration, let us produce a scatterplot of the relationship between satisfaction with
noise level in the dormitory and family income. We see from the correlation matrix that this is a
significant negative correlation. As family income increases, satisfaction with the dormitory noise
level decreases. To build the scatterplot, select Graphs, Interactive, Scatterplot (see Figure 10-6).
Please note that there are several different ways to construct the scatterplot in SPSS, and that we are
illustrating only one here.
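For example, the legacy GRAPH command below produces a comparable (non-interactive) scatterplot
from syntax; the variable names income and noise are assumptions based on Figure 10-1.

* Scatterplot of noise satisfaction against family income.
GRAPH
  /SCATTERPLOT(BIVAR)=income WITH noise
  /MISSING=LISTWISE.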

Figure 10-6 Constructing a scatterplot


In the resulting dialog, enter Family Income on the X-axis and Noise on the Y-axis (see Figure 10-7).

Figure 10-7 Specifying variables for the scatterplot


The resulting scatterplot (see Figure 10-8) shows the relationship between family income and
satisfaction with dormitory noise.

Figure 10-8 Scatterplot


You can edit a chart object by double-clicking on it in the SPSS Viewer. In addition to many other
options, you can change the labeling and scaling of axes, add trend lines and other elements to the
scatterplot, and change the marker types. The edited chart appears in Figure 10-9.
If you like, you can save this particular combination as a chart template to use it again in the future.

Figure 10-9 Edited scatterplot

Lesson 11: Linear Regression


Objectives
1. Determine the regression equation.
2. Compute predicted Y values.
3. Compute and interpret residuals.
Overview
Closely related to correlation is the topic of linear regression. As you learned in Lesson 10, the
correlation coefficient is an index of linear relationship. If the correlation coefficient is significant,
that is an indication that a linear equation can be used to model the relationship between the predictor
X and the criterion Y. In this lesson you will learn how to determine the equation of the line of best fit
between the predictor and the criterion, how to compute predicted values based on that linear
equation, and how to calculate and interpret residuals.
Example Problem and Data
This spring term you are in a large introductory psychology class. You observe an apparent
relationship between the outside temperature and the number of people who skip class on a given day.
More people seem to be absent when the weather is warmer, and more seem to be present when it is
cooler outside. You randomly select 10 class periods and record the outside temperature reading 10
minutes before class time and then count the number of students in attendance that day. If you
determine that there is a significant linear relationship, you would like to impress your professor by
predicting how many people will be present on a given day, based on the outside temperature. The
data you collect are the following:
Temp    Attendance
50      87
77      60
67      73
53      86
75      59
70      65
83      65
85      62
80      58
64      89

Entering the Data in SPSS


These pairs of data must be entered as separate variables. The data file may look something like the
following (see Figure 11-1):

Figure 11-1 Data in SPSS


If you prefer, you can download a copy of the data. As you learned in Lesson 10, you should first
determine whether there is a significant correlation between temperature and attendance. Running the
Correlation procedure (see Lesson 10 for details), you find that the correlation is -.87, and is
significant at the .01 level (see Figure 11-2).

Figure 11-2 Significant correlation


A scatterplot is helpful in visualizing the relationship (see Figure 11-3). Clearly, there is a negative
relationship between attendance and temperature.

Figure 11-3 Scatterplot


Linear Regression

The correlation and scatterplot indicate a strong, though by no means perfect, relationship between the
two variables. Let us now turn our attention to regression. We will "regress" the attendance (Y) on the
temperature (X). In linear regression, we are seeking the equation of a straight line that best fits the
observations. The usefulness of such a line may not be immediately apparent, but if we can model the
relationship by a straight line, we can use that line to predict a value of Y for any value of X, even
those that have not yet been observed. For example, looking at the scatterplot in Figure 11-3, what
attendance would you predict for a temperature of 60 degrees? The regression line can answer that
question. This line will have an intercept term and a slope coefficient and will be of the general form

Ŷ = a + bX
The intercept and slope (regression) coefficient are derived in such a way that the sums of the squared
deviations of the actual data points from the line are minimized. This is called "ordinary least squares"
estimation or OLS. Note that the predicted value of Y (read "Y-hat") is a linear combination of two
constants, the intercept term and the slope term, and the value of X, so that the only thing that varies is
the value of X. Therefore, the correlation between the predicted Ys and the observed Ys will be the
same as the correlation between the observed Ys and the observed Xs. If we subtract the predicted
value of Y from the observed value of Y, the difference is called a "residual." A residual represents the
part of the Y variable that cannot be explained by the X variable. Visually, the distance between the
observed data points and the line of best fit represents the residual.
SPSS's Regression procedure allows us to determine the equation of the line of best fit, to calculate
predicted values of Y, and to calculate and interpret residuals. Optionally, you can save the predicted
values of Y and the residuals as either standard scores or raw-score equivalents.
Running the Regression Procedure
Open the data file in SPSS. Select Analyze, Regression, and then Linear (see Figure 11-4).

Figure 11-4 Performing the Regression procedure


The Regression procedure outputs a value called "Multiple R," which will always range from 0 to 1.
In the bivariate case, Multiple R is the absolute value of the Pearson r, and is thus .87. The square of r
or of Multiple R is .752, and represents the amount of shared variance between Y and X. When we run
the regression tool, we can optionally ask for either standardized or unstandardized (raw-score)
predicted values of Y and residuals to be calculated and saved as new variables (see Figure 11-5).

Figure 11-5 Save options in the Regression procedure
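Equivalently, the whole analysis can be run from syntax. A sketch, assuming the variables are named
temp and attend as in Figure 11-1:

* Regress attendance on temperature and save unstandardized
* predicted values and residuals as new variables.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT attend
  /METHOD=ENTER temp
  /SAVE PRED RESID.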


Click OK to run the Regression procedure. The output is shown in Figure 11-6. In the ANOVA table
summarizing the regression, the omnibus F test tests the hypothesis that the population Multiple R is
zero. We can safely reject that null hypothesis. Notice that dividing the regression sum of squares,
which is based on the predicted values of Y, by the total sum of squares, which is based on the
observed values of Y, produces the same value as R Square. The value of R Square thus represents the
proportion of variance in the criterion that can be explained by the predictor. The residual sum of
squares represents the variance in the criterion that remains unexplained.

Figure 11-6 Regression procedure output


In Figure 11-7 you can see that the residuals and predicted values are now saved as new variables in
the SPSS data file.

Figure 11-7 Saving predicted values and residuals


The regression equation for predicting attendance from the outside temperature is Ŷ = 133.556 - .897
x Temp. So for a temperature of 60 degrees, you would predict attendance of approximately 80
students (see
Figure 11-8 in which this is illustrated graphically). Note that this process of using a linear equation to
predict attendance from the temperature has some obvious practical limits. You would never predict
attendance higher than 100 percent, for example, and there may be a point at which the temperature
becomes so hot as to be unbearable, and the attendance could begin to rise simply because the
classroom is air-conditioned.
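You can also apply the fitted equation yourself with a COMPUTE statement. A minimal sketch,
assuming the temperature variable is named temp:

* Hand-apply the fitted regression equation.
COMPUTE predicted = 133.556 - 0.897 * temp.
EXECUTE.

For temp = 60 this yields 133.556 - 53.82 = 79.74, or about 80 students, matching the prediction read
from the regression line.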

Figure 11-8 Linear trend line and regression equation


To impress your professor, assume that the outside temperature on a class day is 72 degrees.
Substituting 72 for X in the regression equation, you predict that there will be 69 students in
attendance that day.
Examining Residuals
A residual is the difference between the observed and predicted values for the criterion variable (Hair,
Black, Babin, Anderson, & Tatham, 2006). Bivariate linear regression and multiple linear regression
make four key assumptions about these residuals.
1. The phenomenon (i.e., the regression model being considered) is linear, so that the
relationship between X and Y is linear.
2. The residuals have equal variances at all levels of the predicted values of Y.
3. The residuals are independent. This is another way of saying that the successive observations
of the dependent variable are uncorrelated.
4. The residuals are normally distributed with a mean of zero.
Thus it can be very instructive to examine the residuals when you perform a regression analysis. It is
helpful to examine a histogram of the standardized residuals (see Figure 11-9), which can be created
from the Plots menu. The normal curve can be superimposed for visual reference.
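In syntax, these diagnostic plots can be requested on the regression command itself. A sketch, again
assuming the variable names temp and attend:

* Histogram and normal probability plot of the standardized residuals.
REGRESSION
  /DEPENDENT attend
  /METHOD=ENTER temp
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).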

Figure 11-9 Histogram of standardized residuals


These residuals appear to be approximately normally distributed. Another useful plot is the normal p-p plot produced as an option in the Plots menu. This plot compares the cumulative probabilities of the
residuals to the expected frequencies if the residuals were normally distributed. Significant departures
from a straight line would indicate nonnormality in the data (see Figure 11-10). In this case the
residuals appear once again to be fairly normally distributed.

Figure 11-10 Normal p-p plot of observed and expected cumulative probabilities of residuals
When there are significant departures from normality, homoscedasticity, and linearity, data
transformations or the introduction of polynomial terms such as quadratic or cubic values of the
original independent or dependent variables can often be of help (Edwards, 1976).
References
Edwards, A. L. (1976). An introduction to linear regression and correlation. San Francisco: Freeman.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data
analysis (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Lesson 12: Multiple Correlation and Regression


Objectives
1. Perform and interpret a multiple regression analysis.
2. Test the significance of the regression and the regression coefficients.
3. Examine residuals for diagnostic purposes.
Overview
Multiple regression involves one continuous criterion (dependent) variable and two or more predictors
(independent variables). The equation for a line of best fit is derived in such a way as to minimize the
sums of the squared deviations from the line. Although there are multiple predictors, there is only one
predicted Y value, and the correlation between the observed and predicted Y values is called Multiple
R. The value of Multiple R will range from zero to one. In the case of bivariate correlation, a
regression analysis will yield a value of Multiple R that is the absolute value of the Pearson product
moment correlation coefficient between X and Y, as discussed in Lesson 11. The multiple linear
regression equation will take the following general form:

Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
Instead of using a to represent the Y intercept, it is common practice in multiple regression to call the
intercept term b0. The significance of Multiple R, and thus of the entire regression, must be tested. As
well, the significance of the individual regression coefficients must be examined to verify that a
particular independent variable is adding significantly to the prediction.
As in simple linear regression, residual plots are helpful in diagnosing the degree to which the
linearity, normality, and homoscedasticity assumptions have been met. Various data transformations
can be attempted to accommodate situations of curvilinearity, non-normality, and heteroscedasticity.
In multiple regression we must also consider the potential impact of multicollinearity, which is the
degree of linear relationship among the predictors. When there is a high degree of collinearity in the
predictors, the regression equation will tend to be distorted, and may lead to inappropriate conclusions
regarding which predictors are statistically significant (Lind, Marchal, and Wathen, 2006). For this
reason, we will ask for collinearity diagnostics when we run our regression. As a rule of thumb, if the
variance inflation factor (VIF) for a given predictor is very high (values of 10 or more are a common
cause for concern) or if the absolute value of the correlation between two predictors is greater than
.70, one or more of the predictors should be dropped from the analysis, and the regression equation
should be recomputed.
Multiple regression is in actuality a general family of techniques, and the mathematical and statistical
underpinnings of multiple regression make it an extremely powerful and flexible tool. By using group
membership or treatment level qualitative coding variables as predictors, one can easily use multiple
regression in place of t tests and analyses of variance. In this tutorial we will concentrate on the
simplest kind of multiple regression, a forced or simultaneous regression in which all predictor
variables are entered into the regression equation at one time. Other approaches include stepwise
regression in which variables are entered according to their predictive ability and hierarchical
regression in which variables are entered according to theory or hypothesis. We will examine
hierarchical regression more closely in Lesson 14 on analysis of covariance.
Example Data

The following data (see Figure 12-1) represent statistics course grades, GRE Quantitative scores, and
cumulative GPAs for 32 graduate students at a large public university in the southern U.S. (source:
data collected by the webmaster). You may download a copy of the entire dataset.

Figure 12-1 Statistics course grades, GREQ, and GPA (partial data)
Preparing for the Regression Analysis
We will determine whether quantitative ability (GREQ) and cumulative GPA can be used to predict
performance in the statistics course. A very useful first step is to calculate the zero-order correlations
among the predictors and the criterion. We will use the Correlate procedure for that purpose. Select
Analyze, Correlate, Bivariate (see Figure 12-2).

Figure 12-2 Calculate intercorrelations as preparation for regression analysis


In the Options menu of the resulting dialog box, you can request descriptive statistics if you like. The
resulting intercorrelation matrix reveals that GREQ and GPA are both significantly related to the
course grade, but are not significantly related to each other. Thus our initial impression is that
collinearity will not be a problem (see Figure 12-3).

Figure 12-3 Descriptive statistics and intercorrelations


Conducting the Regression Analysis
To conduct the regression analysis, select Analyze, Regression, Linear (see Figure 12-4).

Figure 12-4 Selecting the Linear Regression procedure


In the Linear Regression dialog box, move Grade to the Dependent variable field and GPA and GREQ
to the Independent(s) list, as shown in Figure 12-5.

Figure 12-5 Linear Regression dialog box


Click on the Statistics button and check the box in front of collinearity diagnostics (see Figure 12-6).

Figure 12-6 Requesting collinearity diagnostics


Select Continue and then click on Plots to request standardized residual plots and also to request
scatter diagrams. You should request a histogram and normal distribution plot of the standardized
residuals. You can also plot the standardized residuals against the standardized predicted values to
check the assumption of homoscedasticity (see Figure 12-7).
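The complete specification corresponds approximately to the syntax below; the variable names grade,
gpa, and greq are assumptions based on Figure 12-1.

* Simultaneous regression with collinearity diagnostics,
* residual plots, and a residual-versus-predicted scatterplot.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT grade
  /METHOD=ENTER gpa greq
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).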

Click OK to run the regression analysis. The results are excerpted in Figure 12-8.

Figure 12-8 Regression procedure output (excerpt)

Interpreting the Regression Output


The significant overall regression indicates that a linear combination of GREQ and GPA predicts
grades in the statistics course. The value of R-Square is .513, and indicates that about 51 percent of
the variation in grades is accounted for by knowledge of GPA and GREQ. The significant t values for
the regression coefficients for GREQ and GPA show that each variable contributes significantly to the
prediction. Examining the unstandardized regression coefficients is not very instructive, because these
are based on raw scores and their values are influenced by the units of measurement of the predictors.
Thus, the raw-score regression coefficient for GREQ is much smaller than that for GPA because the
two variables use different scales. On the other hand, the standardized coefficients are quite
interpretable, because each shows the relative contribution to the prediction of the given variable with
the other variable held constant. These are technically standardized partial regression coefficients.
In the present case, we can conclude that GREQ has more predictive value than GPA, though both are
significant.
The collinearity diagnostics indicate a low degree of overlap between the predictors (as we predicted).
If the two predictor variables were orthogonal (uncorrelated), the variance inflation factor (VIF) for
each would be 1. Thus we conclude that there is not a problem with collinearity in this case.
The histogram of the standardized residuals shows that the departure from normality is not too severe
(see Figure 12-9).

Figure 12-9 Histogram of standardized residuals

The normal p-p plot indicates some departure from normality and may suggest a curvilinear
relationship between the predictors and the criterion (see Figure 12-10).

Figure 12-10 Normal p-p plot


The plot of standardized predicted values against the standardized residuals indicates a large degree of
heteroscedasticity (see Figure 12-11). This is mostly the result of a single outlier, case 11 (Participant
118), whose GREQ and grade scores are significantly lower than those of the remainder of the group.
Eliminating that case and rerunning the analysis would likely reduce the apparent heteroscedasticity.

Lesson 13: Chi-Square Tests


Objectives
1. Perform and interpret a chi-square test of goodness of fit.
2. Perform and interpret a chi-square test of independence.
Overview
Chi-square tests are used to compare observed frequencies to the frequencies expected under some
hypothesis. Tests for one categorical variable are generally called goodness-of-fit tests. In this case,
there is a one-way table of observed frequencies of the levels of some categorical variable. The null
hypothesis might state that the expected frequencies are equally distributed or that they are unequal on
the basis of some theoretical or postulated distribution.
Tests for two categorical variables are usually called tests of independence or association. In this case,
there will be a two-way contingency table with one categorical variable occupying rows of the table
and the other categorical variable occupying columns of the table. In this analysis, the expected
frequencies are commonly derived on the basis of the assumption of independence. That is, if there
were no association between the row and column variables, then a cell entry would be expected to be
the product of the cell's row and column marginal totals divided by the overall sample size.
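For example, with 60 students in all, a cell whose row total is 30 and whose column total is 12 would
have an expected frequency of 30 × 12 / 60 = 6 under independence.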
In both tests, the chi-square test statistic is calculated as the sum of the squared differences between
the observed and expected frequencies divided by the expected frequencies, according to the
following simple formula:

χ² = Σ [(O − E)² / E]
where O represents the observed frequency in a given cell of the table and E represents the
corresponding expected frequency under the null hypothesis.
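For example, a cell with an observed frequency of 28 and an expected frequency of 20 contributes
(28 − 20)² / 20 = 3.2 to the chi-square statistic.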
We will illustrate both the goodness-of-fit test and the test of independence using the same dataset.
You will find the goodness-of-fit test for equal or unequal expected frequencies as an option under
Nonparametric Tests in the Analyze menu. For the chi-square test of independence, you will use the
Crosstabs procedure under the Descriptive Statistics menu in SPSS. The cross-tabulation procedure
can make use of numeric or text entries, while the Nonparametric Test procedure requires numeric
entries. For that reason, you will need to recode any text entries into numerical values for goodness-of-fit tests.
Example Data
Assume that you are interested in the effects of peer mentoring on student academic success in a
competitive private liberal arts college. A group of 30 students is randomly selected during their
freshman orientation. These students are assigned to a team of seniors who have been trained as tutors
in various academic subjects, listening skills, and team-building skills. The 30 selected students meet
in small group sessions with their peer tutors once each week during their entire freshman year, are
encouraged to work with their small group for study sessions, and are encouraged to schedule private
sessions with their peer mentors whenever they desire. You identify an additional 30 students at
orientation as a control group. The control group members receive no formal peer mentoring. You
determine that there are no significant differences between the high school grades and SAT scores of
the two groups. At the end of four years, you compare the two groups on academic retention and
academic performance. You code mentoring as 1 = present and 0 = absent to identify the two groups.
Because GPAs differ by academic major, you generate a binary code for grades. If the student's
cumulative GPA is at the median or higher for his or her academic major, you assign a 1. Students
whose grades are below the median for their major receive a zero. If the student is no longer enrolled
(i.e., has transferred, dropped out, or flunked out), you code a zero for retention. If he or she is still
enrolled, but has not yet graduated after four years, you code a 1. If he or she has graduated, you code
a 2.
You collect the following (hypothetical) data:

Properly entered in SPSS, the data should look like the following (see Figure 13-1). For your
convenience, you may also download a copy of the dataset.

Figure 13-1 Dataset in SPSS (partial data)


Conducting a Goodness-of-Fit Test
To determine whether the three retention outcomes are equally distributed, you can perform a
goodness-of-fit test. Because there are three possible outcomes (no longer enrolled, currently enrolled,
and graduated) and sixty total students, you would expect each outcome to be observed in 1/3 of the
cases if there were no differences in the frequencies of these outcomes. Thus the null hypothesis
would be that 20 students would not be enrolled, 20 would be currently enrolled, and 20 would have
graduated after four years. To test this hypothesis, you must use the Nonparametric Tests procedure.
To conduct the test, select Analyze, Nonparametric Tests, Chi-Square as shown in Figure 13-2.

Figure 13-2 Selecting chi-square test for goodness of fit


In the resulting dialog box, move Retention to the Test Variable List and accept the default for equal
expected frequencies. SPSS counts and tabulates the observed frequencies and performs the chi-square
test (see Figure 13-3). The degrees of freedom for the goodness-of-fit test are the number of categories
minus one. The significant chi-square shows that the frequencies are not equally distributed,
χ²(2, N = 60) = 6.10, p = .047.
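The equivalent syntax is brief; retention here is an assumed variable name based on the dataset shown
in Figure 13-1, so adjust it if yours differs.

* Chi-square goodness-of-fit test with equal expected frequencies.
NPAR TESTS
  /CHISQUARE=retention
  /EXPECTED=EQUAL
  /MISSING ANALYSIS.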

Figure 13-3 Chi-square test of goodness of fit


Conducting a Chi-Square Test of Independence
If mentoring is not related to retention, you would expect mentored and non-mentored students to
have the same outcomes, so that any observed differences in frequencies would be due to chance.
That would mean that you would expect half of the students in each outcome group to come from the
mentored students, and the other half to come from the non-mentored students. To test the hypothesis
that there is an association (or non-independence) between mentoring and retention, you will conduct
a chi-square test as part of the cross-tabulation procedure. To conduct the test, select Analyze,
Descriptive Statistics, Crosstabs (see Figure 13-4).

Figure 13-4 Preparing for the chi-square test of independence


In the Crosstabs dialog, move one variable to the row field and the other variable to the column field.
I typically place the variable with more levels in the row field to keep the output tables narrower (see
Figure 13-5), though the results of the test would be identical if you were to reverse the row and
column variables.

Figure 13-5 Establishing row and column variables


Clustered bar charts are an excellent way to compare the frequencies visually, so we will select that
option (see Figure 13-5). Under the Statistics option, select chi-square and Phi and Cramer's V
(measures of effect size for chi-square tests). You can also click on the Cells button to display both
observed and expected cell frequencies. The format menu allows you to specify whether the rows are
arranged in ascending (the default) or descending order. Click OK to run the Crosstabs procedure and
conduct the chi-square test.
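These choices correspond approximately to the following syntax, assuming the variables are named
retention and mentoring:

* Crosstabulation with chi-square, phi and Cramer's V,
* observed and expected counts, and a clustered bar chart.
CROSSTABS
  /TABLES=retention BY mentoring
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED
  /BARCHART.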

Figure 13-6 Partial output from Crosstabs procedure


For the test of independence, the degrees of freedom are the number of rows minus one multiplied by
the number of columns minus one, or in this case 2 x 1 = 2. The Pearson Chi-Square is significant,
indicating that mentoring had an effect on retention, χ²(2, N = 60) = 14.58, p < .001. The value of
Cramer's V is .493, indicating a large effect size (Gravetter & Wallnau, 2005).
The clustered bar chart provides an excellent visual representation of the chi-square test results (see
Figure 13-7).

Figure 13-7 Clustered bar chart


Going Further
For additional practice, you can use the Nonparametric Tests and Crosstabs procedures to determine
whether grades differ between mentored and non-mentored students and whether there is an
association between grades and retention outcomes.
References
Gravetter, F. J., & Wallnau, L. B. (2005). Essentials of statistics for the behavioral sciences (5th ed.).
Belmont, CA: Thomson/Wadsworth.

Lesson 14: Analysis of Covariance


Objectives
1. Perform and interpret an analysis of covariance using the General Linear Model.
2. Perform and interpret an analysis of covariance using hierarchical regression.
Analysis of covariance (ANCOVA) is a blending of regression and analysis of variance (Roscoe,
1975). It is possible to perform ANCOVA using the General Linear Model procedure in SPSS. An
entirely equivalent analysis is also possible using hierarchical regression, so the choice is left to the
user and his or her preferences. We will illustrate both procedures in this tutorial. We will use the
simplest of cases, a single covariate, two treatments, and a single variate (dependent variable).
ANCOVA is statistically equivalent to matching experimental groups with respect to the variable or
variables being controlled (or covaried). As you recall from correlation and regression, if two
variables are correlated, one can be used to predict the other. If there is a covariate (X) that correlates
with the dependent variable (Y), and the groups differ on that covariate, then differences observed
between the groups on the dependent variable cannot be attributed unambiguously to the experimental
treatment(s). ANCOVA provides a mechanism for assessing the differences in
dependent variable scores after statistically controlling for the covariate. There are two obvious
advantages to this approach: (1) any variable that influences the variation in the dependent variable
can be statistically controlled, and (2) this control can reduce the amount of error variance in the
analysis.
Example Data
Assume that you are comparing performance in a statistics class taught by two different methods.
Students in one class are instructed in the classroom, while students in the second class take their class
online. Both classes are taught by the same instructor, and use the same textbook, exams, and
assignments. At the beginning of the term all students take a test of quantitative ability (pretest), and
at the end, their score on the final exam is recorded (posttest). Because the two classes are intact, it is
not possible to achieve experimental control, so this is a quasi-experimental design. Assume that you
would like to compare the scores for the two groups on the final score while controlling for initial
quantitative ability. The hypothetical data are as follows:

Before the ANCOVA


You may retrieve the SPSS dataset if you like. As a precursor to the ANCOVA, let us perform a
between-groups t test to examine overall differences between the two groups on the final exam. You
will recall this test from Lesson 4, so the details are not repeated here. The result of the t test is shown
in Figure 14-1. Of course, if there were multiple groups you would perform an ANOVA rather than a
t test. In this case, we conclude that the second method led to improved test scores, but we must rule
out the possibility that this difference is attributable to
differences in quantitative ability of the two groups. As you know by now, you could just as easily
have compared the means using the Compare Means or One-way ANOVA procedures, and the square
root of the F-ratio obtained would be the value of t.

Figure 14-1 t Test Results

As a second precursor to the ANCOVA, let us determine the degree of correlation between
quantitative ability and exam scores. As correlation is the subject of Lesson 10, the details are omitted
here, and only the results are shown in Figure 14-2.

Figure 14-2 Correlation between pre-test and post-test scores


Knowing that there is a statistically significant correlation between pretest and posttest scores, we
would like to exercise statistical control by holding the effects of the pretest scores constant. The
resulting ANCOVA will verify whether there are any differences in the posttest scores of the two
groups after controlling for differences in ability.
Performing the ANCOVA in GLM
To perform the ANCOVA via the General Linear Model menu, select Analyze, General Linear
Model, Univariate (see Figure 14-3).

Figure 14-3 ANCOVA via the GLM procedure


In the resulting dialog box, move Posttest to the Dependent Variable field, Method to the Fixed
Factor(s) field, and Pretest to the Covariate(s) field. See Figure 14-4.

Figure 14-4 Univariate dialog box


Under Options you may want to choose descriptive statistics and effect size indexes, as well as plots
of estimated marginal means for Method. As there are just two groups, main effect comparisons are
not appropriate. Examine Figure 14-5.

Figure 14-5 Univariate options for ANCOVA

Click Continue. If you like, you can click on Plots to add profile plots for the estimated marginal
means of the posttest scores of the two groups after adjusting for pretest scores. Click on OK to run
the analysis. The results are shown in Figure 14-6. They indicate that after controlling for initial
quantitative ability, the posttest scores of the two groups differ significantly, F(1, 27) = 16.64,
p < .001, partial eta-squared = .381.
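For reference, the pasted syntax for this analysis looks approximately like the following, assuming the
variables are named posttest, method, and pretest:

* ANCOVA: posttest by method, controlling for pretest.
UNIANOVA posttest BY method WITH pretest
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(method) WITH(pretest=MEAN)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=pretest method.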

Figure 14-6 ANCOVA results


The profile plot makes it clear that the online class had higher exam scores after controlling for initial
quantitative ability (see Figure 14-7).

Figure 14-7 Profile plot


Performing an ANCOVA Using Hierarchical Regression
To perform the same ANCOVA using hierarchical regression, enter the posttest as the criterion. Then
enter the covariate (pretest) as one independent variable block and group membership (method) as a
second block. Examine the change in R-Square as the two models are compared, and the significance
of the change. The F value produced by this analysis is identical to that produced via the GLM
approach.
Select Analyze, Regression, Linear (see Figure 14-8).

Figure 14-8 ANCOVA via hierarchical regression


Now enter Posttest as the Dependent Variable and Pretest as an Independent variable (see Figure 14-9).

Figure 14-9 Linear regression dialog box


Click on the Next button and enter Method as an Independent variable, as shown in Figure 14-10.

Figure 14-10 Entering second block

Click on Statistics, and check the box in front of R squared change (see Figure 14-11).

Figure 14-11 Specify R squared change


Click Continue then OK to run the hierarchical regression. Note in the partial output shown in Figure
14-12 that the value of F for the R Square Change with pretest held constant is identical to that
calculated earlier.
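The pasted syntax for this two-block analysis is approximately the following, again assuming the
variable names posttest, pretest, and method:

* Hierarchical regression: covariate entered first, then group
* membership; CHANGE reports the R Square change test.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT posttest
  /METHOD=ENTER pretest
  /METHOD=ENTER method.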

Figure 14-12 Hierarchical regression yields results identical to GLM


References
Roscoe, J. T. (1975). Fundamental research statistics for the behavioral sciences (2nd ed.). New
York: Holt, Rinehart and Winston.
