
SPSS TUTORIALS

Overview
Proficiency with statistical software packages is indispensable today for serious research in the social
sciences. SPSS is one of the most widely used and powerful statistical software packages. It covers a
broad range of statistical procedures that allow one to summarize data (e.g., compute means and
standard deviations), determine whether there are significant differences between groups (e.g., t tests,
analysis of variance), examine relationships among variables (e.g., correlation, multiple regression),
and graph results (e.g., bar charts, line graphs). These tutorials show screenshots of SPSS 15, the
newest version at the time the tutorials were written. If you are using a different version of SPSS, your
screens may not look exactly like those presented in the tutorials, but the basic functionality should be
the same or very similar. These pages are based on a series of SPSS tutorials originally written by Dr.
Gil Einstein and Dr. Ken Abernethy of Furman University.
The lessons presented here give you an introduction to SPSS. They are not designed to teach you
statistics, but are intended for individuals who already have some background in statistics and want to
learn SPSS, or to be used as a supplement to a statistics course and text. When you are comfortable
with SPSS, we encourage you to explore the SPSS menus and options, because the package is very
powerful and there are usually multiple ways to accomplish your statistical goals. Unless you are
already familiar with SPSS, you should start with Lesson 1, which presents a brief overview of the
different types of windows and files available with SPSS. Lesson 2 describes how to enter and label
your data, transform data, select cases, and sort cases. Lesson 3 shows you how to generate various
descriptive statistics and some simple graphical representations of your data.
Once you understand how to enter and manipulate data and to generate and report statistical results,
you can then go on to any of the other lessons. Each of these lessons includes a research problem with
a hypothetical set of data, and step-by-step directions for how to perform the specified analyses. An
additional example for further practice is also included for many of the lessons. Lessons 4 - 9 describe
specific statistical procedures used to compare the means of two or more groups (t-tests and analysis
of variance). Lesson 10 covers correlation and Lesson 11 covers linear regression for the bivariate
case (one independent variable and one dependent variable). Lesson 12 covers multiple regression
(one dependent variable and two or more independent variables). In Lesson 13 you will learn how to
conduct and interpret chi-square analyses for categorical data arranged in one-way tables (goodness-of-fit tests) and two-way tables (tests of independence). Lesson 14 introduces analysis of covariance
(ANCOVA), a technique combining regression and analysis of variance.
An Important Point to Remember
Please note that these tutorials cover only a few of the most basic statistical procedures available with
SPSS. After you have worked through these tutorials, you will have familiarity with SPSS. With this
familiarity and an understanding of the statistical test that you wish to use, we are confident that you
will be able to figure out other procedures on your own. You may want to bookmark this site, as new
material is being added on a regular basis. Your feedback is also very welcome and appreciated. You
may provide feedback, make suggestions for additional tutorials, or report errors by clicking on the
Provide Feedback link in the navigation menu, or by e-mailing the site's webmaster.


Lesson 1: SPSS Windows and Files


Objectives
1. Launch SPSS for Windows.
2. Examine SPSS windows and file types.
Overview
In a typical SPSS session, you are likely to work with two or more SPSS windows and to save the
contents of one or more windows to separate files. The window containing your data is the SPSS Data
Editor. If you plan to use the data file again, you may click on File, Save from within the Data Editor
and give the file a descriptive name. SPSS will supply the .sav extension, indicating that the saved
information is in the form of a data file. An SPSS data file includes both the data records and their
structure. The window containing the results of the SPSS procedures you have performed is the SPSS
Viewer. You may find it convenient to save this as an output file. It is okay to use the same name you
used for your data because SPSS will supply the .spo extension to indicate that the saved file is an
output file. As you run various procedures, you may also choose to show the SPSS syntax for these
commands in a syntax window, and save the syntax in a separate .sps file. It is possible to run SPSS
commands directly from syntax, though in this series of tutorials we will focus our attention on SPSS
data and output files and use the point-and-click method to enter the necessary commands.
Launching SPSS
SPSS for Windows is launched from the Windows desktop. There are several ways to access the
program, and the one you use will be based on the way your particular computer is configured. There
may be an SPSS for Windows shortcut on the desktop or in your Start menu. Or you may have to
click Start, All Programs to find the SPSS for Windows folder. In that folder, you will find the SPSS
for Windows program icon.
Once you have located it, click on the SPSS for Windows icon with the left mouse button to launch
SPSS. When you start the program, you will be given a blank dataset and a set of options for running
the SPSS tutorial, typing in data, running queries, creating queries, or opening existing data sources
(see Figure 1-1). For now, just click on Cancel to reveal the blank dataset in the Data Editor screen.


Figure 1-1 SPSS opening screen

The SPSS Data Editor


Examine the SPSS Data Editor's Data View shown in Figure 1-2 below. You will learn in Lesson 2
how to create an effective data structure within the Variable View and how to enter and manipulate data
using the Data Editor. As indicated above, if you click File, Save while in the Data Editor view, you
can save the data along with their structure as a separate file with the .sav extension. The Data Editor
provides the Data View as shown below, and also a separate Variable View. You can switch between
these views by clicking on the tabs at the bottom of the worksheet-like interface.


Figure 1-2 SPSS Data Editor (Data View)

The SPSS Viewer


The SPSS Viewer is opened automatically to show the output when you run SPSS commands.
Assume for example that you wanted to find the average age of 20 students in a class. We will
examine the commands needed to calculate descriptive statistics in Lesson 3, but for now, simply
examine the SPSS Viewer window (see Figure 1-3). When you click File, Save in this view, you can
save the output to a file with the .spo extension.


Figure 1-3 SPSS Viewer


Syntax Editor Window
Finally, you can view and save SPSS syntax commands from the Syntax Editor window. When you
are selecting commands, you will see a Paste button. Clicking that button pastes the syntax for the
commands you have chosen into the Syntax Editor. For example, the syntax to calculate the mean age
mentioned above appears in Figure 1-4:

Figure 1-4 SPSS Syntax Editor
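Although the screenshot is not reproduced here, pasted syntax is ordinary text. As a rough sketch (the exact command and subcommands depend on the options you choose in the dialogs), the syntax for calculating the mean age might look like this:

* Descriptive statistics for the Age variable.
DESCRIPTIVES VARIABLES=Age
  /STATISTICS=MEAN.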


Though we will not address SPSS syntax except in passing in these tutorials, you should note that you
can run commands directly from the Syntax Editor and save your syntax (.sps) files for future


reference. Unlike earlier versions of SPSS, version 15 (the version illustrated in these tutorials)
automatically displays in the SPSS Viewer the syntax for the commands you issue by pointing and
clicking in the Data Editor or the SPSS Viewer (see Figure 1-3 for an example).
Now that you know the kinds of windows and files involved in an SPSS session, you are ready to
learn how to enter, structure, and manipulate data. Those are the subjects of Lesson 2.


Lesson 2: Entering and Working with Data


Objectives
1. Create a data file and data structure.
2. Compute a new variable.
3. Select cases.
4. Sort cases.
5. Split a file.

Overview
Data can be entered directly into the SPSS Data Editor or imported from a variety of file types. It is
always important to check data entries carefully and ensure that the data are accurate. In this lesson
you will learn how to build an SPSS data file from scratch, how to calculate a new variable, how to
select and sort cases, and how to split a file into separate layers.
Creating a Data File
A common first step in working with SPSS is to create or open a data file. We will assume in this
lesson that you will type data directly into the SPSS Data Editor to create a new data file. You should
realize that you can also read data from many other programs, or copy and paste data from worksheets
and tables to create new data files.
Launch SPSS. You will be given various options, as we discussed in Lesson 1. Select Type in Data
or Cancel . You should now see a screen similar to the following, which is a blank dataset in the Data
View of the SPSS Data Editor (see Figure 2-1):

Figure 2-1 SPSS Data Editor - Data View


Key Point: One Row Per Participant, One Column per Variable
It is important to note that each row in the SPSS data table should be assigned to a single participant,
subject, or case, and that no case's data should appear on different rows. When there are multiple
measures for a case, each measure should appear in a separate column (called a "variable" by SPSS).
If you use a coding variable to indicate which group or condition was assigned to a case, that variable


should also appear in a separate column. So if you were looking at the scores for five quizzes for each
of 20 students, the data for each student would occupy a single row (line) in the data table, and the
score for each quiz would occupy a separate column.
Although SPSS automatically numbers the rows of the data table, it is a very good habit to provide a
separate participant (or subject) number column so that records can be easily sorted, filtered, or
selected. Best practice also requires setting up the data structure for the data. For this purpose, we will
switch to the Variable View of the Data Editor by clicking on the Variable View tab at the bottom of
the Data Editor window. See Figure 2-2.

Figure 2-2 SPSS Data Editor - Variable View


Example Data
Let us establish the data structure for our example of five quizzes and 20 students. We will assume
that we also know the age and the sex of each student. Although we could enter "F" for female and
"M" for male, most statistical procedures are easier to perform if a number is used to code such
categorical variables. Let us assign the number "1" to females and the number "0" to males. The
hypothetical data are shown below:
Student   Sex   Age   Quiz1   Quiz2   Quiz3   Quiz4   Quiz5
1               18    83      87      81      80      69
2               19    76      89      61      85      75
3               17    85      86      65      64      81
4               20    92      73      76      88      64
5               23    82      75      96      87      78
6               18    88      73      76      91      81
7               21    89      71      61      70      75
8               20    89      70      87      76      88
9               23    92      85      95      89      62
10              21    86      83      77      64      63
11              23    90      71      91      86      87
12              18    84      71      67      62      70
13              21    83      80      89      60      60
14              17    79      77      82      63      74
15              19    89      80      64      94      78
16              20    76      85      65      92      82
17              19    92      76      76      74      91
18              22    75      90      78      70      76
19              22    87      87      63      73      64
20              20    75      74      63      91      87


Specifying the Data Structure


Switch to the Variable View by clicking on the Variable View tab (see Figure 2-2 above). The
numbers at the left of the window now refer to variables rather than participants. Note that you can
specify the variable Name, the Type of variable, the variable Width (in total characters or digits), the
number of Decimals , a descriptive Label, labels for different Values, how to deal with Missing
Values, the display Column width, how to Align the variable in the display, and whether the
Measure is nominal, ordinal, or scale (interval and ratio). In many cases you can simply accept the
defaults by leaving the entries blank. But you will definitely want to enter a variable Name and Label,
and also specify Value labels for the levels of categorical or grouping variables such as sex or the
levels of an independent variable. The variable names should be short and should not contain spaces
or special characters other than perhaps underscores. Variable labels, on the other hand, can be longer
and can contain spaces and special characters.
Let us specify the structure of our dataset by naming the variables as follows. We will also provide
information concerning the width, number of decimals, and type of measure, along with a descriptive
label:
1. Student
2. Sex
3. Age
4. Quiz1
5. Quiz2
6. Quiz3
7. Quiz4
8. Quiz5

No decimals appear in our raw data, so we will set the number of decimals to zero. After we enter the
desired information, the completed data structure might appear as follows:

Figure 2-3 SPSS data structure (Variable View)


Notice that we provided value labels for Sex, so we won't confuse our 1's and 0's later. To do this,
click on Values in the Sex variable row and enter the appropriate labels for males and females (see
Figure 2-4).

Figure 2-4 Adding value labels


After entering the value and label for one sex, click on Add and then repeat the process for the other
sex. Click on Add after entering this information and then click OK.
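If you prefer syntax, the same value labels can be attached with a single command. This is a sketch assuming the coding described above (1 for females, 0 for males):

* Attach descriptive labels to the numeric codes for Sex.
VALUE LABELS Sex 0 'Male' 1 'Female'.
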
Entering the Data
Now return to the data view (click on the Data View tab), and type in the data. If you prefer, you may
retrieve a copy of the data file by clicking here. Save the data file with a name that will help you
remember it. In this case, we used lesson_2.sav as the file name. Remember that SPSS will provide
the .sav extension for a data file. The data should appear as follows:


Figure 2-5 Completed data entry


Computing a New Variable
Now we will compute a new variable by averaging the five quiz scores for each student. When we
compute this new variable, it will be added to our variable list, and a new column will be created for
it. Let us call the new variable Quiz_Avg and use SPSS's built-in function called MEAN to compute
it. Select Transform, then Compute. The Compute Variable dialog box appears. You may type in the
new variable name, specify the type and provide a label, and enter the formula for computing the new
variable. In this case, we will use the formula:
Quiz_Avg = MEAN (Quiz1, Quiz2, Quiz3, Quiz4, Quiz5)
You can enter the formula by selecting MEAN from the Functions window and then clicking on the
variable names, or you can simply type in the formula, separating the variable names by commas.
The initial Compute Variable dialog box with the target variable named Quiz_Avg and the MEAN
function selected is below. The question marks indicate that you must supply expressions for the
computation.


Figure 2-6 Compute Variable screen


The appropriate formula is as follows:


Figure 2-7 Completed expression

When you click OK, the new variable appears in both the data and variable views (see below). As
discussed earlier, you can change the number of decimals (numerical variables default to two
decimals) and add a descriptive label for the new variable.


Figure 2-8 New variable appears in Data View


Figure 2-9 New variable appears in Variable View


Selecting Cases
You may want to select only certain cases, such as the data for females or for individuals with ages
lower than 20 years. SPSS allows you to select cases either by filtering (which keeps all the cases but
limits further analyses to the selected cases) or by removing the cases that do not meet your criteria.
Usually, you will want to filter cases, but sometimes, you may want to create separate files for
additional analyses by deleting records that do not match your selection criteria. We will select
records for females and filter those records so that the records for males remain but will be excluded
from analyses until we select them again.
From either the variable view or the data view, click on Data, and then click on Select Cases. The
resulting dialog box allows you to select the desired cases for further analysis, or to re-select all cases
if data were previously filtered. Let us choose "If condition is satisfied," and specify that we want to
select only records for which the sex of the participant is female. See the dialog box in the following
figure.


Figure 2-10 Select Cases dialog


Click the "If..." button and enter the condition for selection. In this case we will enter the expression
Sex = 1. You can type this in directly, or you can point and click on the entries in the dialog box.

Figure 2-11 Select Cases expression


Click Continue, then Click OK, and then examine the data view (see Figure 2-12). Records for males
will now have a diagonal line through the row number label, indicating that though still present, these
records are excluded from further analyses.


Figure 2-12 Selected and filtered data


Also notice that a new variable called Filter_$ has been automatically added to your data file. If you
return to the Data menu and select all the cases again, you can use this filter variable to select females
instead of having to re-enter the selection formula. If you do not want to keep this new variable, you
can right-click on its column label and select Clear.
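For reference, the syntax SPSS generates for this filtering operation is approximately the following (a sketch; the pasted version also adds label commands for the filter variable):

* Keep all cases, flag those with Sex = 1, and restrict analyses to them.
USE ALL.
COMPUTE filter_$=(Sex = 1).
FILTER BY filter_$.
EXECUTE.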


Figure 2-13 Filter variable added by SPSS


Sorting Cases
Next you will learn to sort cases. Let's return to the Data, Select Cases menu and choose "Select all
cases" in order to re-select the records for males.
We can sort on one or more variables. For example, we may want to sort the records in our dataset by
age and sex. Select Data, Sort Cases:


Figure 2-14 Sort Cases option

Move Sex and Age to the "Sort by" window (see Figure 2-15) and then click OK.

Figure 2-15 Sort Cases dialog


Return to the Data View and confirm that the data are sorted by sex and by age within sex (see Figure
2-16).
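The equivalent pasted syntax would look something like this (the A requests ascending order):

* Sort by Sex, and by Age within Sex.
SORT CASES BY Sex (A) Age (A).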


Figure 2-16 Cases sorted by Sex and Age


Splitting a File
The last subject we will cover in this tutorial is splitting a file. Instead of filtering cases, splitting a file
creates separate "layers" for the grouping variables. For example, instead of selecting only one sex at
a time, you may want to run several analyses separately for males and females. One convenient way
to accomplish that is to split the file so that every procedure you run will be automatically conducted
and reported for the two groups separately. To split a file, select Data, Split File. The data in a group
need to be consecutive cases in the dataset, so the records must be sorted by groups. However, if your
data are not already sorted, SPSS can do that for you at the same time the file is split (see Figure 2-17).
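As a sketch, the syntax behind a layered split by sex, including the sort, is approximately:

* Sort by the grouping variable, then report every analysis by its levels.
SORT CASES BY Sex.
SPLIT FILE LAYERED BY Sex.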


Figure 2-17 Split File menu


Now, when you run a command, such as a table command to summarize average quiz scores, the
command will be performed for each group separately and those results will be reported in the same
output (see Figure 2-18).

Figure 2-18 Split file results in separate analysis for each group


Lesson 3: Descriptive Statistics and Graphs


Objectives
1. Compute descriptive statistics.
2. Compare means for different groups.
3. Display frequency distributions and histograms.
4. Display boxplots.

Overview
In this lesson, you will learn how to produce various descriptive statistics, simple frequency
distribution tables, and frequency histograms. You will also learn how to explore your data and create
boxplots.
Example
Let us return to our example of 20 students and five quizzes. We would like to calculate the average
score (mean) and standard deviation for each quiz. We will also look at the mean scores for men and
women on each quiz. Open the SPSS data file you saved in Lesson 2, or click here for lesson_3.sav.
Remember that we previously calculated the average quiz score for each person and included that as a
new variable in our data file.
To calculate the means and standard deviations for age, all quizzes, and the average quiz score, select
Analyze, then Descriptive Statistics, and then Descriptives as shown in the following screenshot
(see Figure 3-1).


Figure 3-1 Accessing the Descriptives Procedure


Move the desired variables into the variables window (see Figure 3-2) and then click Options.

Figure 3-2 Move the desired variables into the variables window.
In the resulting dialog box, make sure you check (at a minimum) the boxes in front of Mean and Std.
deviation, and then click Continue and OK:


Figure 3-3 Descriptive options


The resulting output table showing the means and standard deviations of the variables is opened in the
SPSS Viewer (see Figure 3-4).

Figure 3-4 Output from Descriptive Procedure
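The corresponding syntax, should you paste rather than run the command, is roughly:

* Means and standard deviations for age, the five quizzes, and the quiz average.
DESCRIPTIVES VARIABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg
  /STATISTICS=MEAN STDDEV.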


Exploring Means for Different Groups


When you have two or more groups, you may want to examine the means for each group as well as
the overall mean. The SPSS Compare Means procedure provides this functionality and much more,
including various hypothesis tests. Assume that you want to compare the means of men and women
on age, the five quizzes, and the average quiz score. Select Analyze, Compare Means, Means (see
Figure 3-5):

Figure 3-5 Selecting Means Procedure


In the resulting dialog box, move the variables you are interested in summarizing
into the Dependent List. At this point, do not worry whether your variables are actual "dependent
variables" or not. Move Sex to the Independent List (see Figure 3-6). Click on Options to see the
many summary statistics available. In the current case, make sure that Mean, Number of Cases, and
Standard Deviation are selected.


Figure 3-6 Means dialog box


When you click OK, the report table appears in the SPSS Viewer with the separate means for the two
sexes along with the overall data, as shown in the following figure.

Figure 3-7 Report from Means procedure
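A sketch of the equivalent Means syntax is shown below:

* Group means, counts, and standard deviations by sex.
MEANS TABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg BY Sex
  /CELLS MEAN COUNT STDDEV.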


As this lesson makes clear, there are several ways to produce summary statistics such as means and
standard deviations in SPSS. From Lesson 2 you may recall that splitting the file would allow you to
calculate the descriptive statistics separately for males and females. The way to find the procedure
that works best in a given situation is to try different ones, and always to explore the options presented
in the SPSS menus and dialog boxes. The extensive SPSS help files and tutorials are also very useful.
Frequency Distributions and Histograms
SPSS provides several different ways to explore, summarize, and present data in graphic form. For
many procedures, graphs and plots are available as output options. SPSS also has an extensive
interactive chart gallery and a chart builder that can be accessed through the Graphs menu. We will
look at only a few of these features, and the interested reader is encouraged to explore the many
additional charting and graphing features of SPSS.
One very useful feature of the Frequencies procedure in SPSS is that it can produce simple frequency
tables and histograms. You may optionally choose to have the normal curve superimposed on the


histogram for a visual check as to how the data are distributed. Let us examine the distribution of ages
of our 20 hypothetical students. Select Analyze, Descriptive Statistics, Frequencies (see Figure 3-8).

Figure 3-8 Selecting Frequencies procedure

In the Frequencies dialog, move Age to the variables window, and then click on Charts. Select
Histograms and check the box in front of With normal curve (see Figure 3-9).


Figure 3-9 Frequencies: Charts dialog


Click Continue and OK. In the resulting output, SPSS displays the simple frequency table for age
and the frequency histogram with the normal curve (see Figures 3-10 and 3-11).
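The pasted syntax for this Frequencies request would look approximately like this:

* Frequency table for Age plus a histogram with a superimposed normal curve.
FREQUENCIES VARIABLES=Age
  /HISTOGRAM NORMAL.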

Figure 3-10 Simple frequency table


Figure 3-11 Frequency histogram with normal curve

Exploratory Data Analysis


In addition to the standard descriptive statistics and frequency distributions and graphs, SPSS also
provides many graphical and semi-graphical techniques collectively referred to as exploratory data
analysis (EDA). EDA is useful for describing the characteristics of a dataset, identifying outliers, and
providing summary descriptions. Some of the most widely-used EDA techniques are boxplots and
stem-and-leaf displays. You can access these techniques by selecting
Analyze, Descriptive Statistics, Explore. As with the Compare Means procedure, groups can be
separated if desired. For example, a side-by-side boxplot comparing the average quiz grades of men
and women is shown in Figure 3-12.
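A rough sketch of the Explore syntax behind such a plot, assuming the Quiz_Avg variable created in Lesson 2:

* Boxplots and stem-and-leaf displays of quiz averages, grouped by sex.
EXAMINE VARIABLES=Quiz_Avg BY Sex
  /PLOT BOXPLOT STEMLEAF
  /COMPARE GROUPS.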


Figure 3-12 Boxplots


Lesson 4: Independent-Samples t Test


Objectives
1. Conduct an independent-samples t test.
2. Interpret the output of the t test.
Overview
The independent-samples or between-groups t test is used to examine the effects of one independent
variable on one dependent variable and is restricted to comparisons of two conditions or groups (two
levels of the independent variable). In this lesson, we will describe how to analyze the results of a
between-groups design. Lesson 5 covers the paired-samples or within-subjects t test. The reader
should note that SPSS incorrectly labels this test a "T test" rather than a t test, but is inconsistent in
that labelling, as some of the SPSS output also refers to t-test results.
A between-groups design is one in which participants have been randomly assigned to the two levels
of the independent variable. In this design, each participant is assigned to only one group, and
consequently, the two groups are independent of one another. For example, suppose that you are
interested in studying the effects of caffeine consumption on task performance. If you randomly assign
some participants to the caffeine group and other participants to the no-caffeine group, then you are
using a between-groups design. In a within-subjects design, by contrast, all participants would be
tested once with caffeine and once without caffeine.
An Example: Parental Involvement Experiment
Assume that you studied the effects of parental involvement (independent variable) on students'
grades (dependent variable). Half of the students in a third grade class were randomly assigned to the
parental involvement group. The teacher contacted the parents of these children throughout the year
and told them about the educational objectives of the class. Further, the teacher gave the parents
specific methods for encouraging their children's educational activities. The other half of the students
in the class were assigned to the no-parental involvement group. The scores on the first test were
tabulated for all of the children, and these are presented below:
Student   Involve   Test1        Student   Involve   Test1
1         1         78.6         9         0         81.0
2         1         64.9         10        0         69.5
3         1         100.0        11        0         73.8
4         1         83.7         12        0         66.7
5         1         94.0         13        0         54.8
6         1         78.2         14        0         69.3
7         1         76.9         15        0         73.5
8         1         82.0         16        0         79.4

Creating Your Data File: Key Point


When creating a data file for an independent-samples t test in SPSS, you must also create a separate
column for the grouping variable that shows to which condition or group a particular participant
belongs. In this case, that is the parental involvement condition, so you should create a numeric code
that allows SPSS to identify the parental involvement condition for that particular score. If this
concept is difficult to grasp, you may want to revisit Lesson 2, in which a grouping variable is created
for male and female students.
So, the variable view of your SPSS data file should look like the one below, with three variables--one
for student number, one for parental involvement condition (using for example a code of "1" for
involvement and "0" for no involvement), and one column for the score on Test 1. When creating the
data file, it is a good idea to create a variable Label for each variable and a Value label for the
grouping variable(s). These labels make it easier to interpret the output of your statistical procedures.
The variable view of the data file might look similar to the one below.

Figure 4-1 Variable View


The data view of the file should look like the following:


Figure 4-2 Data View


Note that in this particular case the two groups are separated in the data file, with the first half of the
data corresponding to the parental involvement condition and the second half corresponding to the no-involvement condition. Although this makes for an orderly data table, such ordering is NOT required
in SPSS for the independent-samples t test. When performing the test, whether or not the data are
sorted by the independent variable, you must specify which condition a participant is in by use of a
grouping variable as indicated above.
Performing the t test for the Parental Involvement Experiment
You should enter the data as described above. Or you may access the SPSS data file for the parental
involvement experiment by clicking here. To perform the t test, complete the following steps in order.


Click on Analyze, then Compare Means, then Independent Samples T Test.

Figure 4-3 Select Analyze, Compare Means, Independent-Samples T Test


Now, move the dependent variable (in this case, labelled "Score on Test 1 [Test1]") into the Test
Variable window. Then move your independent variable (in this case, "Parental Involvement
[Involve]") into the Grouping Variable window. Remember that Grouping Variable stands for the
levels of the independent variable.


Figure 4-4 Independent-Samples T Test dialog box


You will notice that there are question marks in the parentheses following your independent variable
in the Grouping Variable field. This is because you need to define the particular groups that you
want to compare. To do so, click on Define Groups, and indicate the numeric values that each group
represents. In this case, you will want to put a "0" in the field labelled Group 1 and a "1" in the field
labelled Group 2. Once you have done this, click on Continue.
Now click on OK to run the t test. You may also want to click on Paste in order to save the SPSS
syntax of what you have done (see Figure 4-5) in case you desire to run the same kind of test from
SPSS syntax.

Figure 4-5 Syntax for the independent-samples t test
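The pasted syntax should read approximately as follows:

* Independent-samples t test comparing the two involvement groups.
T-TEST GROUPS=Involve(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=Test1
  /CRITERIA=CI(.95).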


Output from the t test Procedure


As you can see below, the output from an independent-samples t test procedure is relatively
straightforward.

Figure 4-6 Independent-samples t test output


Interpreting the Output
In the SPSS output, the first table lists the number of participants (N), mean, standard deviation, and
standard error of the mean for both of your groups. Notice that the value labels are printed as well as
the variable labels for your variables, making it easier to interpret the output.
The second table (see Figure 4-6) presents you with an F test (Levene's test for equality of variances)
that evaluates the basic assumption of the t test that the variances of the two groups are approximately
equal (homogeneity of variance or homoscedasticity). If the F value reported here is very high and the
significance level is very low (usually lower than .05 or .01), then the assumption of homogeneity of
variance has been violated. In that case, you should use the t test in the lower half of the table,
whereas if you have not violated the homogeneity assumption, you should use the t test in the upper
half of the table. The t-test formula for unequal variances makes an adjustment to the degrees of
freedom, so this value is often fractional, as seen above.
In this particular case, you can see that we have not violated the homogeneity assumption, and we
should report the value of t as 2.356, degrees of freedom of 14, and the significance level of .034.
Thus, our data show that parental involvement has a significant effect on grades, t(14) = 2.356, p =
.034.


Lesson 5: Paired-Samples t Test


Objectives
1. Conduct a paired-samples t test.
2. Interpret the output of the paired-samples t test.
Overview
The paired-samples or dependent t test is used for within-subjects or matched-pairs designs in which
observations in the groups are linked. The linkage could be based on repeated measures, natural
pairings such as mothers and daughters, or pairings created by the experimenter. In any of these cases,
the analysis is the same. The dependency between the two observations is taken into account, and
each set of observations serves as its own control, making this a generally more powerful test than the
independent-samples t test. Because of the dependency, the degrees of freedom for the paired-samples
t test are based on the number of pairs rather than the number of observations.
Example
Imagine that you conducted an experiment to test the effects of the presence of others
(independent variable) on problem-solving performance (dependent variable). Assume further that
you used a within-subjects design; that is, each participant was tested alone and in the presence of
others on different days using comparable tasks. Higher scores indicate better problem-solving
performance. The data appear below:
Participant   Alone   Others
1             12      10
2             12      10
3             11      10
4             12      11
5             12

The following figure shows the variable view of the structure of the dataset:


Figure 5-1 Dataset variable view


Entering Data for a Within-Subjects Design: Key Point
When you enter data for a within-subjects design, there must be a separate column for each condition.
This tells SPSS that the two data points are linked for a given participant. Unlike the independent-samples t test where a grouping variable is required, there is no additional grouping variable in the
paired-samples t test. The properly configured data are shown in the following screenshot of the SPSS
Data Editor Data View:

Figure 5-2 Dataset data view


Performing the Paired-Samples t test Step-by-Step


The SPSS data file for this example can be found here. After you have entered or opened the dataset,
you should follow these steps in order.
Click on Analyze, Compare Means, and then Paired-Samples T test.

Figure 5-3 Select Paired-Samples T Test


In the resulting dialog box, click on the label for Alone and then press <Shift> and click on the label
for Others. Click on the arrow to move this pair of variables to the Paired Variables window.


Figure 5-4 Identify paired variables
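If you click Paste instead of OK, the generated syntax should look roughly like this:

* Paired-samples t test on the Alone and Others scores.
T-TEST PAIRS=Alone WITH Others (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.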


Interpreting the Paired-Samples t Test Output
Click OK and the following output appears in the SPSS Output Viewer Window (see Figure 5-5).
Note that the correlation between the two observations is reported along with its p level, and that the
value of t, the degrees of freedom (df), and the p level of the calculated t are reported as well.

Figure 5-5 Paired-Samples T Test output


Lesson 6: One-Way ANOVA


Objectives
1. Conduct a one-way ANOVA.
2. Perform post hoc comparisons among means.
3. Interpret the ANOVA and post hoc comparison output.
Overview
The one-way ANOVA compares the means of three or more independent groups. Each group
represents a different level of a single independent variable. It is useful at least conceptually to think
of the one-way ANOVA as an extension of the independent-samples t test. The null hypothesis in the
ANOVA is that the several populations being sampled all have the same mean. Because the variance
is based on deviations from the mean, the "analysis of variance" can be used to test hypotheses about
means. The test statistic in the ANOVA is an F ratio, which is a ratio of two variances. When an
ANOVA leads to the conclusion that the sample means differ by more than a chance level, it is
usually instructive to perform post hoc (a posteriori) analyses to determine which of the sample
means are different. It is also helpful to determine and report effect size when performing ANOVA.
Example Problem
In a class of 30 students, ten students each were randomly assigned to three different methods of
memorizing word lists. In the first method, the student was instructed to repeat the word silently when
it was presented. In the second method, the student was instructed to spell the word backward,
visualize the backward spelling, and pronounce it silently. The third method required the student to
associate each word with a strong memory. Each student saw the same 10 words flashed on a
computer screen for five seconds each. The list was repeated in random order until each word had
been presented a total of five times. A week later, students were asked to write down as many of the
words as they could recall. For each of the three groups, the number of correctly-recalled words is
shown in the following table:
Method1 Method2 Method3


Entering the Data in SPSS


Recall our previous lessons on data entry. These 30 scores represent 30 different individuals, and each
participant's data should take up one line of the data file. The group membership should be coded as a
separate variable. The correctly-entered data would take the following form (see Figure 6-1). Note
that although we used 1, 2, and 3 to code group membership, we could just as easily have used 0, 1,
and 2.

Figure 6-1 Data for one-way ANOVA


Conducting the One-Way ANOVA
To perform the one-way ANOVA in SPSS, click on Analyze, Compare Means, One-Way ANOVA
(see Figure 6-2).


Figure 6-2 Select Analyze, Compare Means, One-Way ANOVA


In the resulting dialog box, move Recall to the Dependent List and Method to the Factor field. Select
Post Hoc and then check the box in front of Tukey for the Tukey HSD test (see Figure 6-3), which is
one of the most frequently used post hoc procedures. Note also the many other post hoc comparison
tests available.
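The equivalent syntax, sketched here for reference, uses the ONEWAY command:

* One-way ANOVA on Recall by Method, with Tukey HSD post hoc tests.
ONEWAY Recall BY Method
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).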


Figure 6-3 One-Way ANOVA dialog with Tukey HSD test selected
The ANOVA summary table and the post hoc test results appear in the SPSS Viewer (see Figure 6-4).
Note that the overall (omnibus) F ratio is significant, indicating that the means differ by a larger
amount than would be expected by chance alone if the null hypothesis were true. The post hoc test
results indicate that the mean for Method 1 is significantly lower than the means for Methods 2 and 3,
but that the means for Methods 2 and 3 are not significantly different.


Figure 6-4 ANOVA summary table and post hoc test results
As an aid to understanding the post hoc test results, SPSS also provides a table of homogeneous
subsets (see Figure 6-5). Note that it is not strictly necessary that the sample sizes be equal in the one-way ANOVA, and when they are unequal, the Tukey HSD procedure uses the harmonic mean of the
sample sizes for post hoc comparisons.


Figure 6-5 Table of homogeneous subsets


Missing from the ANOVA results table is any reference to effect size. A common effect size index is
eta squared, which is the between-groups sum of squares divided by the total sum of squares. As such,
this index represents the proportion of variance that can be attributed to between-group differences or
treatment effects. An alternative method of performing the one-way ANOVA provides the effect-size
index, but not the post hoc comparisons discussed earlier. To perform this alternative analysis, select
Analyze, Compare Means, Means (see Figure 6-6). Move Recall to the Dependent List and Method
to the Independent List. Under Options, select Anova Table and eta.
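A sketch of the corresponding syntax (eta squared here is the between-groups sum of squares divided by the total sum of squares, as noted above):

* ANOVA table and eta squared from the Means procedure.
MEANS TABLES=Recall BY Method
  /CELLS MEAN COUNT STDDEV
  /STATISTICS ANOVA.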

Figure 6-6 ANOVA procedure and effect size index available from Means procedure


The ANOVA summary table from the Means procedure appears in Figure 6-7 below. Eta squared is
directly interpretable as an effect size index: 58 percent of the variance in recall can be explained by
the method used for remembering the word list.

Figure 6-7 ANOVA table and effect size from Means procedure


Lesson 7: Repeated-Measures ANOVA


Objectives
1. Conduct the repeated-measures ANOVA.
2. Interpret the output.
3. Construct a profile plot.
Overview
The repeated-measures or within-subjects ANOVA is used when there are multiple measures for each
participant. It is conceptually useful to think of the repeated-measures ANOVA as an extension of the
paired-samples t test. Each set of observations for a subject or case serves as its own control, so this
test is quite powerful. In the repeated-measures ANOVA, the test of interest is the within-subjects
effect of the treatments or repeated measures.
The procedure for performing repeated-measures ANOVA in SPSS is found in the Analyze, General
Linear Model menu.
Example Data
Assume that a statistics professor is interested in the effects of taking a statistics course on
performance on an algebra test. She administers a 20-item college algebra test to ten randomly
selected statistics students at the beginning of the term, at the end of the term, and six months after the
course is finished. The hypothetical test results are as follows.
Student   Before   After   SixMo
1         13       15      17
2         12       15      14
3         12       17      16
4         19       20      20
5         10       15      14
6         10       13      15
7         12       11      14
8         15       13      10
9         11       16
10

Coding Considerations
Data coding considerations in the repeated-measures ANOVA are similar to those in the paired-samples t test. Each participant or subject takes up a single row in the data file, and each observation


requires a separate column. The properly coded SPSS data file with the data entered correctly should
appear as follows (see Figure 7-1). You may also retrieve a copy of the data file if you like.

Figure 7-1 SPSS data file coded for repeated-measures ANOVA


Performing the Repeated-Measures ANOVA
To perform the repeated-measures ANOVA in SPSS, click on Analyze, then General Linear Model,
and then Repeated Measures. See Figure 7-2.


Figure 7-2 Select Analyze, General Linear Model, Repeated Measures


In the resulting Repeated Measures dialog, you must specify the number of factors and the number of
levels for each factor. In this case, the single factor is the time the algebra test was taken, and there are
three levels: at the beginning of the course, immediately after the course, and six months after the
course. You can accept the default label of factor1, or change it to a more descriptive one. We will use
"Time" as the label for our factor, and specify that there are three levels (see Figure 7-3).

Figure 7-3 Specifying factor and levels


After naming the factor and specifying the number of levels, you must add the factor and then define
it. Click on Add and then click on Define. See Figure 7-4.

Figure 7-4 Specifying within-subjects variable levels


Now you can enter the levels one at a time by clicking on a variable name and then clicking on the
right arrow adjacent to the Within-Subjects Variables field. Or you can click on Before in the left
pane of the Repeated Measures dialog, then hold down <Shift> and click on SixMo to select all three
levels at the same time, and then click on the right arrow to move all three levels to the window in one
step (see Figure 7-5).


Figure 7-5 Within-subjects variables appropriately entered


Clicking on Options allows you to specify the calculation of descriptive statistics, effect size, and
contrasts among the means. If you like, you can also click on Plots to include a line graph of the
algebra test mean scores for the three administrations. Figure 7-6 is a screen shot of the Profile Plots
dialog. You should click on Time, then Horizontal Axis, and then click on Add. Click Continue to
return to the Repeated Measures dialog.

Figure 7-6 Profile Plots dialog


Now click on Options and specify descriptive statistics, effect size, and contrasts (see Figure 7-7).
You must move Time to the Display Means window as well as specify a confidence level adjustment
for the main effects contrasts. A Bonferroni correction will adjust the alpha level in the post hoc
comparisons, while the default LSD (Fisher's least significant difference test) will not adjust the alpha
level. We will select the more conservative Bonferroni correction.

Figure 7-7 Specifying descriptive statistics, effect size, and mean contrasts
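
Clicking Paste at this point would generate GLM syntax along these lines (a sketch; your option choices may add or remove subcommands):

* Repeated-measures ANOVA with within-subjects factor Time (3 levels)
* and Bonferroni-adjusted contrasts among the time points.
GLM Before After SixMo
  /WSFACTOR=Time 3 Polynomial
  /PLOT=PROFILE(Time)
  /EMMEANS=TABLES(Time) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=Time.
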
Click on Continue, then OK to run the repeated-measures ANOVA. The SPSS output provides
several tests. When there are multiple dependent variables, the multivariate test is used to determine
whether there is an overall within-subjects effect for the combined dependent variables. As there is
only one within-subjects factor, we can ignore this test in the present case. Sphericity is the assumption
that the variances of the differences between the pairs of measures are equal. The nonsignificant test of
sphericity indicates that this assumption is not violated in the present case, and adjustments to the
degrees of freedom (and thus to the p level) are not required. The test of interest is the Test of Within-Subjects Effects. We can assume sphericity and report the F ratio as 8.149 with 2 and 18 degrees of
freedom and the p level as .003 (see Figure 7-8). Partial eta-squared has an interpretation similar to
that of eta-squared in the one-way ANOVA, and is directly interpretable as an effect-size index: about
48 percent of the within-subjects variation in algebra test performance can be explained by knowledge
of when the test was administered.


Figure 7-8 Test of within-subjects effects


Additional insight is provided by the Bonferroni-corrected pairwise comparisons, which indicate that
the means for Before and After are significantly different, while none of the other comparisons is
significant. The profile plot helps visualize these contrasts. See Figures 7-9 and
7-10. These results indicate an immediate but unsustained improvement in algebra test performance
for students taking a statistics course.

Figure 7-9 Bonferroni-corrected pairwise comparisons


Figure 7-10 Profile plot


Lesson 8: Two-Way ANOVA


Objectives
1. Conduct the two-way ANOVA.
2. Examine and interpret main effects and interaction effect.
3. Produce a plot of cell means.
Overview
We will introduce the two-way ANOVA with the simplest such design, a balanced or completely crossed factorial design. In this case there are two independent variables (factors), each of which has
two or more levels. We can think of this design as a table in which each cell represents a single
independent group. The group represents a combination of levels of the two factors. For simplicity, let
us refer to the factors as A and B and assume that each factor has two levels and each independent
group has the same number of observations. There will be four independent groups. The design can
thus be visualized as follows:

Figure 8-1 Conceptualization of Two-Way ANOVA


The two-way ANOVA is an economical design, because it allows the assessment of the main effects
of each factor as well as their potential interaction.
Example Data and Coding Considerations
Assume that you are studying the effects of observing violent acts on subsequent aggressive behavior.
You are interested in the kind of violence observed: a violent cartoon versus a video of real-action
violence. A second factor is the amount of time one is exposed to violence: ten minutes or 30 minutes.
You randomly assign 8 children to each group. After the child watches the violent cartoon or action
video, the child plays a Tetris-like computer video game for 30 minutes. The game provides options
for either aggressing ("trashing" the other computerized player) or simply playing for points without
interfering with the other player. The program provides 100 opportunities for the player to make an
aggressive choice and records the number of times the child chooses an aggressive action when the
game provides the choice. The hypothetical data are below:


Figure 8-2 Example Data


When coding and entering data for this two-way ANOVA, you should recognize that each of the 32
participants is a unique individual and that there are no repeated measures. Therefore, each participant
takes up a row in the data file, and the data should be coded and entered in such a way that the factors
are identified by two columns with group membership coded as a combination of the levels. For
illustrative purposes we will use 1 and 2 to represent the levels of the factors, though as you learned
earlier, you could just as easily have used 0s and 1s. The data view of the resulting SPSS data file
should appear something like this:


Figure 8-3 SPSS data file data view for two-way ANOVA (partial data)
For ease of interpretation, the variables can be labelled and the values of each specified in the variable
view (see Figure 8-4).

Figure 8-4 Variable view with labels and values identified


If you prefer, you may retrieve a copy of the data file.


Performing the Two-Way ANOVA


To perform the two-way ANOVA, select Analyze, General Linear Model, and then Univariate
because there is only one dependent variable (see Figure 8-5).

Figure 8-5 Select Analyze, General Linear Model, Univariate


In the resulting dialog, you should specify that Aggression is the dependent variable and that both
Time and Type are fixed factors (see Figure 8-6).


Figure 8-6 Specifying the two-way ANOVA


This procedure will test the main effects for Time and Type as well as their possible interaction. It is
helpful to specify profile plots to examine the interaction of the two variables. For that purpose, select
Plots and then move Type to the Horizontal Axis field and Time to the Separate Lines field (see
Figure 8-7).

Figure 8-7 Specifying profile plots


When you click on Add, the Type * Time interaction is added to the Plots window, as shown in
Figure 8-8.


Figure 8-8 Plotting an interaction term


Click Continue, then click Options. Check the boxes in front of Descriptive statistics and Estimates
of effect size (see Figure 8-9). Click Continue, then click OK to run the two-way ANOVA. The table
of interest is the Test of Between-Subjects Effects. Examination of the table reveals significant F
ratios for Time, Type, and the Time * Type interaction (see Figure 8-9).

Figure 8-9 Table of between-subjects effects
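For reference, a sketch of the syntax this point-and-click specification would produce:

* Two-way ANOVA with both main effects and the interaction, plus a profile plot.
UNIANOVA Aggression BY Time Type
  /PLOT=PROFILE(Type*Time)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=Time Type Time*Type.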


As in the repeated-measures ANOVA, a partial eta-squared is calculated as a measure of effect size.
The profile plot (see Figure 8-10) shows that the interaction is ordinal: the differences in the number
of aggressive choices made after observing the two violence conditions increase with the time of
exposure.


Figure 8-10 Interaction plot


Lesson 9: ANOVA for Mixed Factorial Designs


Objectives
1. Conduct a mixed-factorial ANOVA.
2. Test between-groups and within-subjects effects.
3. Construct a profile plot.
Overview
A mixed factorial design involves two or more independent variables, of which at least one is a
within-subjects (repeated measures) factor and at least one is a between-groups factor. In the simplest
case, there will be one between-groups factor and one within-subjects factor. The between-groups
factor would need to be coded in a single column as with the independent-samples t test or the one-way ANOVA, while the repeated-measures variable would comprise as many columns as there are
measures as in the paired-samples t test or the repeated-measures ANOVA.
Example Data
As an example, assume that you conducted an experiment in which you were interested in the extent
to which visual distraction affects younger and older people's learning and remembering. To do this,
you obtained a group of younger adults and a separate group of older adults and had them learn under
three conditions (eyes closed, eyes open looking at a blank field, eyes open looking at a distracting
field of pictures). This is a 2 (age) x 3 (distraction condition) mixed factorial design. The scores on the
data sheet below represent the number of words recalled out of ten under each distraction condition.
Age       Closed Eyes   Simple Distraction   Complex Distraction
Younger
Younger
Younger
Younger
Older
Older
Older
Older

Building the SPSS Data File


Note that there are eight separate participants, so the data file will require eight rows. There will be a
column for the participants' age, which is the between-groups variable, and three columns for the


repeated measures, which are the distraction conditions. As always it is helpful to include a column
for participant (or case) number.
The data appropriately entered in SPSS should look something like the following (see Figure 9-1).
You may optionally download a copy of the data file.

Figure 9-1 SPSS data structure for mixed factorial design


Performing the Mixed Factorial ANOVA
To conduct this analysis, you will use the repeated measures procedure. The initial steps are identical
to those in the within-subjects ANOVA. You must first specify repeated measures to identify the
within-subjects variable(s), and then specify the between-groups factor(s).
Select Analyze, then General Linear Model, then Repeated Measures (see Figure 9-2).


Figure 9-2 Preparing for the Mixed Factorial Analysis


Next, you must define the within-subjects factor(s). This process should be repeated for each factor on
which there are repeated measures. In our present case, there is only one within-subject variable, the
distraction condition. SPSS will give the within-subjects variables the names factor1, factor2, and so
on, but you can provide more descriptive names if you like. In the Repeated Measures dialog box,
type in the label distraction and the number of levels, 3. See Figure 9-3. If you like, you can give this
measure (the three distraction levels) a new name by clicking in the Measure Name field. If you
choose to name this factor, the name must be unique and may not conflict with any other variable
names. If you do not name the measure, the SPSS name for the measure will default to MEASURE_1.
In the present case we will leave the measure name blank and accept the default label.


Figure 9-3 Specifying the within-subjects factor.


We will now specify the within-subjects and between-groups variables. Click on Add and then Define
to specify which variable in the dataset is associated with each level of the within-subjects factor (see
Figure 9-4).

Figure 9-4 Defining the within-subjects variable

Move the Closed, Simple, and Complex variables to levels 1, 2, and 3, respectively, and then move
Age to the Between-Subjects Factor(s) window (see Figure 9-5). You can optionally specify one or
more covariates for analysis of covariance.

Figure 9-5 The complete design specification for the mixed factorial ANOVA
To display a plot of the cell means, click on Plots, move Age to the Horizontal Axis field, and move
distraction to Separate Lines. Next click on Add to specify the plot (see Figure 9-6) and then click
Continue.

Figure 9-6 Specifying plot


We will use the Options menu to display marginal and cell means, to compare main effects, to display
descriptive statistics, and to display measures of effect size. We will select the Bonferroni interval
adjustment to control the level of Type I error. See Figure 9-7.

Figure 9-7 Repeated measures options


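If you prefer working from syntax, the specification described above corresponds roughly to the
commands below, which you could also generate with the Paste button in the dialogs. This is a sketch:
the variable names age, closed, simple, and complex are assumptions based on the data file in Figure
9-1, so substitute the names used in your own file.

* Mixed factorial ANOVA: distraction (within subjects) by age (between groups).
* Variable names are assumed from Figure 9-1.
GLM closed simple complex BY age
  /WSFACTOR=distraction 3 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(age*distraction)
  /EMMEANS=TABLES(age) COMPARE ADJ(BONFERRONI)
  /EMMEANS=TABLES(distraction) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=distraction
  /DESIGN=age.
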
Select Continue to close the options dialog and then OK to run the ANOVA. The resulting SPSS
output is rather daunting, but you should focus on the between- and within-subjects tests. The test of
sphericity is not significant, indicating that this assumption has not been violated. Therefore you
should use the F ratio and degrees of freedom associated with the sphericity assumption (see Figure 9-8).
Specifically, you will want to determine whether there is a main effect for age, an effect for
distraction condition, and a possible interaction of the two. The tables of interest from the SPSS
Viewer are shown in Figures 9-8 and 9-9.

Figure 9-8 Partial SPSS output


The test of within-subjects effects indicates that there is a significant effect of the distraction condition
on word memorization. The lack of an interaction between distraction and age indicates that this
effect is consistent for both younger and older participants. The test of between-subjects effects (see
Figure 9-9) indicates there is a significant effect of the age condition on word memory.

Figure 9-9 Test of between-subjects effects


The remainder of the output assists in the interpretation of the main effects of the within-subjects
(distraction condition) and between-subjects (age condition) factors. Of particular interest is the
profile plot, which clearly displays the main effects and the absence of an interaction (see Figure 9-10). As discussed above, SPSS labels the within-subjects variable MEASURE_1 in the plot.

Figure 9-10 Profile plot

Lesson 10: Correlation and Scatterplots


Objectives
1. Calculate correlation coefficients.
2. Test the significance of correlation coefficients.
3. Construct a scatterplot.
4. Edit features of the scatterplot.

Overview
In correlational research, there is no experimental manipulation. Rather, we measure variables in their
natural state. Instead of independent and dependent variables, it is useful to think of predictors and
criteria. In bivariate (two-variable) correlation, we are assessing the degree of linear relationship
between a predictor, X, and a criterion, Y. In multiple regression, we are assessing the degree of
relationship between a linear combination of two or more predictors, X1, X2, ...Xk, and a criterion, Y.
We will address correlation in the bivariate case in Lesson 10, linear regression in the bivariate case in
Lesson 11, and multiple regression and correlation in Lesson 12.
The Pearson product moment correlation coefficient summarizes and quantifies the relationship
between two variables in a single number. This number can range from -1 representing a perfect
negative or inverse relationship to 0 representing no relationship or complete independence to +1
representing a perfect positive or direct relationship. When we calculate a correlation coefficient from
sample data, we will need to determine whether the obtained correlation is significantly different from
zero. We will also want to produce a scatterplot or scatter diagram to examine the nature of the
relationship. Sometimes the correlation is low not because of a lack of relationship, but because of a
lack of linear relationship. In such cases, examining the scatterplot will assist in determining whether
a relationship may be nonlinear.
Example Data
Suppose that you have collected questionnaire responses to five questions concerning dormitory
conditions from 10 college freshmen. (Normally you would like to have a larger sample, but the small
sample in this case is useful for illustration.) The questionnaire assesses the students' level of
satisfaction with noise, furniture, study area, safety, and privacy. Assume that you have also assessed
the students' family income level, and you would like to test the hypothesis that satisfaction with the
college living environment is related to wealth (family income).
The questionnaire contains five questions about satisfaction with various aspects of the dormitory:
"noise," "furniture," "study area," "safety," and "privacy." These are answered on a 5-point Likert-type
scale (very dissatisfied to very satisfied), coded as 1 to 5. The data sheet for this study
is shown below.

[Data sheet: columns for Student, Income, Noise, Furniture, Study Area, Safety, and Privacy, with one
row for each of the 10 students. The complete values appear in the SPSS data file shown in Figure 10-1.]

Entering the Data in SPSS


The data correctly entered in SPSS would look like the following (see Figure 10-1). Remember not
only to enter the data, but to add appropriate labels in the Variable View to improve the readability of
the output. If you prefer, you can download a copy of the data file.

Figure 10-1 Data entered in SPSS


Calculating and Testing Correlation Coefficients
To calculate and test the significance of correlation coefficients, select Analyze, Correlate, Bivariate
(see Figure 10-2).

Figure 10-2 The bivariate correlation procedure


Move the desired variables to the Variables window, as shown in Figure 10-3.

Figure 10-3 Move desired variables to the Variables window

Under the Options menu, let us select means and standard deviations and then click Continue. The
output contains a table of descriptive statistics (see Figure 10-4) and a table of correlations and related
significance tests (see Figure 10-5).

Figure 10-4 Descriptive statistics

Figure 10-5 Correlation matrix


Note that SPSS flags significant correlations with asterisks. The correlation matrix is symmetrical, so
the above-diagonal entries are the same as the below-diagonal entries. In our survey results we note
strong negative correlations between family income and the various survey items and strong positive
correlations among the various items.
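If you prefer syntax, the analysis above corresponds approximately to the following commands, which
you could also paste from the dialog. The variable names are assumptions based on Figure 10-1, so
adjust them to match your own file.

* Pearson correlations with two-tailed significance tests,
* plus means and standard deviations.
CORRELATIONS
  /VARIABLES=income noise furniture study safety privacy
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=PAIRWISE.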

Constructing a Scatterplot
For purposes of illustration, let us produce a scatterplot of the relationship between satisfaction with
noise level in the dormitory and family income. We see from the correlation matrix that this is a
significant negative correlation. As family income increases, satisfaction with the dormitory noise
level decreases. To build the scatterplot, select Graphs, Interactive, Scatterplot (see Figure 10-6).
Please note that there are several different ways to construct the scatterplot in SPSS, and that we are
illustrating only one here.
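For example, the legacy GRAPH command below produces a comparable (non-interactive) scatterplot
from syntax; the variable names income and noise are assumptions based on Figure 10-1.

* Scatterplot of noise satisfaction against family income.
GRAPH
  /SCATTERPLOT(BIVAR)=income WITH noise
  /MISSING=LISTWISE.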

Figure 10-6 Constructing a scatterplot


In the resulting dialog, enter Family Income on the X-axis and Noise on the Y-axis (see Figure 10-7).

Figure 10-7 Specifying variables for the scatterplot


The resulting scatterplot (see Figure 10-8) shows the relationship between family income and
satisfaction with dormitory noise.

Figure 10-8 Scatterplot


You can edit a chart object by double-clicking on it in the SPSS Viewer. In addition to many other
options, you can change the labeling and scaling of axes, add trend lines and other elements to the
scatterplot, and change the marker types. The edited chart appears in Figure 10-9.
If you like, you can save this particular combination as a chart template to use it again in the future.

Figure 10-9 Edited scatterplot

Lesson 11: Linear Regression


Objectives
1. Determine the regression equation.
2. Compute predicted Y values.
3. Compute and interpret residuals.
Overview
Closely related to correlation is the topic of linear regression. As you learned in Lesson 10, the
correlation coefficient is an index of linear relationship. If the correlation coefficient is significant,
that is an indication that a linear equation can be used to model the relationship between the predictor
X and the criterion Y. In this lesson you will learn how to determine the equation of the line of best fit
between the predictor and the criterion, how to compute predicted values based on that linear
equation, and how to calculate and interpret residuals.
Example Problem and Data
This spring term you are in a large introductory psychology class. You observe an apparent
relationship between the outside temperature and the number of people who skip class on a given day.
More people seem to be absent when the weather is warmer, and more seem to be present when it is
cooler outside. You randomly select 10 class periods and record the outside temperature reading 10
minutes before class time and then count the number of students in attendance that day. If you
determine that there is a significant linear relationship, you would like to impress your professor by
predicting how many people will be present on a given day, based on the outside temperature. The
data you collect are the following:
Temp    Attendance
50      87
77      60
67      73
53      86
75      59
70      65
83      65
85      62
80      58
64      89

Entering the Data in SPSS


These pairs of data must be entered as separate variables. The data file may look something like the
following (see Figure 11-1):

Figure 11-1 Data in SPSS


If you prefer, you can download a copy of the data. As you learned in Lesson 10, you should first
determine whether there is a significant correlation between temperature and attendance. Running the
Correlation procedure (see Lesson 10 for details), you find that the correlation is -.87, and is
significant at the .01 level (see Figure 11-2).

Figure 11-2 Significant correlation


A scatterplot is helpful in visualizing the relationship (see Figure 11-3). Clearly, there is a negative
relationship between attendance and temperature.

Figure 11-3 Scatterplot


Linear Regression

The correlation and scatterplot indicate a strong, though by no means perfect, relationship between the
two variables. Let us now turn our attention to regression. We will "regress" the attendance (Y) on the
temperature (X). In linear regression, we are seeking the equation of a straight line that best fits the
observations. The usefulness of such a line may not be immediately apparent, but if we can model the
relationship by a straight line, we can use that line to predict a value of Y for any value of X, even
those that have not yet been observed. For example, looking at the scatterplot in Figure 11-3, what
attendance would you predict for a temperature of 60 degrees? The regression line can answer that
question. This line will have an intercept term and a slope coefficient and will be of the general form

Ŷ = a + bX
The intercept and slope (regression) coefficient are derived in such a way that the sums of the squared
deviations of the actual data points from the line are minimized. This is called "ordinary least squares"
estimation or OLS. Note that the predicted value of Y (read "Y-hat") is a linear combination of two
constants, the intercept term and the slope term, and the value of X, so that the only thing that varies is
the value of X. Therefore, the correlation between the predicted Ys and the observed Ys will be the
same as the correlation between the observed Ys and the observed Xs. If we subtract the predicted
value of Y from the observed value of Y, the difference is called a "residual." A residual represents the
part of the Y variable that cannot be explained by the X variable. Visually, the distance between the
observed data points and the line of best fit represents the residual.
SPSS's Regression procedure allows us to determine the equation of the line of best fit, to calculate
predicted values of Y, and to calculate and interpret residuals. Optionally, you can save the predicted
values of Y and the residuals as either standard scores or raw-score equivalents.
Running the Regression Procedure
Open the data file in SPSS. Select Analyze, Regression, and then Linear (see Figure 11-4).

Figure 11-4 Performing the Regression procedure


The Regression procedure outputs a value called "Multiple R," which will always range from 0 to 1.
In the bivariate case, Multiple R is the absolute value of the Pearson r, and is thus .87. The square of r
or of Multiple R is .752, and represents the amount of shared variance between Y and X. When we run
the regression tool, we can optionally ask for either standardized or unstandardized (raw-score)
predicted values of Y and residuals to be calculated and saved as new variables (see Figure 11-5).

Figure 11-5 Save options in the Regression procedure
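Equivalently, the whole analysis can be run from syntax. A sketch, assuming the variables are named
temp and attend as in Figure 11-1:

* Regress attendance on temperature and save unstandardized
* predicted values and residuals as new variables.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT attend
  /METHOD=ENTER temp
  /SAVE PRED RESID.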


Click OK to run the Regression procedure. The output is shown in Figure 11-6. In the ANOVA table
summarizing the regression, the omnibus F test tests the hypothesis that the population Multiple R is
zero. We can safely reject that null hypothesis. Notice that dividing the regression sum of squares,
which is based on the predicted values of Y, by the total sum of squares, which is based on the
observed values of Y, produces the same value as R Square. The value of R Square thus represents the
proportion of variance in the criterion that can be explained by the predictor. The residual sum of
squares represents the variance in the criterion that remains unexplained.

Figure 11-6 Regression procedure output


In Figure 11-7 you can see that the residuals and predicted values are now saved as new variables in
the SPSS data file.

Figure 11-7 Saving predicted values and residuals


The regression equation for predicting attendance from the outside temperature is Ŷ = 133.556 - .897
x Temp. So for a temperature of 60 degrees, you would predict attendance of approximately 80
students (see
Figure 11-8 in which this is illustrated graphically). Note that this process of using a linear equation to
predict attendance from the temperature has some obvious practical limits. You would never predict
attendance higher than 100 percent, for example, and there may be a point at which the temperature
becomes so hot as to be unbearable, and the attendance could begin to rise simply because the
classroom is air-conditioned.
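You can also apply the fitted equation yourself with a COMPUTE statement. A minimal sketch,
assuming the temperature variable is named temp:

* Hand-apply the fitted regression equation.
COMPUTE predicted = 133.556 - 0.897 * temp.
EXECUTE.

For temp = 60 this yields 133.556 - 53.82 = 79.74, or about 80 students, matching the prediction read
from the regression line.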

Figure 11-8 Linear trend line and regression equation


To impress your professor, assume that the outside temperature on a class day is 72 degrees.
Substituting 72 for X in the regression equation, you predict that there will be 69 students in
attendance that day.
Examining Residuals
A residual is the difference between the observed and predicted values for the criterion variable (Hair,
Black, Babin, Anderson, & Tatham, 2006). Bivariate linear regression and multiple linear regression
make four key assumptions about these residuals.
1. The phenomenon (i.e., the regression model being considered) is linear, so that the
relationship between X and Y is linear.
2. The residuals have equal variances at all levels of the predicted values of Y.
3. The residuals are independent. This is another way of saying that the successive observations
of the dependent variable are uncorrelated.
4. The residuals are normally distributed with a mean of zero.
Thus it can be very instructive to examine the residuals when you perform a regression analysis. It is
helpful to examine a histogram of the standardized residuals (see Figure 11-9), which can be created
from the Plots menu. The normal curve can be superimposed for visual reference.
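In syntax, these diagnostic plots can be requested on the regression command itself. A sketch, again
assuming the variable names temp and attend:

* Histogram and normal probability plot of the standardized residuals.
REGRESSION
  /DEPENDENT attend
  /METHOD=ENTER temp
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).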

Figure 11-9 Histogram of standardized residuals


These residuals appear to be approximately normally distributed. Another useful plot is the normal p-p plot produced as an option in the Plots menu. This plot compares the cumulative probabilities of the
residuals to the expected frequencies if the residuals were normally distributed. Significant departures
from a straight line would indicate nonnormality in the data (see Figure 11-10). In this case the
residuals appear once again to be fairly normally distributed.

Figure 11-10 Normal p-p plot of observed and expected cumulative probabilities of residuals
When there are significant departures from normality, homoscedasticity, and linearity, data
transformations or the introduction of polynomial terms such as quadratic or cubic values of the
original independent or dependent variables can often be of help (Edwards, 1976).
References
Edwards, A. L. (1976). An introduction to linear regression and correlation. San Francisco: Freeman.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data
analysis (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Lesson 12: Multiple Correlation and Regression


Objectives
1. Perform and interpret a multiple regression analysis.
2. Test the significance of the regression and the regression coefficients.
3. Examine residuals for diagnostic purposes.
Overview
Multiple regression involves one continuous criterion (dependent) variable and two or more predictors
(independent variables). The equation for a line of best fit is derived in such a way as to minimize the
sums of the squared deviations from the line. Although there are multiple predictors, there is only one
predicted Y value, and the correlation between the observed and predicted Y values is called Multiple
R. The value of Multiple R will range from zero to one. In the case of bivariate correlation, a
regression analysis will yield a value of Multiple R that is the absolute value of the Pearson product
moment correlation coefficient between X and Y, as discussed in Lesson 11. The multiple linear
regression equation will take the following general form:

Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
Instead of using a to represent the Y intercept, it is common practice in multiple regression to call the
intercept term b0. The significance of Multiple R, and thus of the entire regression, must be tested. As
well, the significance of the individual regression coefficients must be examined to verify that a
particular independent variable is adding significantly to the prediction.
As in simple linear regression, residual plots are helpful in diagnosing the degree to which the
linearity, normality, and homoscedasticity assumptions have been met. Various data transformations
can be attempted to accommodate situations of curvilinearity, non-normality, and heteroscedasticity.
In multiple regression we must also consider the potential impact of multicollinearity, which is the
degree of linear relationship among the predictors. When there is a high degree of collinearity in the
predictors, the regression equation will tend to be distorted, and may lead to inappropriate conclusions
regarding which predictors are statistically significant (Lind, Marchal, and Wathen, 2006). For this
reason, we will ask for collinearity diagnostics when we run our regression. As a rule of thumb, if the
variance inflation factor (VIF) for a given predictor is very high (values of 10 or more are a common
cause for concern) or if the absolute value of the correlation between two predictors is greater than
.70, one or more of the predictors should be dropped from the analysis, and the regression equation
should be recomputed.
Multiple regression is in actuality a general family of techniques, and the mathematical and statistical
underpinnings of multiple regression make it an extremely powerful and flexible tool. By using group
membership or treatment level qualitative coding variables as predictors, one can easily use multiple
regression in place of t tests and analyses of variance. In this tutorial we will concentrate on the
simplest kind of multiple regression, a forced or simultaneous regression in which all predictor
variables are entered into the regression equation at one time. Other approaches include stepwise
regression in which variables are entered according to their predictive ability and hierarchical
regression in which variables are entered according to theory or hypothesis. We will examine
hierarchical regression more closely in Lesson 14 on analysis of covariance.
Example Data

The following data (see Figure 12-1) represent statistics course grades, GRE Quantitative scores, and
cumulative GPAs for 32 graduate students at a large public university in the southern U.S. (source:
data collected by the webmaster). You may download a copy of the entire dataset.

Figure 12-1 Statistics course grades, GREQ, and GPA (partial data)
Preparing for the Regression Analysis
We will determine whether quantitative ability (GREQ) and cumulative GPA can be used to predict
performance in the statistics course. A very useful first step is to calculate the zero-order correlations
among the predictors and the criterion. We will use the Correlate procedure for that purpose. Select
Analyze, Correlate, Bivariate (see Figure 12-2).

Figure 12-2 Calculate intercorrelations as preparation for regression analysis


In the Options menu of the resulting dialog box, you can request descriptive statistics if you like. The
resulting intercorrelation matrix reveals that GREQ and GPA are both significantly related to the
course grade, but are not significantly related to each other. Thus our initial impression is that
collinearity will not be a problem (see Figure 12-3).

Figure 12-3 Descriptive statistics and intercorrelations


Conducting the Regression Analysis
To conduct the regression analysis, select Analyze, Regression, Linear (see Figure 12-4).

Figure 12-4 Selecting the Linear Regression procedure


In the Linear Regression dialog box, move Grade to the Dependent variable field and GPA and GREQ
to the Independent(s) list, as shown in Figure 12-5.

Figure 12-5 Linear Regression dialog box


Click on the Statistics button and check the box in front of collinearity diagnostics (see Figure 12-6).

Figure 12-6 Requesting collinearity diagnostics


Select Continue and then click on Plots to request standardized residual plots and also to request
scatter diagrams. You should request a histogram and normal distribution plot of the standardized
residuals. You can also plot the standardized residuals against the standardized predicted values to
check the assumption of homoscedasticity (see Figure 12-7).
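The complete specification corresponds approximately to the syntax below; the variable names grade,
gpa, and greq are assumptions based on Figure 12-1.

* Simultaneous regression with collinearity diagnostics,
* residual plots, and a residual-versus-predicted scatterplot.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT grade
  /METHOD=ENTER gpa greq
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).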

Click OK to run the regression analysis. The results are excerpted in Figure 12-8.

Figure 12-8 Regression procedure output (excerpt)

Interpreting the Regression Output


The significant overall regression indicates that a linear combination of GREQ and GPA predicts
grades in the statistics course. The value of R-Square is .513, and indicates that about 51 percent of
the variation in grades is accounted for by knowledge of GPA and GREQ. The significant t values for
the regression coefficients for GREQ and GPA show that each variable contributes significantly to the
prediction. Examining the unstandardized regression coefficients is not very instructive, because these
are based on raw scores and their values are influenced by the units of measurement of the predictors.
Thus, the raw-score regression coefficient for GREQ is much smaller than that for GPA because the
two variables use different scales. On the other hand, the standardized coefficients are quite
interpretable, because each shows the relative contribution to the prediction of the given variable with
the other variable held constant. These are technically standardized partial regression coefficients.
In the present case, we can conclude that GREQ has more predictive value than GPA, though both are
significant.
The collinearity diagnostics indicate a low degree of overlap between the predictors (as we predicted).
If the two predictor variables were orthogonal (uncorrelated), the variance inflation factor (VIF) for
each would be 1. Thus we conclude that there is not a problem with collinearity in this case.
The histogram of the standardized residuals shows that the departure from normality is not too severe
(see Figure 12-9).

Figure 12-9 Histogram of standardized residuals

The normal p-p plot indicates some departure from normality and may suggest a curvilinear
relationship between the predictors and the criterion (see Figure 12-10).

Figure 12-10 Normal p-p plot


The plot of standardized predicted values against the standardized residuals indicates a large degree of
heteroscedasticity (see Figure 12-11). This is mostly the result of a single outlier, case 11 (Participant
118), whose GREQ and grade scores are significantly lower than those of the remainder of the group.
Eliminating that case and rerunning the analysis would likely reduce the apparent heteroscedasticity.

Lesson 13: Chi-Square Tests


Objectives
1. Perform and interpret a chi-square test of goodness of fit.
2. Perform and interpret a chi-square test of independence.
Overview
Chi-square tests are used to compare observed frequencies to the frequencies expected under some
hypothesis. Tests for one categorical variable are generally called goodness-of-fit tests. In this case,
there is a one-way table of observed frequencies of the levels of some categorical variable. The null
hypothesis might state that the expected frequencies are equally distributed or that they are unequal on
the basis of some theoretical or postulated distribution.
Tests for two categorical variables are usually called tests of independence or association. In this case,
there will be a two-way contingency table with one categorical variable occupying rows of the table
and the other categorical variable occupying columns of the table. In this analysis, the expected
frequencies are commonly derived on the basis of the assumption of independence. That is, if there
were no association between the row and column variables, then a cell entry would be expected to be
the product of the cell's row and column marginal totals divided by the overall sample size.
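For example, with 60 students in all, a cell whose row total is 30 and whose column total is 12 would
have an expected frequency of 30 × 12 / 60 = 6 under independence.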
In both tests, the chi-square test statistic is calculated as the sum of the squared differences between
the observed and expected frequencies divided by the expected frequencies, according to the
following simple formula:

χ² = Σ [(O − E)² / E]
where O represents the observed frequency in a given cell of the table and E represents the
corresponding expected frequency under the null hypothesis.
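For example, a cell with an observed frequency of 28 and an expected frequency of 20 contributes
(28 − 20)² / 20 = 3.2 to the chi-square statistic.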
We will illustrate both the goodness-of-fit test and the test of independence using the same dataset.
You will find the goodness-of-fit test for equal or unequal expected frequencies as an option under
Nonparametric Tests in the Analyze menu. For the chi-square test of independence, you will use the
Crosstabs procedure under the Descriptive Statistics menu in SPSS. The cross-tabulation procedure
can make use of numeric or text entries, while the Nonparametric Test procedure requires numeric
entries. For that reason, you will need to recode any text entries into numerical values for goodness-of-fit tests.
Example Data
Assume that you are interested in the effects of peer mentoring on student academic success in a
competitive private liberal arts college. A group of 30 students is randomly selected during their
freshman orientation. These students are assigned to a team of seniors who have been trained as tutors
in various academic subjects, listening skills, and team-building skills. The 30 selected students meet
in small group sessions with their peer tutors once each week during their entire freshman year, are
encouraged to work with their small group for study sessions, and are encouraged to schedule private
sessions with their peer mentors whenever they desire. You identify an additional 30 students at
orientation as a control group. The control group members receive no formal peer mentoring. You
determine that there are no significant differences between the high school grades and SAT scores of
the two groups. At the end of four years, you compare the two groups on academic retention and
academic performance. You code mentoring as 1 = present and 0 = absent to identify the two groups.
Because GPAs differ by academic major, you generate a binary code for grades. If the student's
cumulative GPA is at the median or higher for his or her academic major, you assign a 1. Students
whose grades are below the median for their major receive a zero. If the student is no longer enrolled
(i.e., has transferred, dropped out, or flunked out), you code a zero for retention. If he or she is still
enrolled, but has not yet graduated after four years, you code a 1. If he or she has graduated, you code
a 2.
You collect the following (hypothetical) data:

Properly entered in SPSS, the data should look like the following (see Figure 13-1). For your
convenience, you may also download a copy of the dataset.

Figure 13-1 Dataset in SPSS (partial data)


Conducting a Goodness-of-Fit Test
To determine whether the three retention outcomes are equally distributed, you can perform a
goodness-of-fit test. Because there are three possible outcomes (no longer enrolled, currently enrolled,
and graduated) and sixty total students, you would expect each outcome to be observed in 1/3 of the
cases if there were no differences in the frequencies of these outcomes. Thus the null hypothesis
would be that 20 students would not be enrolled, 20 would be currently enrolled, and 20 would have
graduated after four years. To test this hypothesis, you must use the Nonparametric Tests procedure.
To conduct the test, select Analyze, Nonparametric Tests, Chi-Square as shown in Figure 13-2.

Figure 13-2 Selecting chi-square test for goodness of fit


In the resulting dialog box, move Retention to the Test Variable List and accept the default for equal
expected frequencies. SPSS counts and tabulates the observed frequencies and performs the chi-square
test (see Figure 13-3). The degrees of freedom for the goodness-of-fit test are the number of categories
minus one. The significant chi-square shows that the frequencies are not equally distributed,
χ²(2, N = 60) = 6.10, p = .047.
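The equivalent syntax is brief; retention here is an assumed variable name based on the dataset shown
in Figure 13-1, so adjust it if yours differs.

* Chi-square goodness-of-fit test with equal expected frequencies.
NPAR TESTS
  /CHISQUARE=retention
  /EXPECTED=EQUAL
  /MISSING ANALYSIS.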

Figure 13-3 Chi-square test of goodness of fit


Conducting a Chi-Square Test of Independence
If mentoring is not related to retention, you would expect mentored and non-mentored students to
have the same outcomes, so that any observed differences in frequencies would be due to chance.
That would mean that you would expect half of the students in each outcome group to come from the
mentored students, and the other half to come from the non-mentored students. To test the hypothesis
that there is an association (or non-independence) between mentoring and retention, you will conduct
a chi-square test as part of the cross-tabulation procedure. To conduct the test, select Analyze,
Descriptive Statistics, Crosstabs (see Figure 13-4).

Figure 13-4 Preparing for the chi-square test of independence


In the Crosstabs dialog, move one variable to the row field and the other variable to the column field.
I typically place the variable with more levels in the row field to keep the output tables narrower (see
Figure 13-5), though the results of the test would be identical if you were to reverse the row and
column variables.

Figure 13-5 Establishing row and column variables


Clustered bar charts are an excellent way to compare the frequencies visually, so we will select that
option (see Figure 13-5). Under the Statistics option, select chi-square and Phi and Cramer's V
(measures of effect size for chi-square tests). You can also click on the Cells button to display both
observed and expected cell frequencies. The format menu allows you to specify whether the rows are
arranged in ascending (the default) or descending order. Click OK to run the Crosstabs procedure and
conduct the chi-square test.
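These choices correspond approximately to the following syntax, assuming the variables are named
retention and mentoring:

* Crosstabulation with chi-square, phi and Cramer's V,
* observed and expected counts, and a clustered bar chart.
CROSSTABS
  /TABLES=retention BY mentoring
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED
  /BARCHART.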

Figure 13-6 Partial output from Crosstabs procedure


For the test of independence, the degrees of freedom are the number of rows minus one multiplied by
the number of columns minus one, or in this case 2 x 1 = 2. The Pearson Chi-Square is significant,
indicating that mentoring had an effect on retention, χ²(2, N = 60) = 14.58, p < .001. The value of
Cramer's V is .493, indicating a large effect size (Gravetter & Wallnau, 2005).
The clustered bar chart provides an excellent visual representation of the chi-square test results (see
Figure 13-7).

Figure 13-7 Clustered bar chart


Going Further
For additional practice, you can use the Nonparametric Tests and Crosstabs procedures to determine
whether grades differ between mentored and non-mentored students and whether there is an
association between grades and retention outcomes.
References
Gravetter, F. J., & Wallnau, L. B. (2005). Essentials of statistics for the behavioral sciences (5th ed.).
Belmont, CA: Thomson/Wadsworth.

Lesson 14: Analysis of Covariance


Objectives
1. Perform and interpret an analysis of covariance using the General Linear Model.
2. Perform and interpret an analysis of covariance using hierarchical regression.
Analysis of covariance (ANCOVA) is a blending of regression and analysis of variance (Roscoe,
1975). It is possible to perform ANCOVA using the General Linear Model procedure in SPSS. An
entirely equivalent analysis is also possible using hierarchical regression, so the choice is left to the
user and his or her preferences. We will illustrate both procedures in this tutorial. We will use the
simplest of cases, a single covariate, two treatments, and a single variate (dependent variable).
ANCOVA is statistically equivalent to matching experimental groups with respect to the variable or
variables being controlled (or covaried). As you recall from correlation and regression, if two
variables are correlated, one can be used to predict the other. If there is a covariate (X) that correlates
with the dependent variable (Y), and the groups differ on that covariate, then differences observed
between the groups on the dependent variable cannot be attributed unambiguously to the experimental
treatment(s). ANCOVA provides a mechanism for assessing the differences in
dependent variable scores after statistically controlling for the covariate. There are two obvious
advantages to this approach: (1) any variable that influences the variation in the dependent variable
can be statistically controlled, and (2) this control can reduce the amount of error variance in the
analysis.
Example Data
Assume that you are comparing performance in a statistics class taught by two different methods.
Students in one class are instructed in the classroom, while students in the second class take their class
online. Both classes are taught by the same instructor, and use the same textbook, exams, and
assignments. At the beginning of the term all students take a test of quantitative ability (pretest), and
at the end, their score on the final exam is recorded (posttest). Because the two classes are intact, it is
not possible to achieve experimental control, so this is a quasi-experimental design. Assume that you
would like to compare the scores for the two groups on the final score while controlling for initial
quantitative ability. The hypothetical data are as follows:

Before the ANCOVA


You may retrieve the SPSS dataset if you like. As a precursor to the ANCOVA, let us perform a
between-groups t test to examine overall differences between the two groups on the final exam. You
will recall this test from Lesson 4, so the details are not repeated here. The result of the t test is shown
in Figure 14-1. Of course, if there were multiple groups you would perform an ANOVA rather than a
t test. In this case, we conclude that the second method led to improved test scores, but we must rule
out the possibility that this difference is attributable to
differences in quantitative ability of the two groups. As you know by now, you could just as easily
have compared the means using the Compare Means or One-way ANOVA procedures, and the square
root of the F-ratio obtained would be the value of t.

Figure 14-1 t Test Results

As a second precursor to the ANCOVA, let us determine the degree of correlation between
quantitative ability and exam scores. As correlation is the subject of Lesson 10, the details are omitted
here, and only the results are shown in Figure 14-2.

Figure 14-2 Correlation between pre-test and post-test scores


Knowing that there is a statistically significant correlation between pretest and posttest scores, we
would like to exercise statistical control by holding the effects of the pretest scores constant. The
resulting ANCOVA will verify whether there are any differences in the posttest scores of the two
groups after controlling for differences in ability.
Performing the ANCOVA in GLM
To perform the ANCOVA via the General Linear Model menu, select Analyze, General Linear
Model, Univariate (see Figure 14-3).

Figure 14-3 ANCOVA via the GLM procedure


In the resulting dialog box, move Posttest to the Dependent Variable field, Method to the Fixed
Factor(s) field, and Pretest to the Covariate(s) field. See Figure 14-4.

Figure 14-4 Univariate dialog box


Under Options you may want to choose descriptive statistics and effect size indexes, as well as plots
of estimated marginal means for Method. As there are just two groups, main effect comparisons are
not appropriate. Examine Figure 14-5.

Figure 14-5 Univariate options for ANCOVA

Click Continue. If you like, you can click on Plots to add profile plots for the estimated marginal
means of the posttest scores of the two groups after adjusting for pretest scores. Click on OK to run
the analysis. The results are shown in Figure 14-6. They indicate that after controlling for initial
quantitative ability, the posttest scores of the two groups differ significantly, F(1, 27) = 16.64,
p < .001, partial eta-squared = .381.
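For reference, the pasted syntax for this analysis looks approximately like the following, assuming the
variables are named posttest, method, and pretest:

* ANCOVA: posttest by method, controlling for pretest.
UNIANOVA posttest BY method WITH pretest
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(method) WITH(pretest=MEAN)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=pretest method.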

Figure 14-6 ANCOVA results


The profile plot makes it clear that the online class had higher exam scores after controlling for initial
quantitative ability (see Figure 14-7).

Figure 14-7 Profile plot


Performing an ANCOVA Using Hierarchical Regression
To perform the same ANCOVA using hierarchical regression, enter the posttest as the criterion. Then
enter the covariate (pretest) as one independent variable block and group membership (method) as a
second block. Examine the change in R-Square as the two models are compared, and the significance
of the change. The F value produced by this analysis is identical to that produced via the GLM
approach.
Select Analyze, Regression, Linear (see Figure 14-8).

Figure 14-8 ANCOVA via hierarchical regression


Now enter Posttest as the Dependent Variable and Pretest as an Independent variable (see Figure 14-9).

Figure 14-9 Linear regression dialog box


Click on the Next button and enter Method as an Independent variable, as shown in Figure 14-10.

Figure 14-10 Entering second block

Click on Statistics, and check the box in front of R squared change (see Figure 14-11).

Figure 14-11 Specify R squared change


Click Continue then OK to run the hierarchical regression. Note in the partial output shown in Figure
14-12 that the value of F for the R Square Change with pretest held constant is identical to that
calculated earlier.
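The pasted syntax for this two-block analysis is approximately the following, again assuming the
variable names posttest, pretest, and method:

* Hierarchical regression: covariate entered first, then group
* membership; CHANGE reports the R Square change test.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT posttest
  /METHOD=ENTER pretest
  /METHOD=ENTER method.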

Figure 14-12 Hierarchical regression yields results identical to GLM


References
Roscoe, J. T. (1975). Fundamental research statistics for the behavioral sciences (2nd ed.). New
York: Holt, Rinehart and Winston.
