A Quasi-Experimental Approach to
Assessing Treatment Effects in
Nonequivalent Control Group
Designs

Article in Psychological Bulletin · May 1975


DOI: 10.1037/0033-2909.82.3.345

Psychological Bulletin
1975, Vol. 82, No. 3, 345-362

A Quasi-Experimental Approach to Assessing Treatment Effects in the Nonequivalent Control Group Design
David A. Kenny
Harvard University
Four statistical tests of treatment effect are evaluated for the nonequivalent control group design. This design consists of pre- and posttreatment measures of a dependent variable with biased assignment to treatment groups. The biased assignment creates a treatment-pretest confounding for which different statistical techniques adjust. The different statistical tests discussed are the analysis of covariance, analysis of covariance with reliability correction, raw change score analysis, and standardized change score analysis. If assignment to treatment groups is based on the pretest score (a very infrequent event), analysis of covariance is the appropriate mode of analysis. Selection based on the pretest true scores necessitates a reliability correction procedure. Selection based on stable group differences and selection that occurs midway between the pre- and posttest necessitate change score analysis. Compensatory education programs can be made to look mistakenly harmful if they are analyzed by an inappropriate statistical technique when they should have been analyzed by standardized change score analysis. Given a model of the process of selection into treatment groups, the nonequivalent control group design can yield interpretable results.

All quasi-experimental designs meet the following three requirements: (a) There must be a treated and untreated group.¹ (b) There must be pretreatment and posttreatment measures. (c) There must be an explicit model that projects over time the difference between the treated and untreated groups, given no treatment effect. The third requirement is a synthesis of the other two and is at the heart of quasi-experimentation. It is the concern with plausible, rival explanations of any hypothesized treatment effect, or as Campbell (1957) calls it, internal validity. In true experiments, where by definition assignment into treatment conditions is random, the rival explanation to the treatment's causing a mean difference between the treatment and control group is sampling error, and a significance test is used to rule out that explanation. For most quasi-experiments both sampling error and biased selection into groups loom as plausible rival hypotheses. Given nonrandom assignment, the treated and untreated groups may differ even in the absence of any treatment effect. Time is brought into the design to project the amount of this difference between the treated and untreated groups. Certain nonexperimental designs that do not include time, like the cross-sectional survey, are not amenable to a quasi-experimental analysis, but the nonequivalent control group design is.

This research was supported in part by the Spencer and Milton Funds of Harvard University and National Science Foundation Grant GS-30273X, Donald T. Campbell, principal investigator. The paper grew out of a joint effort with Donald T. Campbell to revise his 1971 paper on the same topic. His assistance was invaluable. Thanks are also due to Thomas Cook of Northwestern University and Pierce Barker of the Rand Corporation, who commented on an earlier draft of the manuscript. The author accepts full responsibility for any errors. Special thanks are due to both Lenor Kirkeby and Ian Shrank, who were helpful in the preparation of the manuscript.
Inquiries concerning this article should be sent to David A. Kenny, Department of Psychology and Social Relations, Harvard University, Cambridge, Massachusetts 02138.

¹ Designs 7, 8, and 9 in Campbell and Stanley (1963) are time series designs and do not involve groups as such. However, these designs do have a group of pretreatment observations (control group) and posttreatment observations (treated group). Since the observations are not independent, standard t tests are not valid.

Weiss (1972) has suggested that the most common design in evaluation research is a pretest-posttest design where subjects are not randomly assigned to groups, or as Campbell and Stanley (1963) refer to it, the nonequivalent control group design. The failure to randomize may be due to the fact that: (a) a treatment is administered to a classroom, school, or school system, and an untreated classroom, school, or school system is taken as a control group; (b) a true experiment is planned but because of mortality, contamination of the control persons by the experimentals, or variation in experimental treatment, the true experiment has become a quasi-experiment; (c) because of scarce resources the treatment is only given to a select group; (d) subjects self-select their own treatment level. Anyone familiar with the problems encountered in natural settings must realize that although randomization and true experimentation are ideal goals in research, they are not always possible. At least a pretreatment measure allows the researcher to assess how confounded the experimental treatment is with the subjects' predisposition to respond to the dependent variable.

The magnitude of these differences between treated and untreated groups can be assessed in many different ways: mean difference, mean difference relative to variance within groups, or t ratio. Campbell (Note 1) has suggested an alternative measure: the treatment-effect correlation. The treatment variable is dummy coded (1 for treatment and 0 for control) and correlated with the pretest. A correlation of a dichotomous variable with another variable is sometimes called a point biserial correlation and is itself a product moment correlation. A correlation may seem to be an unusual measure of difference between treated groups, but it can be shown that both mean difference and the t ratio can be easily transformed into r. The formula for t to r is

$$r = \frac{t}{\sqrt{t^2 + N - 2}},$$

where N is the total number of subjects. The formula for the difference between treated mean ($\bar{X}_T$) and control mean ($\bar{X}_C$) to r is

$$r = \frac{(\bar{X}_T - \bar{X}_C)\sqrt{P_T P_C}}{s_X},$$

where $s_X$ is the variability (standard deviation) of the observations, $P_T$ is the proportion of the total sample in the treated group, and $P_C$ is for the control group ($P_T + P_C = 1$). The treatment-effect correlation is, then, closely related to t and the difference between means, but like all measures of correlation it is a standardized measure of the strength of a relationship. The use of dummy codes for categorical independent variables has been advocated by many authors (e.g., Cohen, 1968).

Given these pretreatment differences on the dependent variable, interpreting a difference between the experimental and the control group on the posttreatment measure becomes problematic. To interpret a posttreatment difference one must examine the magnitude of the pretreatment difference, and that difference or perhaps some adjusted difference can be used as a baseline to judge the posttreatment difference. At issue is whether pretreatment differences should increase, decrease, or remain stable if there is no treatment effect. One suggested mode of analysis has been to match control and experimental subjects on the pretest to make the pretest differences vanish. Although this is a very popular strategy, matching must be rejected out of hand because of possible regression artifacts (e.g., Campbell & Stanley, 1963, pp. 10-12; Rulon, 1941) and because of increased Type II errors (Chen, 1971).

The literature suggests four modes of statistical analysis to measure treatment effects in the nonequivalent control group design: (a) analysis of covariance, (b) analysis of covariance with reliability correction, (c) raw change score analysis, and (d) standardized change score analysis. The differences between these four techniques can be illustrated by considering what each technique takes as the dependent variable (see Table 1). In covariance analysis (or, equivalently, multiple regression) the posttest is regressed on the pretest and the residual from the regression is taken to be the dependent variable. Some authors (e.g., Lord, 1960; Porter, 1967) have suggested that the estimated regression coefficient is attenuated by measurement error in the pretest, so the regression coefficient must be corrected for attenuation. The appropriate correction is to divide the regression coefficient of posttest on pretest by the reliability of the pretest. A third approach is raw change score analysis. Although change score analysis is often condemned (e.g., Cronbach & Furby, 1970; Werts & Linn, 1970a), it is perhaps the most common way of analyzing this design. As the name suggests, the dependent variable is the posttest minus the pretest. The popularity of this mode of analysis is no doubt due to its seeming ease of interpretation and the fact that it can be viewed as part of a repeated measures analysis of variance. Raw change score analysis is equivalent to the Time × Treatment interaction of a repeated measures analysis. It should be noted here that using the change score as the dependent variable and the pretest as a covariate yields significance test results that are identical to the analysis of covariance with a metric of units of change (Werts & Linn, 1970a). A fourth mode of analysis is standardized change score analysis. The pretest and posttest are separately standardized (given unit variance and zero mean) and then differenced. (Note that this technique does not standardize within treatment groups.) This method is identical to raw change score analysis with the exception that the variance of the dependent variable is made stationary over time.

TABLE 1
NULL HYPOTHESES FOR THE FOUR MODES OF ANALYSIS OF THE NONEQUIVALENT CONTROL GROUP DESIGN

| Statistical technique | Dependent variable | Null hypothesis |
|---|---|---|
| Analysis of covariance | $X_2 - b_{21\cdot T}X_1$ | $\rho_{X_2T} = \rho_{X_1T}\rho_{X_1X_2}$ |
| Covariance with reliability correction | $X_2 - (b_{21\cdot T}/\rho_{X_1X_1})X_1$ | $\rho_{X_2T} = \rho_{X_1T}\rho_{X_1X_2}/\rho_{X_1X_1}$ |
| Raw change score analysis | $X_2 - X_1$ | $\rho_{X_2T} = \rho_{X_1T}\sigma_{X_1}/\sigma_{X_2}$ |
| Standardized change score analysis | $X_2^* - X_1^*$ | $\rho_{X_2T} = \rho_{X_1T}$ |

Note: $b_{21\cdot T}$ is the unstandardized partial regression of $X_2$ on $X_1$, controlling for T; * indicates standardization of variable; $X_1$ is the pretest, $X_2$ the posttest, and T the treatment variable.

There is for each mode of analysis a statistical expression that equals zero if there were no treatment effect, that is, the null hypothesis of each method. As stated earlier, the orientation of this article is quasi-experimental; it seeks to determine the cases where a statistical technique correctly infers no effect. The differences among the four modes of analysis can be brought into focus by expressing their null hypotheses as treatment-effect correlations as in Table 1. Designations are as follows: pretest = $X_1$, posttest = $X_2$, and treatment variable = T.

All four null hypotheses can be viewed as setting the posttest-treatment correlation equal to the pretest-treatment correlation times a correction factor. (See Appendix for derivations.) For the analysis of covariance the correction factor is the pretest-posttest correlation of X. Since the pretest-posttest correlation is ordinarily less than one, analysis of covariance implies that in the absence of treatment effects, the posttest-treatment correlation will be less in absolute value than the pretest-treatment correlation. The correction factor for the analysis of covariance with reliability correction is the pretest-posttest correlation of X divided by the reliability of the pretest. This correction factor is ordinarily less than one,² making the posttest-treatment correlation less in absolute value than the pretest-treatment correlation. The correction factor for raw change score analysis is the standard deviation of the pretest divided by the standard deviation of the posttest. Standardized change score analysis has no correction factor, or more precisely, has a correction factor of one: The two treatment-effect correlations should be equal.

² Given the standard formula for correction for attenuation, $\rho_{X_1X_2}/\sqrt{\rho_{X_1X_1}\rho_{X_2X_2}} \le 1$, it directly follows that $\rho_{X_1X_2}/\rho_{X_1X_1} < 1$ if the posttest is no more reliable than the pretest ($\rho_{X_2X_2} \le \rho_{X_1X_1}$) and the true scores are not perfectly correlated.

Campbell and Erlebacher (1971) have suggested a fifth method if the pretest and posttest are "factorially similar": covariance analysis with common factor coefficient correction. This method resembles analysis of covariance with reliability correction, the difference being that the regression coefficient is divided by the pretest-posttest correlation of X instead of the reliability of $X_1$. This correction yields the same null hypothesis as the null hypothesis for standardized change score analysis, since the pretest-posttest correlations cancel each other out.
It should be clear from Table 1 that the null hypotheses of the four modes of analysis will rarely all be equal to zero except in highly trivial cases. Each of the four modes of analysis has been advocated as the method of analysis for the nonequivalent control group design by various authors. Other authors (e.g., Lord, 1967) have pointed out that it is paradoxical that different methods yield different conclusions. Cronbach and Furby (1970) state that treatments cannot be compared for this design. The literature on this design is, therefore, very confusing and not at all instructive to the practitioner.

The validity of any mode of analysis depends on its match with the process of selection into groups. In the remaining part of this article a general model of selection is elaborated and various special cases of that model are considered. Each mode of analysis is appropriate for a given model of selection. Though not advocated as the mode of analysis, standardized change score analysis is emphasized. It has been previously discussed by Woodrow (1939) and by Bereiter (1963) as a test of treatment effects, but both authors recommended some form of regression analysis. Campbell (Campbell & Clayton, 1961; Campbell, Note 1) revived the method but gave only an intuitive justification. Standardized change score analysis should be seriously considered as an alternative in the analysis of the nonequivalent control group design.

THE GENERAL MODEL

There are three measured variables: $X_1$ = a pretest measure of the variable the treatment is to change, $X_2$ = a posttest measure, and T = the treatment variable.

The causal function of X is divided into three unmeasured, latent components: G = group difference such as sex, race, classroom, and others; Z = individual differences within groups; and E = totally unstable causes of X (errors of measurement). G must be included in the specification of causes of X because in field settings it can rarely be assumed that sampling is done from a single homogeneous population. Sampling is usually done from multiple populations, each of which has a different mean level of X.

All variables (both measured and unmeasured) are standardized because standardization decreases algebraic complexity and because in this case it means little loss of generality. For instance, the null hypothesis for analysis of covariance can be stated in terms of correlations. Raw change score analysis, however, necessitates the original metric.

The equations for $X_1$ and $X_2$ can be expressed in terms of their causes G, Z, and E:

$$X_{1i} = a_1G_i + b_1Z_{1i} + e_1E_{1i} \tag{1}$$

$$X_{2i} = a_2G_i + b_2Z_{2i} + e_2E_{2i} \tag{2}$$

where the subscripts 1 and 2 refer to time and subscript i to subject. Since no treatment effects are assumed, T is not written into the causal function of $X_1$ or $X_2$. The group differences variable (G) is assumed to be perfectly stable, making its autocorrelation unity. This explains why G needs no time subscript. Relative position within groups (Z) may not be perfectly stable, making its autocorrelation less than one; the errors of measurement variable (E) is perfectly unstable, making its autocorrelation zero. All unmeasured variables are assumed to be uncorrelated with each other³ with the previously stated exception that $Z_1$ and $Z_2$ may be correlated ($\rho_{Z_1Z_2} = j$).

³ The latent variables may be allowed to be correlated without altering many of the conclusions. I have chosen to make the variables uncorrelated to simplify presentation.

If the treatment is correlated with the pretest, it must be confounded with the causes of the pretest. Thus, the treatment must be correlated with either group differences (G), relative position within groups (Z), errors of measurement (E), or any combination of the three. So in writing the causal function for the treatment variable, G, Z, and E must be included. At the beginning the assumption is made that the occasion of selection of the persons into treatment occurs at the pretest, thus making T confounded with Z and E at Time 1. (This assumption will later change.) The causal function of the treatment variable is thus

$$T_i = qG_i + mZ_{1i} + sE_{1i} + fU_i, \tag{3}$$

where U is a residual term that is uncorrelated with all unmeasured variables. U is simply a variable that represents all other causes of selection besides G, Z, and E.⁴
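Equations 1, 2, and 3 describe a data-generating process, so they can be simulated. The sketch below is illustrative only and rests on assumptions that are not in the article: the latent variables are drawn as standard normal deviates, T is kept as the continuous linear composite of Equation 3 rather than dichotomized, the residual weights are derived from the standardization constraint (for example $e_1 = \sqrt{1 - a_1^2 - b_1^2}$), and the function name and coefficient values are hypothetical.

```python
import math
import random
from statistics import pvariance


def simulate_general_model(n, a1, b1, a2, b2, q, m, s, j, seed=0):
    """Draw n subjects from Equations 1-3 with every variable standardized.

    Assumption: latent variables are standard normal; the article specifies
    only their correlations, not their distribution.
    """
    rng = random.Random(seed)
    # Residual weights chosen so each measured variable has unit variance.
    e1 = math.sqrt(1 - a1**2 - b1**2)
    e2 = math.sqrt(1 - a2**2 - b2**2)
    f = math.sqrt(1 - q**2 - m**2 - s**2)
    x1, x2, t = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)          # stable group differences (no time subscript)
        z1 = rng.gauss(0, 1)         # individual differences at Time 1
        # Z2 has unit variance and correlates j with Z1.
        z2 = j * z1 + math.sqrt(1 - j**2) * rng.gauss(0, 1)
        err1 = rng.gauss(0, 1)       # error of measurement at Time 1
        err2 = rng.gauss(0, 1)       # error of measurement at Time 2
        u = rng.gauss(0, 1)          # all other causes of selection
        x1.append(a1 * g + b1 * z1 + e1 * err1)        # Equation 1
        x2.append(a2 * g + b2 * z2 + e2 * err2)        # Equation 2
        t.append(q * g + m * z1 + s * err1 + f * u)    # Equation 3
    return x1, x2, t


# Hypothetical coefficients, chosen only so that the square roots above are real.
x1, x2, t = simulate_general_model(
    20000, a1=0.5, b1=0.7, a2=0.5, b2=0.7, q=0.3, m=0.4, s=0.2, j=0.8, seed=1
)
```

With a sample this large the three simulated variables should each have variance close to one, which is a quick check that the coefficients respect the standardization the model assumes.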
For readers who are aided by path diagrams
Equations 1, 2, and 3 are expressed in the path
diagram in Figure 1. A path diagram not only
pictorially represents a set of equations but
also is a heuristic to aid in the derivation of
correlations (Werts & Linn, 1970b). Using
either the path diagram or the algebra of
expectations the correlations between the
measured variables are
= qa>i + mbi + (4)
(5)
— a-ia-2 + (6) FIGURE 1. Path model of selection at the pretest.
(Xi is pretest, X% is posttest, T is treatment, G is group
The remaining part of this article will differences, E\ and Ei are errors of measurement, Z\
consider various special cases of this general and Z% are individual differences, and a\, <z2, 61; b%, elt e2,
model of selection that yield overidentifying s, q, m, f, and j are path coefficients.)
restrictions, that is, an algebraic constraint
on the correlations generated by the model. selection into treatment groups is not at the
Then it will determine if the overidentifying pretest.
restrictions derived from the model of selection
match the null hypothesis of any of the four Selection Based on the Pretest
modes of analysis given in Table 1.
Three different types of selection processes At times selection into treatment groups can
will be considered. The first type is selection be controlled, but the experimenter decides not
based on the measured pretest. This is the to assign subjects randomly to treatment condi-
only type of selection considered that is con- tions. The reason for not assigning randomly is
trolled by the researcher. In this case the that it may not be fair for all types of subjects
pretest is correlated either by design or by acci- to have the same probability of receiving the
dent with the treatment and therefore corre- treatment; that is, certain persons (e.g., the
lated with all the causes of the pretest: G, injured or disadvantaged) are considered more
Z, and E. For this type of selection the analysis deserving of the treatment. One strategy,
of covariance is appropriate. The second type called the regression discontinuity design by
of selection is selection based on the true Thistlethwaite and Campbell (1969), is to
pretest. For this case subjects select treatments, assign subjects to the treatment on the basis
and the selection process is related to G and Z of the pretest. Persons scoring above a certain
but not -E; analysis of covariance with relia- point would be given the treatment and those
bility correction is appropriate. The third type scoring below or equal to that point would be
is selection based on group differences. Subjects the controls. (Since the pretest is measured
are assigned to the treatment because of on an interval scale and treatment is dicho-
demographic and social variables, which are tomous, pXlT will not equal 1 but it will be
confounded with the pretest. If the effect of high.) Actually a pretest itself need not be
these demographic variables is stationary over used; any measure of "deservingness" will do.
time, standardized change score analysis is The treatment, then, is a function of the pretest
the appropriate mode of analysis. Last dis- (Xi) and random error (U):
cussed is the case in which the occasion of
Ti = kXu + fUi. (7)
4
Since T is measured dichotomously it may seem
as if U must be correlated with G, Zi, and E\. But like In this case the treatment is deliberately
any residual of a linear model, U can be constructed correlated with the pretest and therefore with
to be uncorrelated with G, Z\, and £1. the causes of the pretest, whereas in the true
350 DAVID A. KENNY

linearity can usually be remedied by including


higher order terms such as the pretest squared.
If the treatment interacts with the pretest,
this would violate the assumption of homoge-
neity of regression.
0-- Unhappy randomization. It seems to be an
all too common occurrence that randomization
of subjects into treatment groups produces
-2- treatment differences even before the treatment
is^'administered. Although it is clear that the
probability of such pretest differences is 1 out
PRETEST FOSTTEST of 20, given the conventional .05 level of
significance, the experimenter still has the
• EXPERIMENTALS feeling of being persecuted by fate. It seems
X CONTROLS that the experiment has been doomed and
TREND FOR CHANGE SCORE ANALYSIS there is no way to achieve valid inference.
TREND FOR ANALYSIS OF COVARIANCE
I should be clear about what I do not mean
FIGURE 2. Pretest and posttest means for a random- by unhappy randomization. I do not mean
ized experiment with zero mean and within-group cases of failure to randomize or cases where
regression of .5. there is randomization but a selected group of
control and experimental subjects fail to
experimental case the treatment is deliberately provide posttest data. I am speaking of a
uncorrelated with the pretest through random- randomized experiment with a pretest differ-
ization. But, .as in the true experimental case, ence between the experimentals and controls.
an assumption or specification has been gained Randomization has not failed, as is sometimes
by controlling the assignment to groups. In mistakenly thought; it is only that an un-
the population the treatment should correlate likely type of event has occurred.
with the unmeasured causes of pretest (G, Z\, Valid inference is possible in a way similar
and Ei) to the degree to which they cause the to that in the discussion of the previous section.
pretest. If Equation 1 is substituted for ; ^i If there is a pretest difference, the treatment
within Equation 7 the result is Ti = k(a\Gi is confounded with the causes of the pretest.
+ biZu + eiEu) + fUi. But since T also The expected degree of this confounding with
equals Equation 3, the following must be true: each cause is proportional to its causal effect
ajk = q, bik = m, and eik = s, or alternatively on the pretest.
q/ai = m/bi = s/e-i = k. Using either of these For a randomized experiment with the pretest
two equations and Equation 4, and remember- correlated with the treatment, the analysis of
ing that ai2 + &i2 + ei2 = 1 (since Xi is covariance is not only appropriate but,neces-
standardized), it can be shown that pxlT = k, sary. Consider the illustration in Figure 2.
and with Equations 5 and 6 it can be shown It has been assumed that the experimentals
that pXiT = kpXlXf Solving both of the scored four units higher than the controls on
above equations for k, it follows that px^r
the pretest even though subjects were ran-
= pXjTpx^y- which is the null hypothesis for
analysis of covariance in Table 1. Thus, when domly assigned to groups. Assuming that the
subjects are assigned to the treatment on the pooled within-regression coefficient is .5 (the
basis of the pretest, analysis of covariance is grand mean is zero at Times 1 and 2), the
the appropriate mode of analysis.6 expected difference at the posttest is two units.
Special attention should be paid to the This regression toward the mean is expected
validity of covariance's assumptions of homoge- because at the pretest the experimentals
neity and linearity of regression. Lack of should have positive errors and positive
5 individual differences, whereas the controls
1 claim no originality for this proof. For a related
proof see Goldberger (Note 7), and for simulations see should have negative errors and individual
Sween (1971). differences. Over time this gap should narrow
NONEQVIVALENT CONTROL GROUP DESIGN 351

and both groups regress toward the grand into treatment groups is made on the basis of
mean. the true pretest, aiG + biZi, and not on errors
The dashed line in Figure 2 indicates the of measurement or any other function of G
expectation of group differences at the posttest and Z. In the case of selection based on the
given no regression toward the mean. This measured pretest a similar hypothesis was
would be the baseline expected by change score made, but that hypothesis was justified by the
analysis or equivalently by the lack of a design of the research. In this case Equation 8
Treatment X Time interaction in a repeated is an assumption that must be justified by
measures analysis of variance. Change score evidence from the selection process itself.
analysis fails to take into account the regression If s = 0 and Equation 8 holds, Correlations
toward the mean that should be expected. 4 and 5 become
Covariance analysis is also generally more
powerful than change score analysis.
(9)
The control of assignment to groups either
by randomization or assignment based on
some other measured criterion necessitates The reliability of the pretest px^ is defined
analysis of covariance. Once again, special care as oi2 + bf, making PX:T = kpxlXl, and given
should be taken to meet the assumptions of Equation 6, px^r — kpx^x^- The analysis of
analysis of covariance, especially those of covariance null hypothesis will not equal zero
linearity and homogeneity of regression. since

Selection Based on the Pretest True Score


The case of interest here is not the one in
which the researcher controls selection into Except in trivial cases, the above will equal
treatment groups but the one in which this is zero only if the reliability of the pretest is
not the case. Two types of subject selection perfect.
will be discussed: first, selection based on the The reason for this bias is that the within-
pretest true score (G and Z) and second, selec- groups regression coefficient is attenuated
tion based only on group differences (6). because the pretest is measured with error.
The model previously discussed assumes that The posttest should be regressed not on the
the treatment is correlated with the errors of measured pretest as in covariance analysis but
measurement in the pretest. This is generally on the true pretest. It is because of this bias
implausible when selection is uncontrolled. If that both Lord (1960) and Porter (1967) have
persons select themselves into programs, suggested a correction for the analysis of
selection is more likely to reflect their true covariance. This correction can be thought of
ability and not their chance performance on the as correcting the regression coefficient for
pretest. To repeat, the subjects, not the unreliability in the pretest. To make the
experimenter, control selection, making the reliability correction there must be an estimate
treatment correlated with only the true causes of the reliability of the pretest : a? + hi2.
of the pretest. (The true scores are not actually Assuming that there is an estimate of relia-
measured.) If errors of measurement do not bility and 5 = 0 and Equation 8 holds, the
cause selection, the result is the same model as formula in Table 1 for the analysis of covariance
in Figure 1 except that s = 0. As earlier, it is with reliability correction equals zero if there
assumed that the ratio of the effect of group are no treatment effects.
differences on the treatment to their effect on The difficulty with the reliability correction
the pretest equals the ratio of the effect of procedure is the necessity of having a relia-
individual differences on the treatment to their bility estimate. The inclusion of any ad hoc
effect on the pretest, that is: estimate, for example, internal consistency,
into the correction formula almost certainly
(8) increases the standard error of the estimate.
Ideally the reliability estimation procedure
This assumption presupposes that selection should be part of a general model. Following
352 DAY ID,A. KENNY

Lord (1960), consider, for example, a parallel 3. To receive a selective treatment a person
measure of Xi, say F. Let or that person's sponsor must be highly
motivated or have political connections and
Yi = a3G; + btZii + e3Esi, organizational "savvy." These volunteers differ
where £3 is uncorrelated with all other un- systematically from nonvolunteers on a num-
measured variables. If we assume that Equa- ber of characteristics (Rosenthal & Rosnow,
tion 8 holds and that s = Q, then it follows that 1969).
4. The treatment either is a sociological or
PXlY — WJ.W3 , viva demographic variable or hopelessly confounded
with one. Examples of this are a study on the
effects of dropping out of high school or testing
PTY = k(aias+ 6163). (11) for differences in socialization between males
Given Equations 9, 10, and 11, it follows that and females. The reader can probably also
conceive other patterns of sociological selec-
tion. Suffice it to say that it is a rather common
form of selection into social programs.
Substituting the above formula for reliability In terms of the general model, selection
into the reliability correction formula in Table based on group differences implies that
1 yields the following null hypothesis : j = m = 0. To gain an overidentifying restric-
PXfl =
tion one must assume some form of station-
arity, that is, that the effect of group differences
or equivalently in vanishing tetrad 6 form : is the same at the pre- and posttest. Campbell
(Note 2) has argued for just such a model
= 0.
with what he called the fan spread hypothesis.
The vanishing tetrad can be tested by the null The fan spread hypothesis can be illustrated
hypothesis of a zero second canonical correla- pictorially. In Figure 3 two groups start out
tion between variables Xi and T and variables at Time 1 with divergent means. Campbell
X, and F (Kenny, 1974). (1969) suggests that associated with this mean
difference is a difference in maturation; those
Selection Based on Croup Differences with the higher mean mature at a greater
rate than those with the lower mean. Campbell
For most social programs assignment to the calls this the "interaction of selection and
treatment is not based on some psychological maturation" and has used this interaction as
individual difference, that is, true score, but an argument against raw change score analysis.
on some sociological, demographic, or social Since the mean difference between groups is
psychological characteristic. This sociological widening over time, change score analysis only
selection is brought about in a variety of ways : indicates the more rapid rate of maturation of
1. It may be a matter of policy or legislation the initially higher group. The fan spread
that treatment is available to a particular hypothesis is that increasing variability within
social group, for example, persons living in groups accompanies increasing mean differ-
particular census tracts. It may be virtually ences. In its strictest form the fan spread
impossible to find members of that social group hypothesis is that the difference between group
who did not receive the treatment. means relative to the pooled standard deviation
2. Treatments are administered to members within groups is constant over time. In Figure 3
of an entire organization (e.g., school system), the ratio of mean difference to standard
and members of the treated organization must deviation is always 4:1.
be compared with another organization. The rationale for the fan spread hypothesis
is that the different groups are members of
6
A vanishing tetrad is a general null hypothesis in different populations living in different en-
factor analysis of the form vironments. The different environments create
$\rho_{12}\rho_{34} - \rho_{13}\rho_{24} = 0$, and maintain different levels of performance
where 1, 2, 3, and 4 are variables. and different rates of growth. Given that

growth is a cumulative process, variability increases over time. The groups would eventually approach asymptote at different levels.

The fan spread hypothesis is only a hypothesis; it must be empirically validated. I have found it to be a very workable hypothesis. Generally racial and social class differences on cognitive tests show fan spread, that is, the correlations of these variables are stationary over time. Raw change score analysis predicts that covariances are stationary. A powerful illustration of fan spread is in Equality of Educational Opportunity (Coleman et al., 1966), in which the gap between blacks and whites as measured by the correlation of race (dummy coded) with verbal achievement is relatively stationary over time, while the difference in grade equivalents increases. A thorough review of various literatures is needed to test the adequacy of the fan spread model. Naively, one would expect fan spread to be more likely as (a) the time interval between pre- and posttest becomes shorter, (b) the tests used are more similar, and (c) the amount of growth becomes smaller. Formally, the model for selection on the basis of group differences is $T_i = qG_i + fU_i$. Fan spread implies that $a_1 = a_2$, making $\rho_{X_1T} = \rho_{X_2T}$ (equality of the treatment-effect correlations).

FIGURE 3. Fan spread model with linear growth and $(\bar{X}_1 - \bar{X}_2)/s$ equal to 4. (Horizontal axis: grade.)

If the variance of the pretest and posttest is stationary over time, then the results of standardized change score analysis and raw change score analysis converge. Thus, for cases where variance is stationary, raw change score analysis is valid. Standardization is needed only to stabilize variance over time. Ideally a priori power transformations can be found to stabilize variance in a way akin to meeting the homogeneity of variance assumptions of analysis of variance.

The discussion has narrowly focused on the univariate problem of attributing treatment effects to change. The emphasis on correlations is not meant to discourage control of background variables by multiple regression. These background measures can be used both to increase power and to reduce pretest-treatment confounding. The choice of background variables should be guided by theory and experience, not solely by the desire to explain the pretest-treatment correlation. Merely to reduce the pretest confounding is "capitalizing on chance" in a way analogous to the shrinkage in multiple correlations (cf. Lord & Novick, 1968, pp. 285-288). Rather the researcher should a priori choose background variables and partial them separately out of the pretest, posttest, and treatment, adjusting degrees of freedom accordingly (O'Conner, Note 3). Rarely does this control procedure reduce the pretest-treatment correlation to zero.

With these background variables, I suggest comparing partial correlations of the treatment with the pretest and the posttest, with the background variables partialed out. The partial correlations are compared instead of partial regression coefficients because of the assumption of stability of variance over time; that is, the variability of the dependent variable that is not explained by the background variables is stable over time.

As an illustration, consider the correlations in Table 2. The data are taken from Educational Testing Service's growth study (Hilton & Myers, 1967). Of interest here is the effect of social class on knowledge of industrial arts. For a sample of 2,164 the correlation of social class with industrial arts is positive at both Grades 5 and 11. There is, however, a significant decrease in correlation over time. Standardized partial regression weights reveal the same effect. Accompanying the decrease in the correlation of social class with industrial arts is a dramatic increase in sex differences favoring

males. A fairer test of the stability of the effect of social class over time would be to examine its correlation with industrial arts, with sex partialed out. As can be seen in Table 2, the partial correlation of social class with industrial arts decreases less over time; the decrease is statistically nonsignificant. The decrease in correlation of social class with industrial arts over time is thus an artifact of increasing correlation of sex with industrial arts. This artifact is not due to the correlation between sex and social class (it is only .018) but to a reduction in variability of industrial arts at Grade 11 that is correctable with social class given the increased effect of sex.

TABLE 2
RELATIONSHIPS BETWEEN SEX AND SOCIAL CLASS WITH KNOWLEDGE OF INDUSTRIAL ARTS AT 5TH AND 11TH GRADE

Variable                            Grade 5    Grade 11
Raw correlation coefficients
  Sex                                .239       .508
  Social class                       .240       .191
Partial regression coefficients
  Sex                                .235       .508
  Social class                       .236       .182
Partial correlation coefficients
  Sex                                .242       .514
  Social class                       .242       .211

Note. Males = 1; females = 0. N = 2,164.

To summarize, if group differences are the sources of confounding of the treatment with the dependent variable, and if the effects of these group differences are stationary over time, then some form of change score analysis should be used. If the variance of the dependent variable is stationary, then the original metric should be used. If not, then some variance equalizing transformation should be made. One possible transformation is standardization. Finally, background variables should be controlled for by using partial correlations for the standardized case and multiple regression analysis for the raw metric. The advocacy of regression analysis in this case is to control for the background variables but not the pretest.

Selection Not at the Pretest

I have assumed that the occasion for selection into treatment is at the pretest. For most programs the occasion for selection is not so well defined. Treated and control subjects may drop out of the program or move out of the area, some controls may enter the program, unsuccessful or unhappy treated subjects may not show up at the posttest, and so on. (For a review of some of the difficulties of longitudinal studies see Crider, Willits, & Bealer, 1973.) Sometimes the "pretest" takes place well before the treatment begins; for example, test scores of the previous year are used as a pretest for a remedial program for the current year. For many real-world programs the occasion of selection into the program is not identical with the pretest.

This implies that if assignment is based on the true score, the correlation of the treatment with the effect variable is not the highest at the pretest. This would substantially bias analysis of covariance with reliability correction. Imagine not measuring $X_1$ but using $X_0$ as a "pretest." For such a case the analysis of covariance with reliability correction would be biased if parameters a, b, and j remained stationary over time. The treatment-effect correlations, however, are equal to each other, given the assumption of stationarity. Thus, if selection occurs midway between the pretest and the posttest, or "averages out" midway, and if stationarity can be assumed, then standardized change score analysis is the appropriate form of analysis. If selection is based on group differences, the occasion of selection is irrelevant since group differences are perfectly stable.

Measurement of the Treatment and the Effect

Special care must be taken in the measurement of the treatment so that the above assumptions are not violated. A troublesome error is retrospective measurement of the treatment at the posttest (Campbell & Clayton, 1961). This practice usually insures that errors of measurement are correlated with the treatment and that individual differences are more highly correlated with the treatment at the posttest than at the pretest. To insure that errors of measurement and other irrelevant factors are uncorrelated with the treatment, the method of the measurement of the treatment should not be self-report but should be independent of the subject, nor should measurement occur at the posttest.

Special effort should also be taken to insure that the measurement of subjects is similar both at the pretest and the posttest and for both the treated and untreated subjects. For example, the motivational states and testing situations of both groups should be identical. Ideally the testing of both groups should be in the same setting, one not connected with the program.

Plausible Rival Hypotheses of Changes in the Treatment-Effect Correlations

The comparison of the simple correlations or partial correlations over time has been suggested as a test of treatment effects. There are many plausible rival explanations of a difference between correlations that do not involve treatment effect. Changes in distributions of a variable affect the size of correlations. Floor effects at the pretest or ceiling effects at the posttest loom as plausible rival hypotheses. The distributions of the pre- and posttest should be examined and compared.

To compare the pretest and posttest, the tests must be structurally similar. They need not be parallel in the strictest sense since it is ordinarily expected that the mean and variance of the test will change over time, but the tests should have the same structural or causal equation at both points in time. (Identical structural equations is what was meant by stationarity in a previous section of this article.) Conceivably, the structural equations of the true scores could be invariant over time but error variance could vary over time. This would create a strong plausible rival explanation of a change in treatment-effect correlation: a shift in the reliability of a measure over time. For young children the reliability of a test usually increases over time; failure to take this fact into account leads to a spurious increase in the treatment-effect correlation, the increase being due to an increase in reliability and not a treatment effect. Special care must be taken to insure that the reliability of the test is stationary over time. If the reliabilities are not equal, the treatment-effect correlations should be corrected for attenuation.

The reliabilities themselves need not be directly known, only the ratio of the reliabilities. If the null hypothesis is that the treatment-effect correlations corrected for attenuation are equal, that is,

$$\frac{\rho_{X_1T}}{\sqrt{\rho_{X_1X_1}}} = \frac{\rho_{X_2T}}{\sqrt{\rho_{X_2X_2}}},$$

and if we know $k$, which is defined as the reliability ratio $\rho_{X_1X_1}/\rho_{X_2X_2}$, it then directly follows that

$$\rho_{X_1T} = \sqrt{k}\,\rho_{X_2T}.$$

Kenny (1973) has shown that the reliability ratio can be estimated given the assumption of structural similarity of true scores over time and given three or more repeatedly measured dependent variables. Estimates of reliability ratios are obtained by factoring a matrix of ratios of synchronous correlations. The resulting "factor loadings" are the reliability ratios, or strictly speaking, communality ratios. A difficulty remains because it is not immediately clear whether (a) the raw correlations or (b) the correlations with the treatment variable partialed out should be used to estimate reliability ratios. If the raw correlations are used, the assumption of stationarity is violated when the treatment has an effect, but if the partials are used, the reliability ratio may be biased since the variance shared with the treatment is ignored. This difficulty can be avoided by obtaining the reliability ratios from the partials and then adjusting them for the increased reliability generated by the treatment variable (Kenny, Note 4).

Significance Testing

The equality of the treatment-effect correlations can be tested by a test credited to Hotelling and reproduced by McNemar (1969, p. 158). Dunn and Clark (1969, 1971) have pointed out that Hotelling's test is a large-sample test and have evaluated a number of additional tests. Possibly a more powerful test can be developed that takes into account within-group regression but allows for no between-group regression.
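As a rough illustration of the kind of test referred to under Significance Testing, the sketch below computes Hotelling's t statistic in the form in which it is commonly stated for two dependent correlations that share one variable (here the treatment T). Whether this is exactly the form printed in McNemar is an assumption, as are the function name `hotelling_t` and its argument names; the vocabulary figures are the ones reported in the article's Milwaukee illustration.

```python
import math

def hotelling_t(r_x1t, r_x2t, r_x1x2, n):
    """Hotelling-style t for H0: corr(X1, T) == corr(X2, T) in one sample.

    r_x1t, r_x2t : pretest- and posttest-treatment correlations
    r_x1x2       : pretest-posttest correlation
    n            : number of subjects
    Returns (t, degrees_of_freedom).
    """
    # Determinant of the 3 x 3 correlation matrix of X1, X2, and T.
    det = (1.0 - r_x1t**2 - r_x2t**2 - r_x1x2**2
           + 2.0 * r_x1t * r_x2t * r_x1x2)
    t = (r_x1t - r_x2t) * math.sqrt((n - 3) * (1.0 + r_x1x2) / (2.0 * det))
    return t, n - 3

# Vocabulary test: pretest-treatment, posttest-treatment, and
# pretest-posttest correlations for the 5,495 Milwaukee children.
t_vocab, df_vocab = hotelling_t(-0.3412, -0.3485, 0.752, 5495)
print(f"t = {t_vocab:.3f} on {df_vocab} df")
```

The statistic would be referred to a t distribution with n - 3 degrees of freedom. It does not by itself address the reliability shifts or distributional changes discussed above.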

Between- and Within-Group Regression

The logic of covariance analysis is that it uses the within-group regression of individuals toward the group mean to estimate the between-group regression of the group mean toward the grand mean. Implicit in this logic is that both groups of subjects belong to the same population and that the forces that effect changes in individuals also effect changes in groups (Bock, Note 5). However, if selection into treatment groups is only based on group membership and the groups differ equally on the pretest and posttest, then there will be no between-group regression.

Empirically, it is not an easy question to decide whether there will be any between-group regression. As a rule, however, the treatment and control groups live in different social, cognitive, and genetic environments, which tends to maintain rather than diminish any pretest differences. Ordinarily the researcher should examine whether variables known to cause the dependent variable partially explain the pretest difference. For educational data, for instance, if the lower scoring group is of lower socioeconomic status than the higher scoring group, then there is evidence consistent with the different-population-variable hypothesis. Ideally these population differences could be measured and controlled. Unfortunately, it is usually impossible to specify and perfectly measure these variables. Even if they were measured, the treatment variable is often so hopelessly confounded with them that it would be conceptually impossible to distinguish the treatment variable and the background variable.

In the case of selection occurring midway between the pretest and posttest there may be between-group regression, but the regression occurs both backward and forward in time. The treated and control groups are maximally different at the point of selection. Over time these differences narrow, and if the rate of regression is stationary, the differences between groups should be equal at the pretest and posttest. So even given between-group regression, pretest and posttest differences could still be equal.

The Evaluation of Compensatory Education Programs

Compensatory education programs are those in which the treated subjects start out behind the controls in some cognitive ability. I make the simplifying assumption that the relevant comparison group is the control group. Using the treatment-effect correlation as a measure of pretest differences, the pretest-treatment correlation starts out negative for compensatory evaluation. A totally successful compensatory program should change the negative pretest-treatment correlation to a zero treatment-posttest correlation. A negative posttest-treatment correlation indicates a "semi-compensatory" program, and a positive correlation indicates an "over-compensatory" program. A harmful program would give the treatment-posttest correlation a higher negative value than its baseline.

Campbell and Erlebacher (1970) suggest and to some degree illustrate that compensatory education has been unfairly evaluated. They claim that there is a bias in the evaluation of compensatory education that makes it look harmful. This methodological bias can wash out the beneficial treatment effects of a successful program.

I agree with Campbell and Erlebacher and believe that many compensatory programs should have been evaluated by the standardized change score analysis. Since the control group is often drawn from a radically different environment than the treated group, selection is often based on intact group differences. Let us examine in a hypothetical example the direction of bias obtained by using the wrong statistical technique when standardized change score analysis should have been used.

Selection is based on group differences:

$$X_{1i} = .5G_i + .7Z_{1i} + \sqrt{.26}\,E_{1i}$$
$$X_{2i} = .5G_i + .7Z_{2i} + \sqrt{.26}\,E_{2i}$$
$$T_i = -.4G_i + \sqrt{.84}\,U_i$$

where $\rho_{Z_1Z_2} = .6$. Given the above equations, the intercorrelations are

$$\rho_{X_1T} = -.2$$
$$\rho_{X_2T} = -.2$$
$$\rho_{X_1X_2} = .544.$$
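The intercorrelations of the hypothetical example can be checked directly from its path coefficients. The following is a minimal sketch, assuming that G, Z, E, and U are standardized and mutually uncorrelated apart from the stated correlation of .6 between the two Z factors, and that the posttest equation has the same coefficients as the pretest equation (stationarity). The function name `implied_correlations` and its argument names are illustrative and do not come from the article.

```python
import math

def implied_correlations(a, b, q, rho_z):
    """Correlations implied by
        X1 = a*G + b*Z1 + e*E1
        X2 = a*G + b*Z2 + e*E2
        T  = q*G + f*U
    with all variables standardized and corr(Z1, Z2) = rho_z."""
    error_variance = 1.0 - a**2 - b**2        # variance left for E
    return {
        "rho_x1t": a * q,                     # X1 and T share only G
        "rho_x2t": a * q,                     # same paths at the posttest
        "rho_x1x2": a**2 + b**2 * rho_z,      # shared G plus correlated Z
        "reliability": a**2 + b**2,           # true-score share of variance
        "error_sd": math.sqrt(error_variance),
    }

example = implied_correlations(a=0.5, b=0.7, q=-0.4, rho_z=0.6)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the two treatment-effect correlations are equal by construction, which is the fan spread condition, while the pretest-posttest correlation falls short of the reliability because the Z factor is not perfectly stable.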

Note that the treatment-effect correlations are negative, indicating a compensatory program. For this example covariance analysis yields a negative estimate of treatment effect since

$$\rho_{X_2T} < \rho_{X_1T}\,\rho_{X_1X_2}$$
$$-.2 < -.1088.$$

It is negative bias since covariance analysis predicts a smaller difference between the experimental and control group at the posttest. Covariance analysis underadjusts in this case. The reliability of the pretest ($.5^2 + .7^2$) is .74, making the baseline for covariance analysis with reliability correction

$$\rho_{X_2T} < \rho_{X_1T}\,\rho_{X_1X_2}/\rho_{X_1X_1}$$
$$-.2 < (-.2)(.544/.74)$$
$$-.2 < -.147.$$

So covariance analysis either with or without reliability correction tends to falsely indicate a harmful effect of the compensatory treatment. Using the reliability correction only partially adjusts for the bias of covariance analysis since -.147 is less than -.1088.

Raw change score analysis also tends to be biased given the earlier mentioned fan spread hypothesis: It assumes that for growth data there should be increasing variability over time; for educational data, the common finding is that the standard deviation increases over time. Assuming the standard deviation of $X_1$ is 1 and the standard deviation of $X_2$ is 2, the treatment effect estimate is also negative since

$$\rho_{X_2T} < \rho_{X_1T}\,\sigma_{X_1}/\sigma_{X_2}$$
$$-.2 < -.2/2$$
$$-.2 < -.1.$$

The above illustration shows that analysis of covariance, analysis of covariance with reliability correction, and raw change score analysis all tend to be biased toward finding negative effects of a compensatory treatment. Covariance analysis and raw change score analysis, though usually viewed as competitors for this type of data, both yield the same biased conclusion. The dice are loaded against finding beneficial effects of compensatory education programs. Given this bias, it is premature to conclude that "compensatory education has been tried and it apparently has failed" (Jensen, 1969).

For an empirical illustration of this bias, consider data taken from Crano, Kenny, and Campbell (1972). A total of 5,495 Milwaukee children were measured on 11 achievement tests (Lindquist & Hieronymus, 1964) at the fourth and sixth grades. The children can be divided into core or inner-city children and suburban children.

Let us imagine that the core-suburban variable is perfectly confounded with some hypothetical treatment: All core children received the treatment and none of the suburban children received it. What would analysis of covariance, raw change score analysis, and standardized change score analysis reveal about this pseudotreatment? Table 3 gives the actual and predicted posttest-treatment correlations of three different methods.

Working through an example, one sees that $r_{X_2T}$ for vocabulary is -.3485. The negative sign indicates the core children scored lower than the suburban children at the posttest. The pretest-treatment correlation is -.3412, indicating that it is a good baseline for the posttest-treatment correlation. Remember that a nonexistent treatment is being evaluated, so the predicted $r_{X_2T}$ should equal the actual $r_{X_2T}$. Analysis of covariance's baseline does not fare so well; the autocorrelation of vocabulary is .752, making the predicted $r_{X_2T}$ (.752)(-.3412) = -.2566. Covariance thus predicts that the difference between the core and suburban children should attenuate over time. It presumes that pretest differences diminish since persons belong to the same population and group means should regress to a common mean. In this case, however, there is regression toward the mean within groups, but the group means do not regress to a common mean. There is within-group regression but no between-group regression.

The standard deviation of the vocabulary test increases by a factor of 1.53, making the predicted posttest-treatment correlation of raw change score analysis -.2225. Change score analysis, like analysis of covariance, forecasts a smaller posttest difference between

TABLE 3
COMPARISON OF THREE METHODS IN PREDICTING THE TREATMENT-POSTTEST CORRELATION FOR MILWAUKEE DATA

                                        Predicted r_X2T
                                        ---------------------------------------------------
                                        Standardized      Analysis of       Raw change
                                        change score      covariance        score analysis
                              Actual    analysis
Test                          r_X2T     r_X1T             r_X1T r_X1X2      r_X1T s_X1/s_X2
-------------------------------------------------------------------------------------------
Vocabulary                    -.3485    -.3412            -.2566            -.2225
Reading comprehension         -.3293    -.3367            -.2434            -.2793
Spelling                      -.2085    -.2065            -.1514            -.1761
Capitalization                -.3040    -.2882            -.1859            -.1594
Punctuation                   -.2771    -.2514            -.1412            -.1344
Verbal usage                  -.3004    -.3174            -.2244            -.1952
Map reading                   -.3512    -.2827            -.1444            -.2013
Graphs                        -.2718    -.3170            -.1680            -.2415
References                    -.3590    -.2478            -.1509            -.1900
Arithmetic concept            -.3395    -.3447            -.2081            -.3664
Arithmetic problem solving    -.3011    -.2637            -.1408            -.1982
M                             -.3082    -.2906            -.1832            -.2149

Note. N = 5,495.
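The three predicted columns of Table 3 follow from the formulas in its column headings. The sketch below recomputes the vocabulary row and the differences between the mean actual and mean predicted correlations; the function name `predicted_posttest_correlations` and its argument names are illustrative assumptions, and the inputs (.752 for the pretest-posttest correlation, 1.53 for the growth in the standard deviation) are the rounded values reported in the article, so the raw change prediction matches the tabled -.2225 only approximately.

```python
def predicted_posttest_correlations(r_x1t, r_x1x2, sd_ratio):
    """Predicted posttest-treatment correlation under each method.

    r_x1t    : pretest-treatment correlation
    r_x1x2   : pretest-posttest correlation
    sd_ratio : posttest SD divided by pretest SD (s_X2 / s_X1)
    """
    return {
        "standardized_change": r_x1t,           # baseline is r_X1T itself
        "covariance": r_x1t * r_x1x2,           # shrunk by the autocorrelation
        "raw_change": r_x1t / sd_ratio,         # shrunk by the growth in SD
    }

# Vocabulary row of Table 3.
vocabulary = predicted_posttest_correlations(-0.3412, 0.752, 1.53)
for method, value in vocabulary.items():
    print(f"{method}: {value:.4f}")

# Mean row (M) of Table 3: actual minus predicted, per method.
mean_actual = -0.3082
mean_predicted = {
    "standardized_change": -0.2906,
    "covariance": -0.1832,
    "raw_change": -0.2149,
}
mean_difference = {m: mean_actual - p for m, p in mean_predicted.items()}
for method, value in mean_difference.items():
    print(f"{method}: {value:.4f}")
```

A negative difference means the method predicted a smaller core-suburban gap than was observed, so it would attribute a harmful effect to a treatment that does not exist.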

the core and suburban children than actually exists. It too leads to the erroneous conclusion that the nonexistent treatment is harmful.

For all 11 variables covariance analysis yields large negative effects of a nonexistent treatment. The average difference of the actual posttest-treatment correlation from the predicted correlation for analysis of covariance is -.1250. Raw change score analysis also shows large negative effects but with not quite the consistency of covariance. Its average difference is -.0933. Only the standardized change score analysis generally shows no difference. The slight difference between treatment-effect correlations of -.0176 can be explained by the probable increased reliability of the Time 2 tests. Since there are no reliability estimates for these data, covariance analysis with reliability correction was not performed.

It might be argued that both covariance and raw change score analysis yield the correct conclusion because they reveal an increased cognitive deficit for the inner-city children. A counterargument is that given the nonequivalent control group design, the experimental and control groups normally live in different worlds and therefore receive different treatments. It is thus more sensible to expect that the different environments of the two groups preserve the pretest differences than to be surprised that pretest differences do not diminish.

For noncompensatory treatments, bias works in the opposite direction. Treatments given to those who need them least are made to look mistakenly beneficial (Campbell, Note 1). This can be seen in the above two examples. If we change the signs of the treatment-effect correlations, the inequalities (and therefore direction of bias) reverse.

CONCLUSION

The example presented in Table 3 clearly illustrates the utility of standardized change score analysis. The criticisms of change score analysis by Cronbach and Furby (1970) and Werts and Linn (1970a) are widely cited, but these articles presume a model that implies either covariance analysis or more generally multiple regression analysis or covariance analysis with reliability correction. I have argued that these types of models are not valid if selection is based on group differences or if selection occurs midway between the pre- and posttest. I hope that my analysis persuades researchers not to automatically use some form of regression analysis for the nonequivalent control group design. I also hope that my reiteration of Campbell's point about the selection-maturation interaction

implied by the fan spread model persuades researchers not to automatically use raw change score analysis.

The main purpose of this article is not to advocate standardized change score analysis but to advocate that the researcher carefully study the process of selection into treatments. The critical questions are (a) With what and by how much does the treatment correlate with the causes of the dependent variable? (b) When does selection take place? The answer to these questions determines the mode of analysis. The standard practice in which the mode of analysis determines the assumptions about the data must be avoided. The above questions, though they have statistical and mathematical implications, are not statistical questions that can be answered by an examination of data. Their answers require that the researcher be in close contact with the treatment program. The researcher should not take for granted statements by administrators about the process of selection but rather should use the same observational skills as those an anthropologist or an ethnomethodologist uses to study the process of selection.

It may often be unclear which mode of analysis is appropriate, or it may be clear that no mode of analysis is appropriate. In such cases one might perform multiple modes of analysis, each with a different bias (Cook, 1974) and check to see if the results converge. If they do, one can be confident of a result; if they do not, the correct inference is in doubt.

I do believe that the nonequivalent control group design can be analyzed. I do not agree with Cronbach and Furby (1970), who state for this design: "What cannot be done is to compare 'treatment effects'" (p. 78). My position is that the design can be analyzed conditional on a given structural model.⁷ The structural model must be justified by the setting, population, measurement, selection, and design of the research. There will usually be some equivocalities in the analysis, but all inference, be it true experimental or quasi-experimental, is risky business. We cannot expect certainty from data, only fallible information.

With regard to some topics, I have been too brief. I have not considered treatment-pretest interactions (Goldberger, Note 6) or more generally the whole issue of linearity. I have deliberately (and also perhaps inadvertently) oversimplified many issues to ease and clarify their presentation. I have ignored any discussion of power (Goldberger, Note 7; Porter, Note 8) and have not discussed the issues of estimation and significance testing enough. I have not examined the relevance of econometric literature. For instance, the method of instrumental variables (Goldberger, 1964) could be applied in certain cases. Also, I have not considered the case of testing the persistence of treatment effects. Finally, I have not discussed either multilevel treatments or the analysis of both manipulated and unmanipulated treatment in pre-post true experimental designs.

I wish to clear up one unintended emphasis of the article. For heuristic purposes the comparison of the treated and untreated groups has been emphasized. Although such overall evaluations are important, they have been overemphasized. Since a program may only work in certain highly specific conditions, such global comparisons often mask the interactive nature of treatment effects. Researchers should examine not only the molar effects of the treatment but also the molecular-process effects within the treatment, so that the adaptive aspects of programs can be selected and not "thrown out with the bathwater." An excellent illustration of within-program evaluation is in Coulson (Note 9), which examines the effect of variations within Head Start programs such as program orientation, teacher-student ratio, teacher credentials, budget, and others.

Researchers have a special responsibility in the design and evaluation of real-world treatments. We not only have the usual responsibility of scientists to find and report the truth but also the responsibility of citizens to contribute to the public good. We should not routinely apply statistical methods that were useful in the laboratory but should scrupulously consider the biases of each method.
⁷ To some extent this approach resembles that of Bock (Note 5).

Some might argue that only true experiments bring valid causal inference, and it is always logically possible to contrive alternative explanations to any quasi-experimental design. For instance, for the nonequivalent control group design any set of results can be explained by the plausible rival hypotheses of the interactions of selection with history, selection with maturation, and selection with testing (see Cook & Campbell, 1975, for some of these explanations). But the fact that the internal validity of quasi-experiments is lower than that of true experiments does not argue against using the judgments of quasi-experiments. We would all prefer to have the testimony about an event from a sighted man over a blind man. But when we have only the blind man, we would not dismiss his testimony, especially if he were aware of his biases and had developed faculties of touch and hearing that the sighted man could have developed but has neglected. The difference between the true experiment and the quasi-experiment is of the magnitude of the difference between sight and blindness. We must often grope in the darkness with quasi-experimental designs, but this blindness both forces us to compensate for biases and helps us develop a newfound sensitivity to the structure of data. Finally, it makes us appreciate the clarity of true experimental inference.

7. Goldberger, A. S. Selection bias in evaluating treatment effects: The case of interaction. Madison: Institute for Research on Poverty, University of Wisconsin, 1972.
8. Porter, A. C. Analysis strategies for some common evaluation paradigms. Office of Research Consultation, Michigan State University, 1973.
9. Coulson, J. E. Effects of different Head Start program approaches on children of different characteristics: Report on analysis of data from 1968-1969 national evaluation. Office of Child Development, Department of Health, Education, and Welfare Contract HEW-OS-70-168, 1972.

REFERENCES

Bereiter, C. Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change. Madison: University of Wisconsin Press, 1963.
Campbell, D. T. Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 1957, 54, 297-312.
Campbell, D. T. Reforms as experiments. American Psychologist, 1969, 24, 409-429.
Campbell, D. T., & Clayton, K. N. Avoiding regression effects in panel studies of communication impact. Studies in Public Communication, Department of Sociology, University of Chicago, No. 3, 1961 (Bobbs-Merrill Reprint S-353).
Campbell, D. T., & Erlebacher, A. How regression artifacts in quasi-experimental evaluations can mistakenly make compensatory education harmful. In J. Hellmuth (Ed.), The disadvantaged child (Vol. 3). New York: Brunner/Mazel, 1971.
Campbell, D. T., & Stanley, J. C. Experimental and
REFERENCE NOTES quasi-experimental designs for research on teaching.
1. Campbell, D. T. Temporal changes in treatment- In N. L. Gage (Ed.), Handbook of research on teaching.
effect correlations: A quasi-experimental model for Chicago: Rand McNally, 1963.
institutional records and longitudinal studies. In Chen, M. K. A critical look at the matching technique
G. V. Glass (Ed.), Proceedings of the 1970 Invita- in experimentation. The Journal of Experimental
tional Conference on Testing Problems. Princeton, Education, 1967, 35, 95-98.
N.J.: Educational Testing Service, 1971. Cohen, J. Multiple regression as a general data-analytic
system. Psychological Bulletin, 1968, 70, 426-443.
2. Campbell, D. T. The effects of college on students: Coleman, J. S. et al. Equality of educational opportunity.
Proposing a quasi-experimental approach. Research Washington, D. C.: Government Printing Office,
report, Northwestern University, 1967. 1966.
3. O'Conner, E. F. Unraveling Lord's paradox: The Cook, T. D. The potential and limitations of secondary
appropriate use of multiple regression analysis in evaluations. In M. Apple, H. J. Lufler, & B. M.
quasi-experimental research. Educational Testing Subkoviak (Eds.), Educational evaluation: Analysis
Service Research Bulletin RB-73-53, 1973. and responsibility. Berkeley, Calif.: McCutchan
4. Kenny, D. A. PANAL: Panel data analysis of the Press, 1974.
first year of "Sesame Street." Research report, Cook, T. D., & Campbell, D. T. The design and conduct
Harvard University, 1975. of quasi-experiments and true experiments in field
5. Bock, R. D. Conditional and unconditional inference settings. In M. D. Dunnette (Ed.), Handbook of
in the analysis of repeated measurements. Paper industrial and organizational research. Chicago: Rand
presented at the Symposium on the Application of McNally, 1975.
Statistical Techniques to Psychological Research, Cranb, W. B., Kenny, D. A., & Campbell, D. T. Does
Canadian Psychological Association, York Uni- intelligence cause achievement? A cross-lagged panel
versity, 1969. analysis. Journal of Educational Psychology, 1972, 63,
6. Goldberger, A. S. Selection bias in evaluating treat- 258-275.
ment effects: Some formal illustrations. Madison: Crider, D. M., Willits, F. K., & Bealer, R. C. Panel
Institute for Research on Poverty, University of studies: Some practical problems. Sociological
Wisconsin, 1972. Methods fir Research, 1973, 2, 3-19.
Cronbach, L. J., & Furby, L. How we should measure "change"—or should we? Psychological Bulletin, 1970, 74, 68-80.
Dunn, O. J., & Clark, V. Correlation coefficients measured on the same individuals. Journal of the American Statistical Association, 1969, 64, 366-377.
Dunn, O. J., & Clark, V. Tests of equality of dependent correlation coefficients. Journal of the American Statistical Association, 1971, 66, 904-908.
Goldberger, A. S. Econometric theory. New York: Wiley, 1964.
Hilton, T. L., & Myers, A. E. Personal background, experience, and school achievement: An investigation of the contribution of questionnaire data to academic prediction. Journal of Educational Measurement, 1967, 4, 69-80.
Jensen, A. R. How much can we boost IQ and scholastic achievement? Harvard Educational Review, 1969, 39, 1-123.
Kenny, D. A. Cross-lagged and synchronous common factors in panel data. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences. New York: Seminar Press, 1973.
Kenny, D. A. A test for a vanishing tetrad: The second canonical correlation equals zero. Social Science Research, 1974, 3, 83-87.
Lindquist, E. F., & Hieronymus, A. N. Iowa tests of basic skills: Manual for administrators, supervisors, and counselors. Boston: Houghton Mifflin, 1964.
Lord, F. M. Large-scale covariance analysis when the control variable is fallible. Journal of the American Statistical Association, 1960, 55, 307-321.
Lord, F. M. A paradox in the interpretation of group comparison. Psychological Bulletin, 1967, 68, 304-305.
Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.
McNemar, Q. Psychological statistics (4th ed.). New York: Wiley, 1969.
Porter, A. C. The effects of using fallible variables in the analysis of covariance. Unpublished doctoral dissertation, University of Wisconsin, 1967.
Rosenthal, R., & Rosnow, R. L. The volunteer subject. In R. Rosenthal & R. L. Rosnow (Eds.), Artifact in behavioral research. New York: Academic Press, 1969.
Rulon, P. J. Problems of regression. Harvard Educational Review, 1941, 11, 213-223.
Sween, J. A. The experimental regression design: An inquiry into the feasibility of nonrandom treatment allocation. Unpublished doctoral dissertation, Northwestern University, 1971.
Thistlethwaite, D. L., & Campbell, D. T. Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 1960, 51, 309-317.
Weiss, C. H. Evaluation research. Englewood Cliffs, N.J.: Prentice-Hall, 1972.
Werts, C. E., & Linn, R. L. A general linear model for studying growth. Psychological Bulletin, 1970, 73, 17-22. (a)
Werts, C. E., & Linn, R. L. Path analysis: Psychological examples. Psychological Bulletin, 1970, 74, 193-212. (b)
Woodrow, H. The relation of verbal ability to improvement with practice in verbal tests. Journal of Educational Psychology, 1939, 30, 179-186.

APPENDIX
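To demonstrate the null hypothesis for each statistical technique in Table 1, it must be shown that given a zero covariance between the treatment ($T$) with the dependent variable, the null hypothesis follows. Letting $C(X, Y)$ be the covariance of $X$ and $Y$, we begin with analysis of covariance:

$$0 = C(T, X_2 - b_{X_2X_1\cdot T}X_1)$$

$$0 = C(T, X_2) - b_{X_2X_1\cdot T}\,C(T, X_1).$$

Since

$$b_{X_2X_1\cdot T} = \frac{C(X_1, X_2)\,\sigma_T^2 - C(X_1, T)\,C(X_2, T)}{\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2},$$

it follows that

$$0 = C(T, X_2) - \frac{C(X_1, X_2)\,\sigma_T^2 - C(X_1, T)\,C(X_2, T)}{\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2}\,C(T, X_1).$$

Multiplying through by $[\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2]/\sigma_T^2$ and subtracting, the result is

$$0 = \sigma_{X_1}^2\,C(T, X_2) - C(T, X_1)\,C(X_1, X_2).$$

Dividing through by $\sigma_T\sigma_{X_1}^2\sigma_{X_2}$, one gets

$$0 = \frac{C(T, X_2)}{\sigma_T\sigma_{X_2}} - \frac{C(T, X_1)}{\sigma_T\sigma_{X_1}}\cdot\frac{C(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}}$$

$$0 = \rho_{TX_2} - \rho_{TX_1}\rho_{X_1X_2}.$$

Some may recognize the above formula as the numerator of the formula for $\rho_{X_2T\cdot X_1}$ and $\beta_{X_2T\cdot X_1}$, that is, the partial correlation and standardized partial regression coefficient of $X_2$ and $T$ controlling for $X_1$.

The logic for analysis of covariance with reliability correction is the same as above with the inclusion of the reliability of the pretest in the formula.

For raw change score analysis one begins with

$$0 = C(T, X_2 - X_1)$$

$$0 = C(T, X_2) - C(T, X_1).$$

Multiplying through by $1/\sigma_T$, the result is

$$0 = \frac{C(T, X_2)}{\sigma_T} - \frac{C(T, X_1)}{\sigma_T}$$

$$0 = \rho_{TX_2}\sigma_{X_2} - \rho_{TX_1}\sigma_{X_1}.$$

For standardized change score analysis

$$0 = C(T, X_2^* - X_1^*) = C(T, X_2^*) - C(T, X_1^*),$$

where $*$ indicates zero mean and unit variance. Multiplying through by $1/\sigma_T$ one gets

$$0 = \frac{1}{\sigma_T}C(T, X_2^*) - \frac{1}{\sigma_T}C(T, X_1^*)$$

$$0 = \rho_{TX_2} - \rho_{TX_1},$$

since $X_1^*$ and $X_2^*$ have unit variance. One should note that it need only be assumed that $X_1^*$ and $X_2^*$ have equal variance.

(Received April 22, 1974)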
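The appendix identities are purely algebraic, so they can be checked numerically on any data set. The sketch below is an illustration, not part of the original analysis: the simulated data, the seed, the coefficient values, and the helper names (`cov`, `sd`, `corr`, `standardize`, `appendix_identities`) are all assumptions made for the example. For each technique it computes the covariance of the treatment with the adjusted dependent variable, rescales it as in the appendix, and compares it with the corresponding expression in correlations, so that one side is zero exactly when the other is.

```python
import math
import random


def cov(x, y):
    """Population covariance C(X, Y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def sd(x):
    """Population standard deviation."""
    return math.sqrt(cov(x, x))


def corr(x, y):
    """Pearson correlation."""
    return cov(x, y) / (sd(x) * sd(y))


def standardize(x):
    """Rescale to zero mean and unit variance."""
    mean_x, sd_x = sum(x) / len(x), sd(x)
    return [(a - mean_x) / sd_x for a in x]


def appendix_identities(t, x1, x2):
    """Return (covariance form, correlation form) for each technique."""
    var_t, var_1 = cov(t, t), cov(x1, x1)
    denom = var_1 * var_t - cov(x1, t) ** 2
    # Partial regression coefficient of X2 on X1 controlling for T.
    b = (cov(x1, x2) * var_t - cov(x1, t) * cov(x2, t)) / denom

    # Analysis of covariance: C(T, X2 - b*X1), rescaled as in the appendix.
    adjusted = [post - b * pre for pre, post in zip(x1, x2)]
    ancova_cov = cov(t, adjusted) * denom / (var_t * sd(t) * var_1 * sd(x2))
    ancova_rho = corr(t, x2) - corr(t, x1) * corr(x1, x2)

    # Raw change scores: C(T, X2 - X1) / sd(T).
    raw_gain = [post - pre for pre, post in zip(x1, x2)]
    raw_cov = cov(t, raw_gain) / sd(t)
    raw_rho = corr(t, x2) * sd(x2) - corr(t, x1) * sd(x1)

    # Standardized change scores: C(T, X2* - X1*) / sd(T).
    z1, z2 = standardize(x1), standardize(x2)
    std_gain = [post - pre for pre, post in zip(z1, z2)]
    std_cov = cov(t, std_gain) / sd(t)
    std_rho = corr(t, x2) - corr(t, x1)

    return {
        "ancova": (ancova_cov, ancova_rho),
        "raw_change": (raw_cov, raw_rho),
        "standardized_change": (std_cov, std_rho),
    }


# Hypothetical data: biased assignment on a latent true score.
random.seed(1975)
true_score = [random.gauss(0, 1) for _ in range(200)]
t = [1.0 if s + random.gauss(0, 1) > 0 else 0.0 for s in true_score]
x1 = [s + random.gauss(0, 0.5) for s in true_score]
x2 = [1.2 * s + 0.3 * g + random.gauss(0, 0.5) for s, g in zip(true_score, t)]

results = appendix_identities(t, x1, x2)
for name, (cov_form, rho_form) in results.items():
    print(name, math.isclose(cov_form, rho_form, abs_tol=1e-9))
```

Because the two forms agree for arbitrary data, the check says nothing about which technique suits a given selection process; it only confirms that each null hypothesis in correlation form is equivalent to a zero covariance between treatment and the adjusted dependent variable.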
