A Quasi-Experimental Approach to
Assessing Treatment Effects in
Nonequivalent Control Group
Designs
David A. Kenny
University of Connecticut
All content following this page was uploaded by David A. Kenny on 20 May 2014.
$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}},$$

where N is the total number of subjects. The formula relating the difference between the treated mean ($\bar{X}_T$) and the control mean ($\bar{X}_C$) to r is

$$r = \frac{(\bar{X}_T - \bar{X}_C)\sqrt{P_T P_C}}{s_X},$$

where $s_X^2$ is the variability of the observations, $P_T$ is the proportion of the total sample in the treated group, and $P_C$ is for the control group ($P_T + P_C = 1$). The treatment-effect correlation is, then, closely related to t and the difference between means, but like all measures

the posttest is regressed on the pretest and the residual from the regression is taken to be the dependent variable. Some authors (e.g., Lord, 1960; Porter, 1967) have suggested that the estimated regression coefficient is attenuated by measurement error in the pretest, so the regression coefficient must be corrected for attenuation. The appropriate correction is to divide the regression coefficient of posttest on pretest by the reliability of the pretest. A third approach is raw change score analysis. Although change score analysis is often condemned (e.g., Cronbach & Furby, 1970; Werts & Linn, 1970a), it is perhaps the most common way of analyzing this design. As the name
NONEQUIVALENT CONTROL GROUP DESIGN 347
TABLE 1
NULL HYPOTHESES FOR THE FOUR MODES OF ANALYSIS OF THE NONEQUIVALENT CONTROL GROUP DESIGN
Note: $b_{21 \cdot T}$ is the unstandardized partial regression of $X_2$ on $X_1$, controlling for T; * indicates standardization of variable; $X_1$ is the pretest, $X_2$ the posttest, and T the treatment variable.
suggests, the dependent variable is the posttest minus the pretest. The popularity of this mode of analysis is no doubt due to its seeming ease of interpretation and the fact that it can be viewed as part of a repeated measures analysis of variance. Raw change score analysis is equivalent to the Time × Treatment interaction of a repeated measures analysis. It should be noted here that using the change score as the dependent variable and the pretest as a covariate yields significance test results that are identical to the analysis of covariance with a metric of units of change (Werts & Linn, 1970a). A fourth mode of analysis is standardized change score analysis. The pretest and posttest are separately standardized (given unit variance and zero mean) and then differenced. (Note that this technique does not standardize within treatment groups.) This method is identical to raw change score analysis with the exception that the variance of the dependent variable is made stationary over time.

There is for each mode of analysis a statistical expression that equals zero if there were no treatment effect, that is, the null hypothesis of each method. As stated earlier, the orientation of this article is quasi-experimental; it seeks to determine the cases where a statistical technique correctly infers no effect. The differences among the four modes of analysis can be brought into focus by expressing their null hypotheses as treatment-effect correlations as in Table 1. Designations are as follows: pretest = $X_1$, posttest = $X_2$, and treatment variable = T.

All four null hypotheses can be viewed as setting the posttest-treatment correlation equal to the pretest-treatment correlation times a correction factor. (See Appendix for derivations.) For the analysis of covariance the correction factor is the pretest-posttest correlation of X. Since the pretest-posttest correlation is ordinarily less than one, analysis of covariance implies that in the absence of treatment effects, the posttest-treatment correlation will be less in absolute value than the pretest-treatment correlation. The correction factor for the analysis of covariance with reliability correction is the pretest-posttest correlation of X divided by the reliability of the pretest. This correction factor is ordinarily less than one,² making the posttest-treatment correlation less in absolute value than the pretest-treatment correlation. The correction factor for raw change score analysis is the standard deviation of the pretest divided by the standard deviation of the posttest. Standardized change score analysis has no correction factor, or more precisely, has a correction factor of one: The two treatment-effect correlations should be equal.

² Given the standard formula for correction for attenuation [...] it directly follows that $\rho_{X_1X_2}/\rho_{X_1X_1} < 1$ if [...]

Campbell and Erlebacher (1971) have suggested a fifth method if the pretest and posttest are "factorially similar": covariance analysis with common factor coefficient correction. This method resembles analysis of covariance with reliability correction, the difference being that the regression coefficient is divided by the pretest-posttest correlation of X instead of the reliability of $X_1$. This correction yields the same null hypothesis as the null hypothesis for standardized change score analysis, since the pretest-posttest correlations cancel each other out.
348 DAVID A. KENNY
It should be clear from Table 1 that the null hypotheses of the four modes of analysis will rarely all be equal to zero except in highly trivial cases. Each of the four modes of analysis has been advocated as the method of analysis for the nonequivalent control group design by various authors. Other authors (e.g., Lord, 1967) have pointed out that it is paradoxical that different methods yield different conclusions. Cronbach and Furby (1970) state that treatments cannot be compared for this design. The literature on this design is, therefore, very confusing and not at all instructive to the practitioner.

The validity of any mode of analysis depends on its match with the process of selection into groups. In the remaining part of this article a general model of selection is elaborated and various special cases of that model are considered. Each mode of analysis is appropriate for a given model of selection. Though not advocated as the mode of analysis, standardized change score analysis is emphasized. It has been previously discussed by Woodrow (1939) and by Bereiter (1963) as a test of treatment effects, but both authors recommended some form of regression analysis. Campbell (Campbell & Clayton, 1961; Campbell, Note 1) revived the method but gave only an intuitive justification. Standardized change score analysis should be seriously considered as an alternative in the analysis of the nonequivalent control group design.

THE GENERAL MODEL

There are three measured variables: $X_1$ = a pretest measure of the variable the treatment is to change, $X_2$ = a posttest measure, and T = the treatment variable.

The causal function of X is divided into three unmeasured, latent components: G = group difference such as sex, race, classroom, and others; Z = individual differences within groups; and E = totally unstable causes of X (errors of measurement). G must be included in the specification of causes of X because in field settings it can rarely be assumed that sampling is done from a single homogeneous population. Sampling is usually done from multiple populations, each of which has a different mean level of X.

All variables (both measured and unmeasured) are standardized because standardization decreases algebraic complexity and because in this case it means little loss of generality. For instance, the null hypothesis for analysis of covariance can be stated in terms of correlations. Raw change score analysis, however, necessitates the original metric.

The equations for $X_1$ and $X_2$ can be expressed in terms of their causes G, Z, and E:

$$X_{1i} = a_1 G_i + b_1 Z_{1i} + e_1 E_{1i} \tag{1}$$
$$X_{2i} = a_2 G_i + b_2 Z_{2i} + e_2 E_{2i} \tag{2}$$

where the subscripts 1 and 2 refer to time and subscript i to subject. Since no treatment effects are assumed, T is not written into the causal function of $X_1$ or $X_2$. The group differences variable (G) is assumed to be perfectly stable, making its autocorrelation unity. This explains why G needs no time subscript. Relative position within groups (Z) may not be perfectly stable, making its autocorrelation less than one; the errors of measurement variable (E) is perfectly unstable, making its autocorrelation zero. All unmeasured variables are assumed to be uncorrelated with each other³ with the previously stated exception that $Z_1$ and $Z_2$ may be correlated ($\rho_{Z_1Z_2} = j$).

If the treatment is correlated with the pretest, it must be confounded with the causes of the pretest. Thus, the treatment must be correlated with either group differences (G), relative position within groups (Z), errors of measurement (E), or any combination of the three. So in writing the causal function for the treatment variable, G, Z, and E must be included. At the beginning the assumption is made that the occasion of selection of the persons into treatment occurs at the pretest, thus making T confounded with Z and E at Time 1. (This assumption will later change.) The causal function of the treatment variable is thus

$$T_i = q G_i + m Z_{1i} + s E_{1i} + f U_i, \tag{3}$$

where U is a residual term that is uncorrelated with all unmeasured variables. U is simply a

³ The latent variables may be allowed to be correlated without altering many of the conclusions. I have chosen to make the variables uncorrelated to simplify presentation.
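The correlations implied by Equations 1 to 3 follow from covariance algebra once every variable is standardized and the latent variables are taken to be uncorrelated, apart from the correlation between $Z_1$ and $Z_2$ (called `j` below). The following Python sketch is not from the article: the function name, the argument names, and the parameter values are assumptions chosen for illustration.

```python
def implied_correlations(a1, b1, e1, a2, b2, q, m, s, j):
    """Correlations implied by the general model with no treatment effect.

    Assumed model (all variables standardized, latent variables mutually
    uncorrelated except corr(Z1, Z2) = j):
        X1 = a1*G + b1*Z1 + e1*E1
        X2 = a2*G + b2*Z2 + e2*E2
        T  = q*G  + m*Z1  + s*E1 + f*U
    """
    return {
        # G is perfectly stable; Z carries over with weight j; E1 and E2 are unrelated
        "x1_x2": a1 * a2 + b1 * b2 * j,
        # T shares G, Z1, and E1 with the pretest
        "x1_t": a1 * q + b1 * m + e1 * s,
        # E1 is uncorrelated with E2, so the s term drops out at the posttest
        "x2_t": a2 * q + b2 * m * j,
    }


# Hypothetical parameter values: selection on group differences only (m = s = 0).
r = implied_correlations(a1=0.5, b1=0.7, e1=0.26 ** 0.5,
                         a2=0.5, b2=0.7, q=-0.4, m=0.0, s=0.0, j=0.6)
print(r)
```

With these assumed values the pretest-treatment and posttest-treatment correlations are equal, while the pretest-posttest correlation is well below one.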
and both groups regress toward the grand mean.

The dashed line in Figure 2 indicates the expectation of group differences at the posttest given no regression toward the mean. This would be the baseline expected by change score analysis or equivalently by the lack of a Treatment × Time interaction in a repeated measures analysis of variance. Change score analysis fails to take into account the regression toward the mean that should be expected. Covariance analysis is also generally more powerful than change score analysis.

The control of assignment to groups either by randomization or assignment based on some other measured criterion necessitates analysis of covariance. Once again, special care should be taken to meet the assumptions of analysis of covariance, especially those of linearity and homogeneity of regression.

into treatment groups is made on the basis of the true pretest, $a_1G + b_1Z_1$, and not on errors of measurement or any other function of G and Z. In the case of selection based on the measured pretest a similar hypothesis was made, but that hypothesis was justified by the design of the research. In this case Equation 8 is an assumption that must be justified by evidence from the selection process itself.

If s = 0 and Equation 8 holds, Correlations 4 and 5 become

$$\rho_{X_1T} = k(a_1^2 + b_1^2), \qquad \rho_{X_2T} = k(a_1a_2 + b_1b_2j). \tag{9}$$

The reliability of the pretest $\rho_{X_1X_1}$ is defined as $a_1^2 + b_1^2$, making $\rho_{X_1T} = k\rho_{X_1X_1}$, and given Equation 6, $\rho_{X_2T} = k\rho_{X_1X_2}$. The analysis of covariance null hypothesis will not equal zero since
Lord (1960), consider, for example, a parallel measure of $X_1$, say Y. Let

$$Y_i = a_3 G_i + b_3 Z_{1i} + e_3 E_{3i},$$

where $E_3$ is uncorrelated with all other unmeasured variables. If we assume that Equation 8 holds and that s = 0, then it follows that

$$\rho_{X_1Y} = a_1a_3 + b_1b_3, \tag{10}$$
$$\rho_{TY} = k(a_1a_3 + b_1b_3). \tag{11}$$

Given Equations 9, 10, and 11, it follows that

$$\rho_{X_1X_1} = \frac{\rho_{X_1T}\,\rho_{X_1Y}}{\rho_{TY}}.$$

Substituting the above formula for reliability into the reliability correction formula in Table 1 yields the following null hypothesis:

$$\rho_{X_2T} = \frac{\rho_{X_1X_2}\,\rho_{TY}}{\rho_{X_1Y}},$$

or equivalently in vanishing tetrad⁶ form:

$$\rho_{X_2T}\,\rho_{X_1Y} - \rho_{X_1X_2}\,\rho_{TY} = 0.$$

The vanishing tetrad can be tested by the null hypothesis of a zero second canonical correlation between variables $X_1$ and T and variables $X_2$ and Y (Kenny, 1974).

Selection Based on Group Differences

For most social programs assignment to the treatment is not based on some psychological individual difference, that is, true score, but on some sociological, demographic, or social psychological characteristic. This sociological selection is brought about in a variety of ways:

1. It may be a matter of policy or legislation that treatment is available to a particular social group, for example, persons living in particular census tracts. It may be virtually impossible to find members of that social group who did not receive the treatment.
2. Treatments are administered to members of an entire organization (e.g., school system), and members of the treated organization must be compared with another organization.
3. To receive a selective treatment a person or that person's sponsor must be highly motivated or have political connections and organizational "savvy." These volunteers differ systematically from nonvolunteers on a number of characteristics (Rosenthal & Rosnow, 1969).
4. The treatment either is a sociological or demographic variable or hopelessly confounded with one. Examples of this are a study on the effects of dropping out of high school or testing for differences in socialization between males and females. The reader can probably also conceive other patterns of sociological selection. Suffice it to say that it is a rather common form of selection into social programs.

In terms of the general model, selection based on group differences implies that s = m = 0. To gain an overidentifying restriction one must assume some form of stationarity, that is, that the effect of group differences is the same at the pre- and posttest. Campbell (Note 2) has argued for just such a model with what he called the fan spread hypothesis.

The fan spread hypothesis can be illustrated pictorially. In Figure 3 two groups start out at Time 1 with divergent means. Campbell (1969) suggests that associated with this mean difference is a difference in maturation; those with the higher mean mature at a greater rate than those with the lower mean. Campbell calls this the "interaction of selection and maturation" and has used this interaction as an argument against raw change score analysis. Since the mean difference between groups is widening over time, change score analysis only indicates the more rapid rate of maturation of the initially higher group. The fan spread hypothesis is that increasing variability within groups accompanies increasing mean differences. In its strictest form the fan spread hypothesis is that the difference between group means relative to the pooled standard deviation within groups is constant over time. In Figure 3 the ratio of mean difference to standard deviation is always 4:1.

The rationale for the fan spread hypothesis is that the different groups are members of different populations living in different environments. The different environments create and maintain different levels of performance and different rates of growth. Given that

⁶ A vanishing tetrad is a general null hypothesis in factor analysis of the form $\rho_{12}\rho_{34} - \rho_{13}\rho_{24} = 0$, where 1, 2, 3, and 4 are variables.
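The vanishing tetrad for the parallel-measure case can be checked numerically. The Python sketch below is an illustration only. It assumes selection on the true pretest, written here as T = k(a1·G + b1·Z1) + f·U, a parallel measure Y = a3·G + b3·Z1 + e3·E3, and a correlation `j` between Z1 and Z2. The function name, the `effect` argument, and all parameter values are hypothetical.

```python
def tetrad(a1, b1, a2, b2, a3, b3, k, j, effect=0.0):
    """Return rho(X2,T)*rho(X1,Y) - rho(X1,X2)*rho(T,Y).

    Assumes standardized variables, selection on the true pretest
    (T = k*(a1*G + b1*Z1) + f*U), and a parallel measure
    Y = a3*G + b3*Z1 + e3*E3. `effect` is a hypothetical shift in
    rho(X2,T) produced by a real treatment effect.
    """
    r_x1x2 = a1 * a2 + b1 * b2 * j   # pretest-posttest correlation
    r_x1y = a1 * a3 + b1 * b3        # pretest with parallel measure
    r_ty = k * r_x1y                 # treatment with parallel measure
    r_x2t = k * r_x1x2 + effect      # posttest-treatment correlation
    return r_x2t * r_x1y - r_x1x2 * r_ty


# Hypothetical parameter values.
no_effect = tetrad(0.5, 0.7, 0.4, 0.8, 0.6, 0.6, k=-0.3, j=0.6)
with_effect = tetrad(0.5, 0.7, 0.4, 0.8, 0.6, 0.6, k=-0.3, j=0.6, effect=0.1)
print(no_effect, with_effect)
```

Under these assumptions the tetrad is zero (up to floating-point error) when there is no treatment effect and departs from zero once `effect` is nonzero.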
relevant factors are uncorrelated with the treatment, the method of the measurement of the treatment should not be self-report but should be independent of the subject, nor should measurement occur at the posttest. Special effort should also be taken to insure that the measurement of subjects is similar both at the pretest and the posttest and for both the treated and untreated subjects. For

the treatment-effect correlations should be corrected for attenuation.

The reliabilities themselves need not be directly known, only the ratio of the reliabilities. If the null hypothesis is that the treatment-effect correlations corrected for attenuation are equal, that is, [...]
Note that the treatment-effect correlations are negative, indicating a compensatory program. For this example covariance analysis yields a negative estimate of treatment effect since

$$\rho_{X_2T} < \rho_{X_1T}\,\rho_{X_1X_2}$$
$$-.2 < -.1088.$$

It is negative bias since covariance analysis predicts a smaller difference between the experimental and control group at the posttest. Covariance analysis underadjusts in this case. The reliability of the pretest ($.5^2 + .7^2$) is .74, making the baseline for covariance analysis with reliability correction

$$\rho_{X_2T} < \rho_{X_1T}\,\rho_{X_1X_2}/\rho_{X_1X_1}$$
$$-.2 < (-.2)(.544/.74)$$
$$-.2 < -.147.$$

So covariance analysis either with or without reliability correction tends to falsely indicate a harmful effect of the compensatory treatment. Using the reliability correction only partially adjusts for the bias of covariance analysis since -.147 is less than -.1088.

Raw change score analysis also tends to be biased given the earlier mentioned fan spread hypothesis: It assumes that for growth data there should be increasing variability over time; for educational data, the common finding is that the standard deviation increases over time. Assuming the standard deviation of $X_1$ is 1 and the standard deviation of $X_2$ is 2, the treatment effect estimate is also negative since

$$\rho_{X_2T} < \rho_{X_1T}\,\sigma_{X_1}/\sigma_{X_2}$$
$$-.2 < -.2/2$$
$$-.2 < -.1.$$

The above illustration shows that analysis of covariance, analysis of covariance with reliability correction, and raw change score analysis all tend to be biased toward finding negative effects of a compensatory treatment. Covariance analysis and raw change score analysis, though usually viewed as competitors for this type of data, both yield the same biased conclusion. The dice are loaded against finding beneficial effects of compensatory education programs. Given this bias, it is premature to conclude that "compensatory education has been tried and it apparently has failed" (Jensen, 1969).

For an empirical illustration of this bias, consider data taken from Crano, Kenny, and Campbell (1972). A total of 5,495 Milwaukee children were measured on 11 achievement tests (Lindquist & Hieronymus, 1964) at the fourth and sixth grades. The children can be divided into core or inner-city children and suburban children.

Let us imagine that the core-suburban variable is perfectly confounded with some hypothetical treatment: All core children received the treatment and none of the suburban children received it. What would analysis of covariance, raw change score analysis, and standardized change score analysis reveal about this pseudotreatment? Table 3 gives the actual and predicted posttest-treatment correlations of three different methods.

Working through an example, one sees that $r_{X_2T}$ for vocabulary is -.3485. The negative sign indicates the core children scored lower than the suburban children at the posttest. The pretest-treatment correlation is -.3412, indicating that it is a good baseline for the posttest-treatment correlation. Remember that a nonexistent treatment is being evaluated, so the predicted $r_{X_2T}$ should equal the actual $r_{X_2T}$. Analysis of covariance's baseline does not fare so well; the autocorrelation of vocabulary is .752, making the predicted $r_{X_2T}$ (.752)(-.3412) = -.2566. Covariance thus predicts that the difference between the core and suburban children should attenuate over time. It presumes that pretest differences diminish since persons belong to the same population and group means should regress to a common mean. In this case, however, there is regression toward the mean within groups, but the group means do not regress to a common mean. There is within-group regression but no between-group regression.

The standard deviation of the vocabulary test increases by a factor of 1.53, making the predicted posttest-treatment correlation of raw change score analysis -.2225. Change score analysis, like analysis of covariance, forecasts a smaller posttest difference between
TABLE 3
COMPARISON OF THREE METHODS IN PREDICTING THE TREATMENT-POSTTEST CORRELATION FOR MILWAUKEE DATA
Predicted $r_{X_2T}$
Note. N = 5,495.
the core and suburban children than actually exists. It too leads to the erroneous conclusion that the nonexistent treatment is harmful.

For all 11 variables covariance analysis yields large negative effects of a nonexistent treatment. The average difference of the actual posttest-treatment correlation from the predicted correlation for analysis of covariance is -.1250. Raw change score analysis also shows large negative effects but with not quite the consistency of covariance. Its average difference is -.0933. Only the standardized change score analysis generally shows no difference. The slight difference between treatment-effect correlations of -.0176 can be explained by the probable increased reliability of the Time 2 tests. Since there are no reliability estimates for this data, covariance analysis with reliability correction was not performed.

It might be argued that both covariance and raw change score analysis yield the correct conclusion because they reveal an increased cognitive deficit for the inner-city children. A counterargument is that given the nonequivalent control group design, the experimental and control groups normally live in different worlds and therefore receive different treatments. It is thus more sensible to expect that the different environments of the two groups preserve the pretest differences than to be surprised that pretest differences do not diminish.

For noncompensatory treatments, bias works in the opposite direction. Treatments given to those who need them least are made to look mistakenly beneficial (Campbell, Note 1). This can be seen in the above two examples. If we change the signs of the treatment-effect correlations, the inequalities (and therefore direction of bias) reverse.

CONCLUSION

The example presented in Table 3 clearly illustrates the utility of standardized change score analysis. The criticisms of change score analysis by Cronbach and Furby (1970) and Werts and Linn (1970a) are widely cited, but these articles presume a model that implies either covariance analysis or more generally multiple regression analysis or covariance analysis with reliability correction. I have argued that these types of models are not valid if selection is based on group differences or if selection occurs midway between the pre- and posttest. I hope that my analysis persuades researchers not to automatically use some form of regression analysis for the nonequivalent control group design. I also hope that my reiteration of Campbell's point about the selection-maturation interaction
implied by the fan spread model persuades researchers not to automatically use raw change score analysis.

The main purpose of this article is not to advocate standardized change score analysis but to advocate that the researcher carefully study the process of selection into treatments. The critical questions are (a) With what and by how much does the treatment correlate with the causes of the dependent variable? (b) When does selection take place? The answer to these questions determines the mode of analysis. The standard practice in which the mode of analysis determines the assumptions about the data must be avoided. The above questions, though they have statistical and mathematical implications, are not statistical questions that can be answered by an examination of data. Their answers require that the researcher be in close contact with the treatment program. The researcher should not take for granted statements by administrators about the process of selection but rather should use the same observational skills as those an anthropologist or an ethnomethodologist uses to study the process of selection.

It may often be unclear which mode of analysis is appropriate, or it may be clear that no mode of analysis is appropriate. In such cases one might perform multiple modes of analysis, each with a different bias (Cook, 1974) and check to see if the results converge. If they do, one can be confident of a result; if they do not, the correct inference is in doubt.

I do believe that the nonequivalent control group design can be analyzed. I do not agree with Cronbach and Furby (1970), who state for this design: "What cannot be done is to compare 'treatment effects'" (p. 78). My position is that the design can be analyzed conditional on a given structural model.⁷ The structural model must be justified by the setting, population, measurement, selection, and design of the research. There will usually be some equivocalities in the analysis, but all inference, be it true experimental or quasi-experimental, is risky business. We cannot expect certainty from data, only fallible information.

With regard to some topics, I have been too brief. I have not considered treatment-pretest interactions (Goldberger, Note 6) or more generally the whole issue of linearity. I have deliberately (and also perhaps inadvertently) oversimplified many issues to ease and clarify their presentation. I have ignored any discussion of power (Goldberger, Note 7; Porter, Note 8) and have not discussed the issues of estimation and significance testing enough. I have not examined the relevance of econometric literature. For instance, the method of instrumental variables (Goldberger, 1964) could be applied in certain cases. Also, I have not considered the case of testing the persistence of treatment effects. Finally, I have not discussed either multilevel treatments or the analysis of both manipulated and unmanipulated treatment in pre-post true experimental designs.

I wish to clear up one unintended emphasis of the article. For heuristic purposes the comparison of the treated and untreated groups has been emphasized. Although such overall evaluations are important, they have been overemphasized. Since a program may only work in certain highly specific conditions, such global comparisons often mask the interactive nature of treatment effects. Researchers should examine not only the molar effects of the treatment but also the molecular-process effects within the treatment, so that the adaptive aspects of programs can be selected and not "thrown out with the bathwater." An excellent illustration of within-program evaluation is in Coulson (Note 9), which examines the effect of variations within Head Start programs such as program orientation, teacher-student ratio, teacher credentials, budget, and others.

Researchers have a special responsibility in the design and evaluation of real-world treatments. We not only have the usual responsibility of scientists to find and report the truth but also the responsibility of citizens to contribute to the public good. We should not routinely apply statistical methods that were useful in the laboratory but should scrupulously consider the biases of each method.

Some might argue that only true experiments bring valid causal inference, and it is always logically possible to contrive alter-

⁷ To some extent this approach resembles that of Bock (Note 5).
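The two numerical illustrations in the article, the hypothetical compensatory program and the Milwaukee vocabulary test, can be recomputed in a few lines of Python. This is only a sketch: the helper name `predicted_posttest_r` and the variable names are assumptions, while the numbers are those quoted in the text. Dividing by the rounded growth factor 1.53 gives roughly -.223 rather than the reported -.2225, presumably because the article worked from an unrounded ratio.

```python
def predicted_posttest_r(r_x1t, factor):
    """Baseline posttest-treatment correlation: the pretest-treatment
    correlation times the correction factor of a mode of analysis."""
    return r_x1t * factor


# Hypothetical compensatory program: both treatment-effect correlations are -.2,
# pretest-posttest correlation .544, pretest reliability .74, SD doubles over time.
actual = -0.2
covariance = predicted_posttest_r(-0.2, 0.544)        # covariance analysis
corrected = predicted_posttest_r(-0.2, 0.544 / 0.74)  # with reliability correction
raw_change = predicted_posttest_r(-0.2, 1 / 2)        # raw change score analysis

# Milwaukee vocabulary test: pretest r = -.3412, posttest r = -.3485,
# autocorrelation .752, SD growth factor 1.53 (as rounded in the text).
vocab_actual = -0.3485
vocab_covariance = predicted_posttest_r(-0.3412, 0.752)
vocab_raw = predicted_posttest_r(-0.3412, 1 / 1.53)
vocab_standardized = predicted_posttest_r(-0.3412, 1.0)

for name, pred in [("covariance", vocab_covariance),
                   ("raw change", vocab_raw),
                   ("standardized change", vocab_standardized)]:
    print(f"{name}: predicted {pred:.4f}, "
          f"actual minus predicted {vocab_actual - pred:+.4f}")
```

In both illustrations the actual correlation is more negative than the covariance and raw change baselines, which is the direction of bias the article describes for compensatory programs.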
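APPENDIX

To demonstrate the null hypothesis for each statistical technique in Table 1, it must be shown that given a zero covariance between the treatment (T) with the dependent variable, the null hypothesis follows. Letting C(X, Y) be the covariance of X and Y, we begin with analysis of covariance:

$$0 = C(T, X_2) - b_{21 \cdot T}\,C(T, X_1).$$

Since

$$b_{21 \cdot T} = \frac{C(X_1, X_2)\,\sigma_T^2 - C(X_1, T)\,C(X_2, T)}{\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2},$$

it follows that

$$0 = C(T, X_2) - C(T, X_1)\,\frac{C(X_1, X_2)\,\sigma_T^2 - C(X_1, T)\,C(X_2, T)}{\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2}.$$

Multiplying through by $\sigma_{X_1}^2\sigma_T^2 - C(X_1, T)^2$, subtracting, and dividing by $\sigma_T^2$, the result is

$$0 = \sigma_{X_1}^2\,C(T, X_2) - C(T, X_1)\,C(X_1, X_2).$$

Dividing through by $\sigma_T\sigma_{X_1}^2\sigma_{X_2}$, one gets

$$0 = \frac{C(T, X_2)}{\sigma_T\sigma_{X_2}} - \frac{C(T, X_1)\,C(X_1, X_2)}{\sigma_T\sigma_{X_1}^2\sigma_{X_2}}$$
$$0 = \rho_{TX_2} - \rho_{TX_1}\,\rho_{X_1X_2}.$$

Some may recognize the above formula as the numerator of the formula for $\rho_{X_2T \cdot X_1}$ and $\beta_{X_2T \cdot X_1}$, that is, the partial correlation and standardized partial regression coefficient of $X_2$ and T controlling for $X_1$.

The logic for analysis of covariance with reliability correction is the same as above with the inclusion of the reliability of the pretest in the formula.

For raw change score analysis one begins with

$$0 = C(T, X_2 - X_1)$$
$$0 = C(T, X_2) - C(T, X_1).$$

Multiplying through by $1/(\sigma_T\sigma_{X_2})$, the result is

$$0 = \frac{C(T, X_2)}{\sigma_T\sigma_{X_2}} - \frac{C(T, X_1)}{\sigma_T\sigma_{X_2}}$$
$$0 = \rho_{TX_2} - \rho_{TX_1}\,\sigma_{X_1}/\sigma_{X_2}.$$

For standardized change score analysis

$$0 = C(T, X^*_2 - X^*_1)$$
$$\phantom{0} = C(T, X^*_2) - C(T, X^*_1),$$

where * indicates zero mean and unit variance. Multiplying through by $1/\sigma_T$ one gets

$$0 = \frac{C(T, X^*_2)}{\sigma_T} - \frac{C(T, X^*_1)}{\sigma_T}$$
$$0 = \rho_{TX_2} - \rho_{TX_1},$$

since $X^*_1$ and $X^*_2$ have unit variance. One should note that it need only be assumed that $X^*_1$ and $X^*_2$ have equal variance.

(Received April 22, 1974)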
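The analysis of covariance result in the Appendix can be checked against the textbook formula for a standardized partial regression coefficient, whose numerator is the expression derived there. The Python sketch below is illustrative only; the function name and the input values are assumptions.

```python
def beta_x2_on_t_given_x1(r_x2t, r_x1t, r_x1x2):
    """Standardized partial regression coefficient of X2 on T, controlling for X1.

    Textbook formula: (r_x2t - r_x1t * r_x1x2) / (1 - r_x1t ** 2).
    """
    return (r_x2t - r_x1t * r_x1x2) / (1.0 - r_x1t ** 2)


# Hypothetical correlations.
r_x1t, r_x1x2 = -0.2, 0.544

# Posttest-treatment correlation equal to the covariance baseline: coefficient is zero.
at_baseline = beta_x2_on_t_given_x1(r_x1t * r_x1x2, r_x1t, r_x1x2)

# Posttest-treatment correlation equal to the pretest-treatment correlation:
# covariance analysis reports a nonzero (here negative) coefficient.
equal_correlations = beta_x2_on_t_given_x1(r_x1t, r_x1t, r_x1x2)

print(at_baseline, equal_correlations)
```

The second call shows why equal pretest and posttest treatment correlations, which standardized change score analysis reads as no effect, appear as an effect under analysis of covariance.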