1974, 27, 229-239
A GENERAL METHOD FOR STUDYING DIFFERENCES IN FACTOR MEANS AND FACTOR STRUCTURE BETWEEN GROUPS
By DAG SÖRBOM
University of Uppsala
A statistical model is developed for the study of similarities and differences in factor structure between several groups. The model assumes that the observed variables satisfy a factor analysis model in each group. A method of data analysis is presented which, in contrast to earlier work, makes use of information in the observed means as well as the observed variances and covariances to estimate the parameters in each group, i.e. factor means, factor loadings, factor variances and covariances and unique variances. Usually the units of measurement in the observed variables have no intrinsic meaning and therefore it is only meaningful to compare the relative magnitudes of the parameters for the different groups. The method estimates the parameters for all groups simultaneously and can take into account a priori information about factorial invariance of various degrees.
1. INTRODUCTION
The model described by Jöreskog (1971) is concerned with the study of similarities and differences in factor structure between different groups. The aim of this paper is to discuss a model which, in addition, allows one to study the means of the factors for the groups. Usually these means are estimated in a manner similar to the estimation of factor scores (Lawley & Maxwell, 1971), i.e. the means are obtained from the estimated parameters of the model as if these were equal to the population parameters. For instance, consider the model in classical factor analysis
$$x = \Lambda\xi + \varepsilon.$$
If we denote $E(\xi)$ by $\theta$ it follows that $E(x) = \Lambda\theta$. An estimate of the factor means may be obtained by minimizing (cf. Lawley & Maxwell, 1971; McGaw & Jöreskog, 1971):
$$(\bar{x} - \hat{\Lambda}\theta)'\,\hat{\Sigma}^{-1}(\bar{x} - \hat{\Lambda}\theta),$$
where $\bar{x}$ is the vector of sample means of the manifest variables, $\hat{\Lambda}$ is the matrix of estimated factor loadings, and $\hat{\Sigma}$ is the population variance-covariance matrix estimated from the factor analysis model. Although this procedure gives the maximum-likelihood (ML) estimator of $\theta$ given $\Sigma$, it is not the ML estimator of the parameter vector $\theta$ in the model, since, as will be shown in Section 3, information about $\Lambda$ is contained also in $\bar{x}$.
In a recent paper, Please (1973) gives a model very similar to the one considered in this paper and develops a procedure for obtaining ML estimators. However, his estimation procedure cannot take into account a priori information about the
measurements, such as zero factor loadings in specified positions, which is a feature of some importance for the application of the model (see, e.g., Jöreskog, 1971; Sörbom, 1973b). In addition, the procedure described in this paper is more general than that of Please, which will be shown to be a special case.
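The two-stage estimate of the factor means discussed in this section has a closed form, $\hat{\theta} = (\hat{\Lambda}'\hat{\Sigma}^{-1}\hat{\Lambda})^{-1}\hat{\Lambda}'\hat{\Sigma}^{-1}\bar{x}$, obtained by setting the derivative of the quadratic form to zero. The following NumPy sketch illustrates the computation only; the function name and all numerical values are hypothetical and are not taken from any analysis in this paper.

```python
import numpy as np

def two_stage_factor_means(xbar, Lambda_hat, Sigma_hat):
    """Minimise (xbar - L theta)' Sigma^{-1} (xbar - L theta) over theta."""
    Si_L = np.linalg.solve(Sigma_hat, Lambda_hat)        # Sigma^{-1} Lambda
    # Normal equations: (L' Sigma^{-1} L) theta = L' Sigma^{-1} xbar
    return np.linalg.solve(Lambda_hat.T @ Si_L, Si_L.T @ xbar)

# Hypothetical estimates for p = 4 measurements and k = 2 factors.
Lambda_hat = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.7]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Psi2 = np.diag([0.5, 0.6, 0.4, 0.7])
Sigma_hat = Lambda_hat @ Phi @ Lambda_hat.T + Psi2

theta_true = np.array([0.5, -0.2])
xbar = Lambda_hat @ theta_true          # means generated exactly from the model
theta_hat = two_stage_factor_means(xbar, Lambda_hat, Sigma_hat)
```

Because the illustrative means lie exactly in the column space of the loading matrix, `theta_hat` reproduces `theta_true`; with real sample means the two would differ.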
2. THE MODEL
For $m$ groups consider the following model for an observable vector $x_g$ of order $p$ for the $g$th group,
$$x_g = \mu + \Lambda_g\xi_g + \varepsilon_g \quad (g = 1, 2, \ldots, m), \qquad (1)$$
where $\mu$ is a constant vector of order $p$ representing the origin of the measurements, $\xi_g$ is a random vector of order $k$ representing the common factors, $\Lambda_g$ is a $p \times k$ matrix of factor loadings and $\varepsilon_g$ is a random vector of order $p$ of unique factors. By the usual factor analytic assumptions we obtain the variance-covariance matrix of the measurements for the $g$th group as
$$\Sigma_g = E[(x_g - \mu - \Lambda_g\theta_g)(x_g - \mu - \Lambda_g\theta_g)'] = \Lambda_g\Phi_g\Lambda_g' + \Psi_g^2, \qquad (2)$$
where $\theta_g$ and $\Phi_g$ are the vector of means and the variance-covariance matrix of $\xi_g$ respectively, and $\Psi_g^2$ is a diagonal matrix consisting of the variances of the unique factors. The expected values of the measurements are given by
$$E(x_g) = \mu + \Lambda_g\theta_g. \qquad (3)$$
To make the interpretation of the parameters in the model easier we may distinguish between
1. Parameters differentiating the measurements. The vector $\mu$ indicates the level of the measurements, i.e. given $\xi_g = 0$, the expected values of the measurements equal $\mu$. The $i$th row of $\Lambda_g$ gives the regression of the $i$th measurement in $x_g$ on the common factors and the $i$th diagonal element of $\Psi_g^2$ is the corresponding residual variance.
2. Parameters describing the factor space. The matrix $\Phi_g$ is the variance-covariance matrix of the common factors and $\theta_g$ is their mean vector.
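The mean vector and variance-covariance matrix implied by the model for one group can be computed directly from the parameters. The sketch below is only an illustration under assumed values: the function name `implied_moments` and the one-factor numbers are hypothetical, chosen so that the result can be checked by hand.

```python
import numpy as np

def implied_moments(mu, Lam, Phi, psi2, theta):
    """Return E(x_g) and Sigma_g implied by the model for one group.

    psi2 holds the unique variances, i.e. the diagonal of Psi_g^2.
    """
    mean = mu + Lam @ theta                      # mu + Lambda_g theta_g
    cov = Lam @ Phi @ Lam.T + np.diag(psi2)      # Lambda_g Phi_g Lambda_g' + Psi_g^2
    return mean, cov

# Hypothetical one-factor example with p = 3 measurements.
mu = np.array([1.0, 2.0, 3.0])
Lam = np.array([[1.0], [0.5], [2.0]])
Phi = np.array([[2.0]])
psi2 = np.array([0.1, 0.2, 0.3])
theta = np.array([1.0])

mean, cov = implied_moments(mu, Lam, Phi, psi2, theta)
```

With these values the implied means are (2, 2.5, 5), and the first diagonal element of the implied covariance matrix is 1 × 2 × 1 + 0.1 = 2.1.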
3. ESTIMATION
Model (1) is not identified. We can premultiply $\xi_g$ in (1) by a non-singular matrix of order $k \times k$, $A_g$, and postmultiply $\Lambda_g$ by $A_g^{-1}$. This does not change the observable measurements $x_g$ and hence a fortiori their mean vector $E(x_g)$ and their variance-covariance matrix $\Sigma_g$. Thus we have to introduce at least $k^2$ restrictions on $\Lambda_g$, $\Phi_g$ and $\theta_g$ ($g = 1, 2, \ldots, m$), in such a way that the only $A_g$ matrices that maintain the restrictions imposed are identity matrices. This can be done in several ways but, as indicated by Jöreskog (1971), it is hard to give further specific rules in the general case.
The main purpose of the model just described is to study differences in the factor spaces of the measurements arising from different groups of observations. Suppose, for example, that we want to study differences in abilities for groups
of individuals belonging to different socio-economic classes. To do this we administer the same battery of tests to all individuals. This means that the connexion between the factor space and the space of the measurements is the same for all groups. Thus we should have
$$\Lambda_1 = \Lambda_2 = \ldots = \Lambda_m = \Lambda. \qquad (4)$$
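The indeterminacy noted at the start of this section can be checked numerically: transforming the factors by a non-singular matrix and the loadings by its inverse leaves the implied means and covariances unchanged. The sketch below uses hypothetical parameter values and an arbitrary matrix `A`; none of the numbers come from the paper.

```python
import numpy as np

# Hypothetical parameter values for one group (p = 4, k = 2).
mu = np.array([0.5, 1.0, 1.5, 2.0])
Lam = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.3, 0.7]])
Phi = np.array([[1.0, 0.4], [0.4, 2.0]])
psi2 = np.array([0.5, 0.6, 0.4, 0.7])
theta = np.array([0.3, -0.1])

A = np.array([[2.0, 1.0], [0.0, 1.0]])    # any non-singular k x k matrix
A_inv = np.linalg.inv(A)

# Transformed parameters: factors A xi, loadings Lambda A^{-1}.
Lam_t = Lam @ A_inv
theta_t = A @ theta                        # mean of the transformed factors
Phi_t = A @ Phi @ A.T                      # covariance of the transformed factors

mean, mean_t = mu + Lam @ theta, mu + Lam_t @ theta_t
cov = Lam @ Phi @ Lam.T + np.diag(psi2)
cov_t = Lam_t @ Phi_t @ Lam_t.T + np.diag(psi2)
same_moments = np.allclose(mean, mean_t) and np.allclose(cov, cov_t)
```

The loadings differ between the two parameter sets while `same_moments` is true, which is why restrictions such as fixed zero loadings are needed before the parameters can be estimated.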
Furthermore, each measurement is often constructed in such a way that it measures only a subclass of the abilities to be studied. This means that we have a priori information of the kind that $\lambda_{ij} = 0$, i.e. the $j$th ability does not affect the $i$th measurement. In Section 6 two examples are given, which show how a priori information of this kind may be used to make Model (1) identifiable.
To obtain ML estimates of the parameters, a computer program (Sörbom, 1973a) has been developed, which allows one to utilize information of the kind just discussed. Because the procedure is similar to that of Jöreskog (1971), only a brief account of it is given here. If it can be assumed that the measurements obtained in the different groups are independent, the ML estimates can be calculated by minimizing the function
$$F = \tfrac{1}{2}\sum_{g=1}^{m} N_g f_g, \qquad (5)$$
where $N_g$ is the number of observations in the $g$th group and $f_g$ is minus the log likelihood function for the measurements in the $g$th group, i.e.
$$f_g = \log|\Sigma_g| + \operatorname{tr}(\Sigma_g^{-1}T_g),$$
where
$$T_g = (1/N_g)\sum_{a=1}^{N_g}(x_g^{(a)} - \mu - \Lambda_g\theta_g)(x_g^{(a)} - \mu - \Lambda_g\theta_g)'$$
and $x_g^{(a)}$ is the $a$th observation in the $g$th group.
The minimization procedure is a modification of the Fletcher & Powell (1963) method as described by Gruvaeus & Jöreskog (1970). This method makes use of the first derivatives of $F$ to find the minimum by an iterative process. These first derivatives are almost the same as those given by Jöreskog (1971). The main differences are the derivatives for $\mu$, $\Lambda_g$ and $\theta_g$, which are given by
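$$\partial F/\partial\mu = -\sum_{g=1}^{m} N_g\Sigma_g^{-1}(\bar{x}_g - \mu - \Lambda_g\theta_g), \qquad (6)$$
$$\partial F/\partial\Lambda_g = N_g[\Sigma_g^{-1}(\Sigma_g - T_g)\Sigma_g^{-1}\Lambda_g\Phi_g - \Sigma_g^{-1}(\bar{x}_g - \mu - \Lambda_g\theta_g)\theta_g'] \qquad (7)$$
and
$$\partial F/\partial\theta_g = -N_g\Lambda_g'\Sigma_g^{-1}(\bar{x}_g - \mu - \Lambda_g\theta_g), \qquad (8)$$
where $\bar{x}_g$ is the vector of sample means in the $g$th group.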
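As an illustration of the fit function and its first derivatives, the sketch below evaluates $F = \tfrac{1}{2}\sum_g N_g[\log|\Sigma_g| + \operatorname{tr}(\Sigma_g^{-1}T_g)]$ and compares analytic derivatives with respect to $\mu$, $\theta_g$ and $\Lambda$ against central finite differences. It is a sketch under stated assumptions, not the author's program: the function names, the simulated data and the parameter values are all hypothetical, and for brevity $\Lambda$, $\Phi$ and $\Psi^2$ are taken to be common to the groups, so the derivative for $\Lambda$ is the sum of the group terms.

```python
import numpy as np

def moments(X, mu, Lam, Phi, psi2, theta):
    """N_g, Sigma_g, T_g and the mean residual xbar_g - mu - Lam theta_g."""
    Sigma = Lam @ Phi @ Lam.T + np.diag(psi2)
    R = X - (mu + Lam @ theta)
    return X.shape[0], Sigma, R.T @ R / X.shape[0], R.mean(axis=0)

def F(data, mu, Lam, Phi, psi2, thetas):
    total = 0.0
    for X, theta in zip(data, thetas):
        N, Sigma, T, _ = moments(X, mu, Lam, Phi, psi2, theta)
        logdet = np.linalg.slogdet(Sigma)[1]
        total += 0.5 * N * (logdet + np.trace(np.linalg.solve(Sigma, T)))
    return total

def dF_dmu(data, mu, Lam, Phi, psi2, thetas):
    g = np.zeros_like(mu)
    for X, theta in zip(data, thetas):
        N, Sigma, _, d = moments(X, mu, Lam, Phi, psi2, theta)
        g -= N * np.linalg.solve(Sigma, d)
    return g

def dF_dtheta(X, mu, Lam, Phi, psi2, theta):
    N, Sigma, _, d = moments(X, mu, Lam, Phi, psi2, theta)
    return -N * Lam.T @ np.linalg.solve(Sigma, d)

def dF_dLam(data, mu, Lam, Phi, psi2, thetas):
    g = np.zeros_like(Lam)                  # summed over groups: Lam is shared
    for X, theta in zip(data, thetas):
        N, Sigma, T, d = moments(X, mu, Lam, Phi, psi2, theta)
        Si = np.linalg.inv(Sigma)
        g += N * (Si @ (Sigma - T) @ Si @ Lam @ Phi - np.outer(Si @ d, theta))
    return g

def numeric_grad(fun, x, h=1e-5):
    """Central finite differences, element by element."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = h
        g[idx] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

# Hypothetical data and parameter values (two groups, p = 4, k = 2).
rng = np.random.default_rng(0)
data = [rng.normal(size=(40, 4)), rng.normal(size=(60, 4)) + 0.5]
mu = np.array([0.1, 0.2, 0.0, -0.1])
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.6]])
Phi = np.array([[1.0, 0.3], [0.3, 1.2]])
psi2 = np.array([0.5, 0.6, 0.4, 0.7])
thetas = [np.array([0.0, 0.0]), np.array([0.4, 0.2])]

err_mu = np.abs(dF_dmu(data, mu, Lam, Phi, psi2, thetas) - numeric_grad(
    lambda v: F(data, v, Lam, Phi, psi2, thetas), mu)).max()
err_theta = np.abs(dF_dtheta(data[1], mu, Lam, Phi, psi2, thetas[1]) - numeric_grad(
    lambda t: F(data, mu, Lam, Phi, psi2, [thetas[0], t]), thetas[1])).max()
err_Lam = np.abs(dF_dLam(data, mu, Lam, Phi, psi2, thetas) - numeric_grad(
    lambda L: F(data, mu, L, Phi, psi2, thetas), Lam)).max()
```

The three error values should be negligible relative to the size of the derivatives, which gives a simple check on the expressions before they are passed to a gradient-based minimizer.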
Eqn. (6) can be considerably simplified by the use of the identity
$$\Sigma_g^{-1} = \Psi_g^{-2} - \Psi_g^{-2}\Lambda_g(\Lambda_g'\Psi_g^{-2}\Lambda_g + \Phi_g^{-1})^{-1}\Lambda_g'\Psi_g^{-2},$$
A, = Y,-'A,,
to give
The estimates of $\theta_g$ obtained by solving $\partial F/\partial\theta_g = 0$ are the same as will be obtained by minimizing
$$(\bar{x}_g - \mu - \Lambda_g\theta_g)'\Sigma_g^{-1}(\bar{x}_g - \mu - \Lambda_g\theta_g).$$
On the other hand, the estimates of $\Lambda_g$ obtained by solving $\partial F/\partial\Lambda_g = 0$ are not the same as those obtained in classical factor analysis, where
$$\partial F/\partial\Lambda_g = N_g\Sigma_g^{-1}(\Sigma_g - S_g)\Sigma_g^{-1}\Lambda_g\Phi_g. \qquad (9)$$
The matrix $S_g$ in (9) is the sample variance-covariance matrix. This means that the two-stage procedure mentioned in Section 1 does not produce ML estimates in the sense of Model (1).
When minimizing $F$, a priori information of the kind mentioned above can be handled by the following partition of the parameters in the model: (1) free parameters, viz. independent parameters which are allowed to vary without any restrictions; (2) fixed parameters, viz. parameters whose values are known a priori; (3) constrained parameters, viz. parameters each of which is constrained to equal a free parameter. Thus measurements obtained in accordance with (4) may be assumed to be structured by a partition of the parameters in the following manner:
1. Free parameters: $\mu$, $\Phi_g$, $\Psi_g$, $\theta_g$ ($g = 1, 2, \ldots, m$). Also, if it is assumed that the $j$th ability is involved in the $i$th test, $\lambda_{ij}$ is free.
2. Fixed parameters: If it is assumed that the $l$th ability is not involved in the $k$th test, $\lambda_{kl}$ is zero.
3. Constrained parameters: As the same tests have been administered to all groups we let the non-fixed parameters in $\Lambda$ for the first group be free and let the corresponding parameters in $\Lambda_2, \Lambda_3, \ldots, \Lambda_m$ equal these free parameters.
This method of specifying the parameters makes it possible to generate a wide variety of models. It should be noted that it is not necessary that the number of factors is the same in all populations. If, for example, the $i$th population has an extra factor, we can introduce an artificial factor for the other populations by specifying zeros in a column of $\Lambda_g$, a row and column in $\Phi_g$ and a zero in $\theta_g$, for $g = 1, 2, \ldots, i-1, i+1, \ldots, m$. This procedure makes it possible to treat the 'basic factor model' studied by Please (1973).
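One way to represent the partition into free, fixed and constrained parameters is a pattern matrix for the loadings. The sketch below is an assumption about how such a specification could be coded, not a description of the FASPM program: `NaN` marks a free loading, any number is a fixed value, and equality across groups is obtained by filling every group's matrix from the same vector of free values. All names and numbers are hypothetical.

```python
import numpy as np

# NaN marks a free loading; a number is a value fixed a priori.
pattern = np.array([[1.0,    0.0],
                    [np.nan, 0.0],
                    [0.0,    1.0],
                    [0.0,    np.nan]])

def build_loadings(pattern, free_values):
    """Fill the free positions (row-major order) with the given values."""
    Lam = pattern.copy()
    Lam[np.isnan(pattern)] = free_values
    return Lam

free_values = np.array([0.8, 0.7])   # hypothetical current values of the free loadings
m = 3                                # number of groups

# Constrained parameters: every group uses the same free values.
Lambdas = [build_loadings(pattern, free_values) for _ in range(m)]
n_free = int(np.isnan(pattern).sum())
```

Because all groups share `free_values`, the loadings contribute only `n_free` parameters to the count of free parameters, however many groups there are.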
4. TEST OF HYPOTHESIS
The value of $F$ at the minimum, $\hat{F}$, may be used in large samples to test hypotheses in the likelihood-ratio sense. If we let $G = \hat{F} - \hat{F}_0$, where $\hat{F}_0$ is the minimum of $F$ when the mean vectors and variance-covariance matrices are unconstrained in all groups, it can be shown that $2G$ is distributed asymptotically as $\chi^2$ with degrees of freedom equal to
$$\text{d.f.} = m[p + p(p+1)/2] - q,$$
where $q$ is the number of free parameters in the model estimated. This test is a test of the hypothesis that the measurements are structured in a manner defined by the model and the restrictions of fixed, free and constrained parameters as described in Section 3, as against the hypothesis that the measurements are not structured, i.e. that the mean vector $E(x_g)$ and the variance-covariance matrix $\Sigma_g$ are unconstrained in all groups.
The value of $G$ may also be used to test more specific hypotheses about the parameters. Suppose that $G_1$ is the value obtained for one model and that $G_2$ is the value obtained for a more restricted model. Then
$$\chi^2 = 2(G_2 - G_1)$$
is distributed as $\chi^2$ with degrees of freedom equal to the difference in degrees of freedom for the two models. It should be noted, however, that if the same data are used to estimate the two models, the test is not a test of hypothesis in the common statistical sense. The $\chi^2$ test should only be regarded as a tool to generate hypotheses in exploratory studies. These hypotheses must later be tested in a confirmatory study based on new data.
Two very simple examples may elucidate this. Suppose an analysis has given a value of, say, $\lambda_{ij} = 1.3$ with $G = G_1$. We do the analysis once again, but this time with $\lambda_{ij}$ fixed to be equal to 1.3. We obtain this second time a value of $G$, $G_2$, which of course is equal to $G_1$. Thus we obtain $\chi^2 = 0$ with one degree of freedom, but this should not lead us to propose that it is confirmed that $\lambda_{ij} = 1.3$. Similarly, if we obtained two $\lambda$ values, say, $\lambda_{ij} = 1.3$ and $\lambda_{kl} = 1.4$, we could do the analysis again with the restriction that $\lambda_{kl} = \lambda_{ij}$. Again, a low value of $\chi^2$ does not indicate a confirmation of the hypothesis that $\lambda_{ij}$ equals $\lambda_{kl}$. Rather, it gives us a hint that in the metric used, the difference between the $\lambda$s is not very large. Only a confirmatory study of a new independent set of data can give support to the hypothesis.
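The bookkeeping behind these tests is simple enough to sketch. The code below assumes that the unrestricted alternative has $p$ means and $p(p+1)/2$ variances and covariances in each of the $m$ groups; the function names, the number of free parameters and the $G$ values are hypothetical and chosen only to show the arithmetic.

```python
def degrees_of_freedom(m, p, q):
    """Unrestricted means and covariances in m groups, minus q free parameters."""
    return m * (p + p * (p + 1) // 2) - q

def chi2_difference(G_restricted, G_general, df_restricted, df_general):
    """Statistic 2(G_2 - G_1) and its degrees of freedom for nested models."""
    return 2.0 * (G_restricted - G_general), df_restricted - df_general

# Hypothetical example: m = 4 groups, p = 9 tests, q = 60 free parameters.
df_general = degrees_of_freedom(4, 9, 60)
# A more restricted model with 4 fewer free parameters and a larger G.
df_restricted = degrees_of_freedom(4, 9, 56)
chi2, df_diff = chi2_difference(55.0, 50.0, df_restricted, df_general)
```

Here the general model has 4 × 54 − 60 = 156 degrees of freedom, and the difference statistic is 10.0 on 4 degrees of freedom. As the text stresses, such a value is a guide for generating hypotheses when both models are fitted to the same data, not a confirmatory test.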
CASE
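When the factor loadings are the same in all groups, as in (4), an indeterminacy of the parameters $\mu$ and $\theta_g$ appears. The observable means of the groups, $E(x_g)$ ($g = 1, 2, \ldots, m$), are structured as
$$E(x_g) = \mu + \Lambda\theta_g.$$
Thus we can add a $k$-dimensional vector, $a$, to the $\theta$s in the groups and subtract $\Lambda a$ from $\mu$ to obtain:
$$E(x_g) = \mu - \Lambda a + \Lambda(\theta_g + a) = \mu + \Lambda\theta_g.$$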
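Thus the estimation of $\mu$ and the $\theta$ vectors can be obtained only if we introduce at least $k$ constraints. In the estimation procedure this can be done by letting, for example, $\theta_g = 0$ for one of the groups. Afterwards the estimates, $\hat{\theta}_g$ and $\hat{\mu}$, could be rescaled to obtain new estimates, $\theta_g^*$ and $\mu^*$, such that
$$\sum_{g=1}^{m} N_g\theta_g^* = 0,$$
that is,
$$\theta_g^* = \hat{\theta}_g - \bar{\theta}, \qquad \mu^* = \hat{\mu} + \hat{\Lambda}\bar{\theta}, \qquad \bar{\theta} = \sum_{g=1}^{m} N_g\hat{\theta}_g \Big/ \sum_{g=1}^{m} N_g.$$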
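The rescaling amounts to subtracting the weighted mean of the estimated factor means from each group and adding the corresponding amount to the level vector. The sketch below uses hypothetical estimates, obtained as if $\theta$ had been fixed at zero in the last group; the function name `recentre` is likewise an assumption made for illustration.

```python
import numpy as np

def recentre(mu_hat, Lam_hat, theta_hat, N):
    """Shift the factor means so that sum_g N_g theta_g* = 0.

    theta_hat has one row per group; N holds the group sizes.
    """
    theta_bar = N @ theta_hat / N.sum()       # weighted mean of the theta_g
    theta_star = theta_hat - theta_bar
    mu_star = mu_hat + Lam_hat @ theta_bar    # compensating shift in mu
    return mu_star, theta_star

# Hypothetical estimates with theta fixed at zero in the last group.
N = np.array([50, 70, 80])
mu_hat = np.array([1.0, 2.0, 3.0, 4.0])
Lam_hat = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.6]])
theta_hat = np.array([[0.5, -0.2], [0.1, 0.3], [0.0, 0.0]])

mu_star, theta_star = recentre(mu_hat, Lam_hat, theta_hat, N)
means_before = mu_hat + theta_hat @ Lam_hat.T     # row g is E(x_g)
means_after = mu_star + theta_star @ Lam_hat.T
```

The implied group means are identical before and after the shift, so the choice of constraint affects only the origin in which the factor means are reported.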
6. TWO ILLUSTRATIONS
Meredith (1964) and Jöreskog (1971) used data from Holzinger & Swineford (1939) to illustrate differences in factor spaces among four groups of individuals. The data were obtained from the same tests that had been administered to all groups, and nine tests defining a three-dimensional factor space were selected. The groups consisted of 7th- and 9th-grade children from two schools in Chicago, the Pasteur and the Grant-White schools. The children from each school were divided into two groups according to whether they scored above or below the median on a speeded addition test. Thus the following four groups were analysed: (1) Pasteur, Low ($N_1 = 77$), (2) Pasteur, High ($N_2 = 79$), (3) Grant-White, Low ($N_3 = 74$) and (4) Grant-White, High ($N_4 = 71$).
By testing several hypotheses and by appealing to intuition when faced with the interpretation, Jöreskog (1971) suggested that (1) the factors were the same in all groups, i.e. $\Lambda_1 = \Lambda_2 = \ldots = \Lambda_m$; (2) the matrix of factor loadings was of a particularly simple form; (3) the specific variances (the $\Psi_g^2$s) were the same in all groups. Thus it was proposed that differences in the sample variance-covariance matrices and the sample mean vectors among the groups could be explained solely by differences in the factor spaces. In this case the factor spaces were spanned by the three abilities designated space, verbal and memory.
It should be noted that since the elements in the $\Phi_g$ matrices were free parameters, we have to fix at least one non-zero element in each column of $\Lambda$. Otherwise it would have been possible to multiply, say, the $i$th column of $\Lambda$ by a constant, $c$, and then multiply the $i$th row and column of $\Phi_g$ by $1/c$ and the $i$th element of $\theta_g$ by $1/c$: this would not change $E(x_g)$ or $\Sigma_g$. Afterwards the factors could be rescaled by postmultiplying $\Lambda$ by a diagonal matrix, $D^{1/2}$, premultiplying $\theta_g$ by $D^{-1/2}$, and by pre- and postmultiplying the $\Phi_g$ matrices by $D^{-1/2}$ in such a way that the diagonal elements of
equal unity. This has been done for the results reported in Table 1. The parameters that were fixed during the analysis are marked by an asterisk.
The estimates of the factor loadings, $\Phi$ matrices and specific variances are almost identical with those reported by Jöreskog (1971). It should be noted that in Table 3 of Jöreskog's paper the numbering of the groups is given in reverse order. Thus the $\Phi_4$ matrix is in fact the $\Phi$ matrix for the Pasteur Low group, and not, as stated by Jöreskog in Section 3.7, the $\Phi$ matrix for the Grant-White High group. The $\Phi_3$ matrix belongs to the Pasteur High group and so on. To further illustrate the differences in ability among the groups, the factor means are plotted
Table 1. Holzinger-Swineford Data

                              Factor loadings
Test                     Space     Verbal    Memory    Specific variances
Visual perception        0.720*    0.0*      0.0*      0.484
Cubes                    0.425     0.0*      0.0*      0.844
Paper form board         0.516     0.0*      0.0*      0.736
General information      0.0*      0.821*    0.0*      0.352
Sentence completion      0.0*      0.799     0.0*      0.327
Word classification      0.0*      0.781     0.0*      0.432
Figure recognition       0.0*      0.0*      0.495*    0.769
Object-number            0.0*      0.0*      0.551     0.769
Number-figure            0.0*      0.0*      0.580     0.669

Estimated factor variance-covariance matrices (Low level, High level; schools Pasteur, Grant-White)
1.374 0.441 0.627 0.887 0.631 0.567 0.721 0.515 0.050
-1.110 0.254 0.920 0.525 1.189 1.037 0.203 0.923 0.349 0.917 0.575 1.023 0.542 1.002 1.331

Estimated factor means and standard errors (within parentheses)
Group                 Space             Verbal            Memory
Pasteur Low           -0.047 (0.145)    -0.636 (0.134)    -0.132 (0.134)
Pasteur High          -0.005 (0.117)    -0.215 (0.129)     0.247 (0.125)
Grant-White Low        0.038 (0.129)     0.257 (0.127)    -0.345 (0.114)
Grant-White High       0.016 (0.140)     0.661 (0.131)     0.229 (0.151)
in Fig. 1. It is seen that the profiles are similar within schools, with those scoring high on the addition test at a higher level. Further, the space ability does not differentiate the groups. For the verbal ability there is a difference also between schools, with pupils from the Grant-White school on the average scoring higher. This reflects the fact that the Pasteur school enrols children of factory workers, a large percentage of whom were foreign-born, and the Grant-White school enrols children in a middle-class suburban area (Meredith, 1964). With regard to the memory factor, both groups of the Pasteur school seem to be superior to the corresponding groups of the Grant-White school, although the difference between the high groups is small.
Table 1 also gives estimates of standard errors for the estimated factor means. The formulae for these have not yet been derived, but in Sörbom (1973b) a method of obtaining them for the case $\mu = 0$ is given. Thus, if we subtract the estimate of $\mu$ from the sample means, approximate estimates of the standard errors for the estimated parameters in Model (1) can be obtained.
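As a consistency check on the rescaling of Section 5, the factor means reported in Table 1 can be combined with the group sizes given above; the weighted sum over groups should vanish for each factor. The sketch below uses only those published values. The variable names are illustrative, and since the table entries are rounded to three decimals the weighted means are only approximately zero.

```python
import numpy as np

# Group sizes: Pasteur Low, Pasteur High, Grant-White Low, Grant-White High.
N = np.array([77, 79, 74, 71])

# Estimated factor means from Table 1 (columns: space, verbal, memory).
theta = np.array([[-0.047, -0.636, -0.132],
                  [-0.005, -0.215,  0.247],
                  [ 0.038,  0.257, -0.345],
                  [ 0.016,  0.661,  0.229]])

# Weighted mean of the factor means over groups, one value per factor.
weighted_mean = N @ theta / N.sum()
```

Each entry of `weighted_mean` is below 0.001 in absolute value, which is within the rounding error of the tabulated figures.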
FIG. 1. Holzinger-Swineford data: factor means. (The constant 0.7 has been added to the entries of Table 1.)
As a second illustration, data from the Project Talent study, earlier analysed by McGaw & Jöreskog (1971), are used. These consist of 12 tests administered to 11,743 high school subjects. The subjects were divided into four groups according to high and low intelligence (IQ) and high and low socio-economic status (SES), thus constituting the following four groups: (1) low IQ-low SES ($N_1 = 4491$), (2) low IQ-high SES ($N_2 = 1336$), (3) high IQ-low SES ($N_3 = 939$) and (4) high IQ-high SES ($N_4 = 4977$).
Table 2. Project Talent Data

Tests: Vocabulary, Information I, Information II, Spelling, Punctuation, English usage, Mechanical reasoning, Visualization I, Visualization II, Table reading, Clerical checking, Object inspection

Factor loadings (I General knowledge, II Verbal mechanics, III Spatial, IV Speed of perception)
0.659 0.812* 0.626
0.275 0.0* 0.0* 0.312 0.412 0.0* 0.476 0.0* 0.706* 0.0* 0.550 0.0* 0.0* 0.604 0.0* 0.452 0.695* 0.0*
0.0* 0.0* 0.0* 0.0* 0.0* 0.224 0.0* 0.0* 0.087 0.0* 0.0* 0.0* 0.064
Table 2 (contd)

Estimated specific variances (Low IQ, High IQ)
0.453 0.068 0.382 0.656 0.464 0.534 0.153 0.563 0.659 0.650 1.134 0.786 0.901 0.625 1.334 1.364 1.252
0.446 0.783 0.519 0.456 0.458 0.590 0.530 0.142 0.372 0.426
0.855 0.559 0.789 0.502 1.073 1.102 0.878
0.362 0.440 0.589 0.469 0.113 0.389 0.411

Low IQ, High IQ (I II III IV)
II 1.063 0.283 0.469 0.995 0.078 0.266
IV 1.481 0.455
II 1.032 0.144 0.105 0.936 0.193 0.099
III 1.071 0.278 1.028 0.077
IV 1.876 0.434
Estimated factor means and standard errors (within parentheses)

                      I General        II Verbal        III              IV Speed of
Group                 knowledge        mechanics        Spatial          perception
Low IQ-low SES        -1.63 (0.027)    -1.80 (0.020)    -1.20 (0.020)    -0.53 (0.024)
Low IQ-high SES       -1.12 (0.038)    -1.73 (0.038)    -1.00 (0.039)    -0.59 (0.043)
High IQ-low SES        0.76 (0.033)     1.27 (0.042)     0.93 (0.043)     0.42 (0.024)
High IQ-high SES       1.63 (0.015)     1.85 (0.019)     1.17 (0.023)     0.56 (0.010)
McGaw & Jöreskog proposed a structure of the factor loadings, invariant over the groups, as indicated in Table 2, where an asterisk denotes the factor loadings held fixed during the estimation procedure. The $\Phi$ and the $\Psi$ matrices of the groups were free, as were the vectors of factor means. The estimates reported in Table 2 are very similar to those obtained by McGaw & Jöreskog, with the exception of the estimates for the factor means. The means given by McGaw & Jöreskog are plotted in Fig. 3, and the estimates from the present analysis are plotted in Fig. 2. It can be seen that in the latter case the profiles for the two SES groups within IQ classes are very similar, with those in the high SES groups in general on a higher level, especially for the general knowledge factor.
An inspection of the estimated standard errors for the factor means shows that all differences, except that between low IQ-low SES and low IQ-high SES for Speed of Perception, are significant.
FIG. 2. Talent data: factor means. (The constant 2.0 has been added to the entries of Table 2.)
FIG. 3. Talent data: factor means obtained by McGaw & Jöreskog (1971). (The constant 2.0 has been added to McGaw & Jöreskog's Table 7.)
This research has been supported by the Swedish Council for Social Science Research under project 'Statistical methods for analysis of longitudinal data', project director K. G. Jöreskog.
REFERENCES
JÖRESKOG, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika 36, 409-426.
LAWLEY, D. N. & MAXWELL, A. E. (1971). Factor Analysis as a Statistical Method, 2nd ed. London: Butterworth.
MCGAW, B. & JÖRESKOG, K. G. (1971). Factorial invariance of ability measures in groups differing in intelligence and socioeconomic status. Br. J. math. statist. Psychol. 24, 154-168.
MEREDITH, W. (1964). Rotation to achieve factorial invariance. Psychometrika 29, 187-206.
PLEASE, N. W. (1973). Comparison of factor loadings in different populations. Br. J. math. statist. Psychol. 26, 61-89.
SÖRBOM, D. (1973a). FASPM, a computer program for factor analysis in several populations with structured means. (In preparation.)
SÖRBOM, D. (1973b). A statistical model for the measurement of change. Res. Rep. 73-6. Department of Statistics, University of Uppsala.