
General FAQ #33: Handling non-normal data in structural equation modeling (SEM)


Question:
I am having trouble getting my hypothesized structural equation model to fit my data.
Someone told me that non-normal data are a problem for SEM models; this person
suggested using the generalized least-squares (GLS) estimator to fit my model instead of
the default maximum likelihood (ML) estimator. What is the best way to handle non-normal data when fitting a structural equation model?

Answer:
The hypothesis tests conducted in the structural equation modeling (SEM) context fall
into two broad classes: tests of overall model fit and tests of significance of individual
parameter estimate values. Both types of tests assume that the fitted structural equation
model is true and that the data used to test the model arise from a joint multivariate
normal distribution (JMVN) in the population from which you drew your sample data. If
your sample data are not JMVN distributed, the chi-square test statistic of overall model
fit will be inflated and the standard errors used to test the significance of individual
parameter estimates will be deflated. Practically, this means that if you have non-normal data, you are more likely to reject models that are in fact correct and to conclude that particular parameter estimates are statistically significantly different from zero when they are not (type 1 errors). Note that this type of assumption violation is also
a problem for confirmatory factor analysis models, latent growth models (LGMs), path
analyses, or any other type of model that is fit using structural equation modeling
programs such as LISREL, EQS, AMOS, and PROC CALIS in SAS.
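For readers who want the details, the tests above can be written out using the standard notation from Bollen (1989): S is the sample covariance matrix, Σ(θ) the model-implied covariance matrix, p the number of observed variables, and N the sample size. The ML discrepancy function and test statistic are

$$ F_{ML} = \ln\lvert\Sigma(\theta)\rvert - \ln\lvert S\rvert + \mathrm{tr}\!\left[ S\,\Sigma(\theta)^{-1} \right] - p, \qquad T_{ML} = (N-1)\,\hat{F}_{ML}, $$

and under JMVN, when the fitted model is true, T_{ML} follows a central chi-square distribution with degrees of freedom equal to the model's DF.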
How can you correct for non-normal data in SEM programs? There are three general
approaches used to handle non-normal data:
1. Use a different estimator (e.g., GLS) to compute goodness of fit tests, parameter
estimates, and standard errors
2. Adjust or scale the obtained chi-square test statistic and standard errors to take
into account the non-normality of the sample data
3. Make use of the bootstrap to compute a new critical chi-square value, parameter
estimates, and standard errors

Estimators
Most SEM software packages offer the data analyst the opportunity to use generalized
least-squares (GLS) instead of the default maximum likelihood (ML) to compute the
overall model fit chi-square test, parameter estimates, and standard errors. Under joint
multivariate normality, when the fitted model is not false, GLS and ML return identical chi-square model fit values, parameter estimates, and standard errors (Bollen, 1989). Recent research by Ulf H. Olsson and his colleagues (e.g., Olsson, Troye, & Howell, 1999), however, suggests that GLS underperforms relative to ML in the following key areas:
1. GLS accepts incorrect models more often than ML

2. GLS returns inaccurate parameter estimates more often than ML


A consequence of (2) is that modification indices are less reliable when the GLS estimator
is used. Thus, we do not recommend the use of the GLS estimator.
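For comparison with the ML criterion given earlier, the GLS discrepancy function minimizes a weighted sum of squared residual covariances rather than a likelihood-based criterion; in the same notation, with the sample covariance matrix S serving as the weight matrix,

$$ F_{GLS} = \tfrac{1}{2}\,\mathrm{tr}\!\left\{ \left[ \left( S - \Sigma(\theta) \right) S^{-1} \right]^{2} \right\}, $$

and (N-1)\hat{F}_{GLS} is referred to the same central chi-square distribution. The two criteria coincide asymptotically under JMVN when the model is true, which is why the two estimators agree in that case.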
A second option is to use Browne's (1984) Asymptotic Distribution Free (ADF) estimator, available in LISREL. Unfortunately, ADF requires very large samples (in excess of 1,000 cases) and relatively small models because of the computational demands of the estimation procedure. As Muthén (1993) concludes, "Apparently the asymptotic
properties of ADF are not realized for the type of models and finite sample sizes often
used in practice. The method is also computationally heavy with many variables. This
means that while ADF analysis may be theoretically optimal, it is not a practical method"
(p. 227).
For these reasons, the standard recommendation is to use the ML estimator (or one of the variants described below) when fitting a model to variables that can be assumed to be continuously and normally distributed in the population from which you drew your sample. By contrast, if your variables are
inherently categorical in nature, consider using a software package designed specifically
for this type of data. Mplus is one such product. It uses a variant of the ADF method mentioned previously, weighted least squares (WLS). WLS as implemented by Mplus for
categorical outcomes does not require the same sample sizes as does ADF for
continuous, non-normal data. Further discussion of the WLS estimator is beyond the
scope of this FAQ; interested readers are encouraged to peruse Muthén, du Toit, and Spisic (1997) and Muthén (1993) for further details.
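The sample-size demands of ADF noted above follow directly from its form. Writing s and σ(θ) for the vectors of the p(p+1)/2 non-duplicated sample and model-implied variances and covariances, the ADF discrepancy function is

$$ F_{ADF} = \left[ s - \sigma(\theta) \right]^{\prime}\, W^{-1} \left[ s - \sigma(\theta) \right], $$

where W is an estimate of the asymptotic covariance matrix of s built from fourth-order sample moments. W is a square matrix of order p(p+1)/2, so a model with 20 observed variables requires estimating and inverting a 210 x 210 weight matrix, which is computationally heavy and statistically unstable unless N is very large.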

Robust scaled and adjusted chi-square tests and parameter estimate standard errors
A variant of the ML estimation approach is to correct the model fit chi-square
test statistic and standard errors of individual parameter estimates. This approach was
introduced by Satorra and Bentler (1988) and incorporated into the EQS program as the ml,robust option. In EQS 5.x, this option provides the Satorra-Bentler scaled chi-square statistic, also known as the scaled T statistic, for testing overall model fit. Curran, West, and Finch (1996) found that the scaled chi-square statistic outperformed
the standard ML estimator under non-normal data conditions. Mplus also offers the
scaled chi-square test and accompanying robust standard errors via the estimator option
mlm. Mplus also offers a similar test statistic, the mean- and variance-adjusted chi-square statistic, via the estimator option mlmv.
An adjusted version of the scaled chi-square statistic is presented in Bentler and
Dudgeon (1996). Fouladi (1998) conducted an extensive simulation study and found that this adjusted chi-square test statistic outperformed both the standard ML chi-square and
the original scaled chi-square test statistic, particularly in smaller samples. Unfortunately,
the adjusted test statistic is not available in EQS 5.x.
The robust approaches work by adjusting, usually downward, the obtained model fit chi-square statistic based on the amount of non-normality in the sample data. The larger the
multivariate kurtosis of the input data, the stronger the applied adjustment to the chi-square test statistic. Standard errors for parameter estimates are adjusted in much the same manner, typically upward, to hold the type 1 error rate for tests of individual parameter estimates at an appropriate level. The parameter estimate values themselves are the same as those from a standard ML solution; only the standard errors change, with the end result being a more appropriate hypothesis test that the parameter estimate is zero in the population from which the sample was drawn.
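In symbols, the Satorra-Bentler correction divides the ML test statistic by an estimated scaling constant that grows with the multivariate kurtosis of the data; one common way to write it is

$$ T_{SB} = \frac{T_{ML}}{\hat{c}}, \qquad \hat{c} = \frac{\mathrm{tr}(\hat{U}\hat{\Gamma})}{DF}, $$

where Γ̂ is the estimated asymptotic covariance matrix of the sample variances and covariances (the quantity that reflects kurtosis) and Û is a residual weight matrix determined by the fitted model. Under JMVN the scaling constant is close to 1.00 and the correction essentially vanishes; with heavy-tailed data it exceeds 1.00 and T_{SB} is pulled down below T_{ML}.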

Bootstrapping
The robust scaling approach described above adjusts the obtained chi-square model fit
statistic based on the amount of multivariate kurtosis in the sample data. An alternative
method to deal with non-normal input data is to not adjust the obtained chi-square test
statistic and instead adjust the critical value of the chi-square test. Under the assumption
of JMVN and if the fitted model is not false, the expected value of the chi-square test of
model fit is equal to the model's degrees of freedom (DF). For example, if you fit a model
that was known to be true and the input data were JMVN and the model had 20 DF, you
would expect the chi-square test of model fit to be 20, on average. On the other hand,
non-normality in the sample data can inflate the obtained chi-square to a value that
exceeds DF, say 30. The robust scaled and adjusted chi-square tests mentioned in the
previous section work by lowering the value of the obtained chi-square to correct for non-normality. For instance, in this example a reasonable value for the robust scaled or
adjusted chi-square might be 25 instead of 30. Ideally, the adjusted chi-square would be
closer to 20, but the adjustments are not perfect.
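The following short Python sketch (using scipy) turns this worked example into p-values; the statistics 20, 25, and 30 are just the illustrative values from the text above.

```python
# Evaluate the worked example: a model with DF = 20 and various chi-square values.
from scipy.stats import chi2

df = 20
for t in (20.0, 25.0, 30.0):
    # Right-tail p-value of the obtained statistic against the chi2(df) reference.
    print(f"T = {t:4.1f}, df = {df}, p = {chi2.sf(t, df):.3f}")

# The inflated statistic (30, p around .07) sits much closer to rejection than
# the JMVN-expected value (20, p around .46), showing how non-normality alone
# can push a correct model toward rejection.
```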
Bootstrapping works by computing a new critical value for the chi-square test of overall model fit. In our example, instead of the
JMVN expected chi-square value of 20, a critical value generated via the bootstrap might
be 27. The original obtained chi-square statistic for the fitted model (e.g., 30) is then
compared to the bootstrap critical value (e.g., 27) rather than to the original model DF value
(e.g., 20). A p-value based upon the comparison of the obtained chi-square value to the
bootstrap-generated critical chi-square value is then computed.
How is the bootstrap critical chi-square value generated? First, the input data are assumed to be the total population of responses, and the bootstrap program repeatedly draws samples, with replacement, of size N from this pseudo-population. For each drawn sample, the input data are transformed so that the fitted model holds exactly. This step is necessary because the critical chi-square value is computed from a central chi-square distribution, and a central chi-square distribution assumes the null hypothesis is true.
The same assumption is made when you use the standard ML chi-square to test model fit: the obtained chi-square is expected to equal the model DF when the null hypothesis is true.
Next, the model is fit to the data and the obtained chi-square is output and saved. This
process is repeated across each of the bootstrap samples. At the conclusion of the
bootstrap sampling, the bootstrap program collects the chi-square model fit statistics
from each sample and computes their mean value. This mean value becomes the critical
value for the chi-square test from the original analysis.
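Below is a minimal numpy/scipy sketch of the procedure just described. The function fit_chisquare is a hypothetical placeholder for whatever routine fits your hypothesized model and returns its ML chi-square; it stands in for an SEM package and is not a real library call.

```python
# A minimal sketch of the Bollen-Stine bootstrap described above.
# fit_chisquare(data) is a hypothetical placeholder: it should fit your
# hypothesized model to a raw data matrix and return the ML chi-square.
import numpy as np
from scipy.linalg import sqrtm

def bollen_stine(Y, sigma_hat, fit_chisquare, n_boot=1000, seed=12345):
    """Y: (N, p) raw data; sigma_hat: model-implied covariance matrix from
    the original ML fit. Returns (mean bootstrap chi-square, p-value)."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                   # center each variable
    S = np.cov(Yc, rowvar=False, ddof=1)      # sample covariance matrix
    # Transform the data so their covariance matrix equals sigma_hat, i.e.
    # the fitted model holds exactly in the bootstrap pseudo-population.
    A = np.real(sqrtm(np.linalg.inv(S)) @ sqrtm(sigma_hat))
    Z = Yc @ A
    t_obs = fit_chisquare(Y)                  # chi-square from the original data
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, N, size=N)     # resample cases with replacement
        t_boot[b] = fit_chisquare(Z[rows])
    # AMOS-style critical value (mean bootstrap chi-square) and the
    # Bollen-Stine p-value (share of bootstrap statistics >= the original).
    return t_boot.mean(), float((t_boot >= t_obs).mean())
```

In the worked example above, the first return value would play the role of the bootstrap critical value (e.g., 27), and the second corresponds to the Bollen-Stine p-value.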

The procedure detailed above is credited to Bollen and Stine (1993) and is implemented
in AMOS. AMOS allows the data analyst to specify the number of bootstrap samples
drawn (typically 250 to 2000 bootstrap samples) and it outputs the distribution of the
chi-square values from the bootstrap samples as well as the mean chi-square value and a
Bollen-Stine p-value based upon a comparison of the original model's obtained chi-square
with the mean chi-square from the bootstrap samples.
AMOS also computes individual parameter estimates, standard errors, confidence intervals, and p-values for tests of significance of individual parameter estimates based upon various types of bootstrap methods such as bias correction and percentile correction. Mooney and Duval (1993) and Davison and Hinkley (1997) describe these
methods and their properties whereas Efron and Tibshirani (1993) provide an
introduction to the bootstrap. Fouladi (1998) found in a simulation study that the Bollen-Stine test of overall model fit performed well relative to other methods of testing model
fit, particularly in small samples.

Cautions and notes


One of the corollary benefits of the bootstrap is the ability to obtain standard errors and
therefore p-values for quantities for which normal theory standard errors are not defined,
such as r-square statistics. A primary disadvantage of the bootstrap and the robust
methods mentioned previously is that they require complete data (i.e., no missing data
are allowed). Use of the bootstrap method also requires the data analyst to set the scale of each latent variable by fixing one of its factor loadings to 1.00 rather than by fixing the factor's variance to 1.00, because under the latter scenario bootstrapped standard error estimates may be artificially inflated by factor loadings switching sign across bootstrap samples (Hancock & Nevitt, 1999).

References
For more information about non-normal data handling in SEM, see the following
references:
Bentler, P. M., & Dudgeon, P. (1996). Covariance structure analysis:
Statistical practice, theory, and directions. Annual Review of Psychology,
47, 563-592.
Bollen, K. A. (1989). Structural equations with latent variables. New York,
NY: John Wiley and Sons.
Bollen, K. A., & Stine, R. A. (1993). Bootstrapping goodness-of-fit
measures in structural equation models. In K. A. Bollen and J. S. Long
(Eds.) Testing structural equation models. Newbury Park, CA: Sage
Publications.
Browne, M. W. (1984). Asymptotically distribution-free methods for the
analysis of covariance structures. British Journal of Mathematical and
Statistical Psychology, 37, 62-83.

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test
statistics to nonnormality and specification error in confirmatory factor
analysis. Psychological Methods, 1, 16-29.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrapping methods and their
application. Cambridge, UK: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New
York, NY: Chapman and Hall Publishers.
Fouladi, R. T. (1998). Covariance structure analysis techniques under
conditions of multivariate normality and nonnormality - Modified and
bootstrap test statistics. Paper presented at the American Educational
Research Association Annual Meeting, April 11-17, 1998, San Diego, CA.
Hancock, G. R., & Nevitt, J. (1999). Bootstrapping and the identification of
exogenous latent variables within structural equation models. Structural
Equation Modeling, 6(4), 394-399.
Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric
approach to statistical inference. Newbury Park, CA: Sage Publications.
Muthén, B. O. (1993). Goodness of fit with categorical and other
nonnormal variables. In K. A. Bollen and J. S. Long (Eds.) Testing
structural equation models. Newbury Park, CA: Sage Publications.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (In press). Robust inference
using weighted least squares and quadratic estimating equations in latent
variable modeling with categorical and continuous outcomes. Psychometrika.
Olsson, U. H., Troye, S. V., & Howell, R. D. (1999). Theoretic fit and
empirical fit: The performance of maximum likelihood versus generalized
least squares estimation in structural equation models. Multivariate
Behavioral Research, 34(1), 31-59.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in
covariance structure analysis. 1988 Proceedings of the Business and Economics
Statistics Section of the American Statistical Association.
