Structural equation modeling: Back to basics

Ralph O. Mueller

Department of Educational Leadership, Graduate School of Education and Human Development, The George Washington University, 2134 G Street, NW, Washington, DC, 20052

To cite this article: Ralph O. Mueller (1997): Structural equation modeling: Back to basics, Structural Equation Modeling: A Multidisciplinary Journal, 4:4, 353-369

Major technological advances incorporated into structural equation modeling (SEM) computer programs now make it possible for practitioners who are basically unfamiliar with the purposes and limitations of SEM to use this tool within their research contexts. The current move by program developers to market more user-friendly software packages is a welcome trend in the social and behavioral science research community. The quest to simplify the data analysis step in the research process has, at least with regard to SEM, created a situation that allows practitioners to apply SEM while forgetting, knowingly ignoring, or, most dangerously, being ignorant of some basic philosophical and statistical issues that must be addressed before sound SEM analyses should be conducted. This article focuses on some of the almost forgotten topics taken here from each step in the SEM process: model conceptualization, identification and parameter estimation, and data-model fit assessment and model modification. The main objective is to raise awareness among researchers new to SEM of a few basic but key philosophical and statistical issues. These should be addressed before launching into any one of the new generation of SEM software packages and being led astray by the seemingly irresistible temptation to prematurely start "playing" with the data.

Requests for reprints should be sent to Ralph O. Mueller, Department of Educational Leadership, Graduate School of Education and Human Development, The George Washington University, 2134 G Street, NW, Washington, DC 20052. E-mail: rmueller@gwis2.circ.gwu.edu

During the past 25 years, structural equation modeling (SEM) has become a powerful, mainly nonexploratory research tool for many social and behavioral scientists. The initial development during the 1970s and continuous improvement (together with very successful marketing strategies) of major software packages such as LISREL (Jöreskog & Sörbom, 1995), EQS (Bentler & Wu, 1995), and, more recently, AMOS (Arbuckle, 1996), has led to an influx of SEM applications (path analyses, confirmatory factor analyses, and more general SEM utilizations) in almost every quantitatively oriented social and behavioral science research journal. Today's user of SEM is thrilled that the days are finally past when model parameters could be estimated only after writing program code that presumed a fairly complete understanding of several complex (matrix) equations. For example, starting with the Microsoft Windows version of EQS 4, the BUILD_EQS option is available, allowing model specification without writing actual program code but, instead, by "clicking" on various options in pull-down menus. Similarly, beginning with the MS-Windows version of LISREL 8, the user now can use a nonmatrix-based, English-like syntax, SIMPLIS (SIMple LISrel), which eliminates the need for specifying a particular model through explicitly fixing and freeing elements in the basic model parameter matrices. Powerful graphic interfaces and other advances make it possible not only to display path diagrams on screen but also to modify these diagrams by simple "click and drag" movements from within the diagram display and to subsequently reestimate the now modified structure.

On the one hand, software developers are to be complimented for taking advantage of state-of-the-art hardware and programming knowledge to simplify a mathematically and statistically complex research tool that was previously seen as being beyond the reach of many social scientists. On the other hand, the move toward more user-friendly software implicitly creates the danger of individuals using the tools without clearly understanding their purposes and limitations.
Of course, the progress made in SEM software development is no more dangerous than the increased user-friendliness seen in any other statistical software package. In both cases, complex statistical analyses can be conducted with great ease without necessarily having a full understanding of their unique (dis)advantages and underlying assumptions.

In my opinion, researchers can benefit greatly from conducting SEM analyses because SEM has the potential to help bridge the gap between the theoretical and empirical aspects of social science research. SEM encompasses several strategies for hypothesizing, analyzing, and interpreting relations between sets of variables that are based on and extend the more traditional concepts of correlation and regression. For example, covariances between observed and/or unobserved (latent) variables can be decomposed into structural and nonstructural components; measurement errors of observed variables can be directly incorporated into data analyses; associations between prediction errors of outcome variables can be assessed; and information can be obtained from an analysis of certain models on whether or not collected data fit a particular, a priori hypothesized structure in which certain relations are restricted to be constant (e.g., zero effects among variables). But interpretations of SEM analyses can assist in understanding aspects of social/behavioral phenomena only if a set of one or more alternative models is conceptualized based on a sound underlying substantive theory; representative sample data are collected and an estimation method is deliberately chosen so that the unknown population parameters can be estimated as correctly as possible; and the fit of those data to the a priori hypothesized model(s) is assessed to indicate whether or not the model should be rejected as an acceptable approximation to the true, but unknown, structure. If data-model inconsistencies are identified, the availability of theoretical and statistical justifications will determine whether it might be appropriate to compare and/or modify models to rectify initial specification errors. In short, I understand SEM as a largely nonexploratory research process that is based on, and driven by, a particular substantive theory or theories. This understanding runs contrary to the view that SEM is a mere statistical technique.

In the following, I seek to synthesize some basic considerations that should be addressed before the benefits of a SEM analysis can be fully realized. To the active and alert user of SEM, nothing that is written should be new or surprising. The focus here is on those individuals who, maybe for the first time, are contemplating the use of a SEM approach in addressing their research questions. In addition to the works cited throughout the article, a few of the more critical contributions to the SEM literature that I strongly recommend to the reader are Baumrind (1983), Cliff (1983), Freedman (1987), and Ling (1983).

"The study of structural equation models can be divided into two parts: the easy part and the hard part" (Duncan, 1975, p. 149). "The easy part is mathematical. The hard part is constructing causal models that are consistent with sound theory. In short, causal models are no better than the ideas that go into them" (Wolfle, 1985, p. 385). Here, Duncan and Wolfle pointed to the fundamental truth in SEM: No matter how technically sophisticated the employed statistical techniques, SEM analyses can only be beneficial to the researcher if a strong substantive theory underlies the initially hypothesized model(s). Based on correlational data, the statistical methods cannot, for example, establish or prove causal relations between variables. At most, they can help in identifying some empirical evidence to either reject or retain hypothesized causal theories and/or assess the strengths and directions of certain a priori hypothesized causal or structural relations within the context of a specific model. More specifically, SEM users should realize and remember that:

Models that were not carefully constructed from knowledge of the underlying substantive theories will only lead to empty interpretations, adding very little to our understanding of these theories. For example, I (and maybe you) have come across studies that employed both exploratory and confirmatory factor analyses using the exact same data. The authors justified "their" choice of the confirmatory model with the results from the exploratory analysis and were excited about the excellent model fit. (Obviously, here a good fit between data and model is not surprising or even noteworthy because the data, not a theory, were used to generate the model from the exploratory analysis.)

Results from a SEM analysis can be interpreted validly only within the context of the analyzed model. That is, the analysis of the causal/structural relations in one model says nothing of the character of these relations in a structurally different, modified, or competing model.

There are many (actually, infinitely many) alternative structures that can yield identical data-model fit results. Thus, the hunt for the model is, indeed, a fruitless one. Instead, a carefully constructed model (preferably a set of equally theoretically plausible alternative models) based on the researcher's in-depth understanding of the substantive area and the constructs being modeled is needed before SEM can assist in understanding the phenomenon being investigated.

One of the most controversial philosophical issues surrounding model conceptualization and interpretation is the definition and role of the concept of causality within a SEM context.
Perhaps based on titles of some of the earlier treatments of SEM (e.g., Asher's, 1983, Causal Modeling; Blalock's, 1964, Causal Inferences in Nonexperimental Research; Glymour, Scheines, Spirtes, & Kelly's, 1987, Discovering Causal Structure; or Kenny's, 1979, Correlation and Causality), the techniques discussed here have been known to many investigators as "causal modeling," possibly fostering the erroneous idea that here is a statistical technique that can establish whether or not a causal relation exists based just on correlations among the involved variables. Today, many authors and editors (e.g., Bollen, 1989; Bollen & Long, 1993; Byrne, 1994; Hayduk, 1987; Hoyle, 1995; Marcoulides & Schumacker, 1996; Schumacker & Lomax, 1996; or Mueller, 1996) prefer the term structural equation modeling or covariance structure analysis and emphasize that establishing causality from correlations alone is not what is being attempted. In fact, some argue that "it would be very healthy if more researchers abandon thinking of and using terms such as cause and effect. Instead they should work [within the SEM framework] in terms of regression relations with predictors and outcomes" (Muthén, 1992, p. 82). However, the challenge seems to lie in the identification of the interpretative difference between causes/effects and predictors/outcomes. After all, how many among us do not at least think of the underlying concept of causality when attempting to predict a dependent variable from a set of independent variables during a regression analysis?

In my view, too few attempts are being made to clarify for the practicing structural equation modeler the possible role(s) causality can and should play in conceptualizing and interpreting SEM analyses (notable exceptions are Bullock, Harlow, & Mulaik, 1994; Mulaik, 1987, 1993; Mulaik & James, 1995; and the various authors in Shaffer, 1992). Especially if we believe that causal explanation is the "ultimate" aim of science (Shaffer, 1992, p. x; Mulaik, 1993), "the [SEM] framework should help, rather than hinder, clear thinking about causal mechanisms that we think lie behind the correlations in the data" (Rothenberg, 1992, p. 99). Thus, it seems imperative that we come to an awareness of the ramifications of the various positions that have been taken on the role of causality in SEM. Such fundamental considerations as the few listed below should serve as a reminder that the various approaches to causality can greatly impact the conceptualization and interpretation of structural equation models (for more detail, you might start with Shaffer, 1992, and the various cited contributions by Mulaik):

Some contributors make the epistemological assumption that nonexperimental research can generate causal inferences (e.g., Glymour et al., 1987), whereas the motto "no causation without manipulation" (Holland, 1986, p. 959) implies the inappropriateness of causal modeling in nonexperimental research settings.

If time precedence is seen as a necessary condition for causality (e.g., Kenny, 1979), only recursive models (models that do not include bidirectional causal relations) seem valid. If the cause does not need to precede the effect in time (e.g., Marini & Singer, 1988), nonrecursive models can be specified (for a classic example, see Duncan, Haller, & Portes, 1968).

Differences in causality definitions seem to contribute to the discussion over the appropriateness of SEM in exploratory versus confirmatory research modes. Whereas Glymour et al. (1987) proposed methods that might discover causal structures, Bollen (1989) and others saw the main value of SEM being the ability to reject a priori specified models.
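
The distinction between recursive and nonrecursive models drawn above can be checked mechanically once the hypothesized paths are written down. The following Python sketch is only an illustration under stated assumptions: the function name `is_recursive`, the representation of a model as (cause, effect) pairs, and the variable names in the examples are hypothetical and are not taken from any SEM package or from the studies cited. It tests only for causal loops among the specified paths; it says nothing about correlated error terms or about whether the causal ordering itself is defensible.

```python
from collections import defaultdict


def is_recursive(paths):
    """Return True if the directed paths contain no causal loop.

    `paths` is an iterable of (cause, effect) pairs; this representation
    is an assumption made for illustration only.
    """
    graph = defaultdict(list)
    for cause, effect in paths:
        graph[cause].append(effect)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every variable starts as UNVISITED

    def has_loop(node):
        # Depth-first search: reaching a variable that is still
        # IN_PROGRESS means we followed the arrows back to it.
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNVISITED and has_loop(nxt):
                return True
        state[node] = DONE
        return False

    return not any(
        state[node] == UNVISITED and has_loop(node) for node in list(graph)
    )


# Hypothetical recursive model: all arrows point "forward".
recursive_paths = [
    ("background", "aspiration"),
    ("background", "attainment"),
    ("aspiration", "attainment"),
]

# Hypothetical nonrecursive model: two variables influence each other.
nonrecursive_paths = [
    ("background", "own_aspiration"),
    ("own_aspiration", "peer_aspiration"),
    ("peer_aspiration", "own_aspiration"),
]

print(is_recursive(recursive_paths))
print(is_recursive(nonrecursive_paths))
```

A check of this kind only classifies the specified structure; whether a reciprocal relation is substantively meaningful remains a theoretical question.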


A serious and sometimes ignored issue for correct parameter estimation in structural equation models is model identification. The question here is whether or not sufficient variance and covariance information from the observed variables is available to uniquely estimate the unknown coefficients. Note that this issue is not one of sample size but mostly one of the ratio of the number of variables in the model to the number of unknown parameters. Also, even if the model is theoretically identified, empirical underidentification can lead to unsolvable parameter estimation problems due to random quirks in the data (see Hayduk, 1987; Kenny, 1979).

The task of establishing the identification status of a particular structural equation model is somewhat difficult because, strictly speaking, the researcher needs to investigate whether or not each parameter can be written as a function of the variances and covariances of observed variables. Fortunately, SEM software packages include algorithms that usually (but not always) warn the user if a certain coefficient might not be identified. Bollen (1989) and others discussed several ways for checking identification that are beyond the scope and purpose of this article. However, the researcher applying SEM seems well advised to gain a basic understanding of the identification issue, hopefully leading to more parsimonious models that can be analyzed without serious estimation problems. In particular, the researcher should keep in mind the following:

The number of parameters to be estimated in the model, p, must be less than or equal to the number of nonredundant variances and covariances of measured variables, c. That is, if p > c, the model is not identified and parameter estimation should not be attempted; on the other hand, p < c does not necessarily imply that the model is identified.

For models that involve latent variables, each unobserved factor must have an assigned unit of measurement. Usually, this is accomplished by either (a) specifying the latent variable(s) to be standardized (i.e., have unit variance) or (b) specifying the unit of measurement of a latent variable to be equal to the unit of one of its observed indicator variables (termed a reference variable). In addition, an observed variable that is the only indicator of a latent variable is assumed and specified to be measured with known error (possibly zero).

Recursive path analysis models (models that involve no causal loops and no latent variables) with uncorrelated error terms associated with the endogenous (dependent) variables are always identified.

Identified models with p < c are overidentified. That is, parameter estimates are based on an implicit assumption that certain variance/covariance parameters are equal. If this statistically testable (footnote 2) assumption is judged not to hold, parameter estimates might not be very accurate because the observed data might not fit the specified model (see data-model fit section).

Once the researcher has conceptualized a model and checked its theoretical identification (and hopefully no data anomalies lead to empirical underidentification), SEM software seems to make the estimation step in the modeling process the easiest and least worrisome of all: Even though a variety of estimation procedures are available (their appropriateness depending on the viability of distributional and structural assumptions about the data under investigation), most researchers employ the maximum likelihood (ML) method, probably because it is the default in most, if not all, currently available SEM software packages. Alternatives such as the generalized least squares (GLS) or asymptotically distribution free (ADF) methods are rarely considered by the nonexpert because much confusion and as-of-yet insufficient evidence exists regarding the advantages of one such method over the others. For example, not many new SEM users might be aware of (a) the asymptotic equivalence and dependence on the multivariate normality assumption of ML and GLS, (b) the large sample requirement and inconclusive evidence of the benefits of using the ADF method, or (c) the multitude of still unanswered questions regarding the behavior of the various estimation methods when analyzing data from nominal or ordinal variables (footnote 3). In my opinion, an appropriate method should be chosen deliberately, not by default. At a minimum, users should remember that:

All estimation methods depend on a structural assumption. That is, strictly speaking, they all fail to provide correct sample estimates, standard errors, and data-model fit chi-square statistics (see footnotes 2 and 7) if the model under consideration is misspecified and does not reflect at least a very close approximation to the true structure in the population. Only a sound theoretical understanding of the modeled phenomenon can help minimize the chance of seriously violating this fundamental assumption underlying parameter estimation.

When distribution-dependent methods are considered (e.g., ML, GLS), the analyzed data should be scrutinized with regard to the viability of the multivariate normality assumption before an analysis is conducted. Note, however, that simple and straightforward tools for that purpose are not readily available; but see the graphical procedures by Fan (1996) or Thompson (1990) and the suggestions in the most recent versions of the EQS and LISREL manuals.

Sample size (n) requirements largely depend on the number of parameters to be estimated (p). As one suggested general rule of thumb, analyses probably should not be conducted with n:p ratios of less than 10:1 if parameter estimates are to be trusted (Bentler, 1993, p. 6). If the purpose is to test overall data-model consistency, a power analysis approach to the sample size question was recently articulated by MacCallum, Browne, and Sugawara (1996).

The choices among estimation techniques seem to reduce to two basic alternatives: the multivariate normality dependent ML (or its asymptotic equivalents) or the largely distribution independent ADF methods.

If the structural and distributional assumptions are met (but are they ever?), ML provides asymptotically (large sample) unbiased (footnote 4), consistent (footnote 5), and efficient (footnote 6) parameter estimates and standard errors. Furthermore, the ML-based large sample chi-square statistic (footnote 7) is appropriate for testing an overidentifying restriction and assessing data-model consistency. As the degree of violation of the normality assumption increases, however, confidence in the validity of obtained results decreases (e.g., ML estimates become less efficient and the chi-square statistic more inflated, leading to an increase in the Type I error rate for model rejection).

If the structural assumption is met and sample size is sufficiently large, ADF estimates and standard errors are asymptotically consistent, efficient, and largely independent of the observed data distributions. For the purpose of data-model fit assessment, an appropriate large sample chi-square statistic again is available. For small to moderate samples, however, conclusive evidence of the behavior of ADF estimates is still unavailable (for current reviews, see Bentler & Dudgeon, 1996; and Curran, West, & Finch, 1996).
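
Two of the checks described in this section, the counting rule for identification (p versus c) and the n:p rule of thumb for sample size, are simple enough to sketch in a few lines of Python. The sketch below is illustrative only. The function names, the symbol k for the number of observed variables, and the example numbers are assumptions introduced here; they do not come from any SEM package. It relies on the standard fact that k observed variables supply c = k(k + 1)/2 nonredundant variances and covariances. As stressed above, passing the counting rule is necessary but not sufficient for identification.

```python
def counting_rule(k, p):
    """Apply the necessary (not sufficient) counting rule.

    k: number of observed variables (symbol assumed for illustration)
    p: number of parameters to be estimated
    Returns (c, degrees_of_freedom, status).
    """
    c = k * (k + 1) // 2  # nonredundant variances and covariances
    df = c - p
    if p > c:
        status = "not identified"
    elif p == c:
        status = "possibly just-identified"
    else:
        status = "possibly overidentified"
    return c, df, status


def meets_sample_size_rule(n, p, minimum_ratio=10.0):
    """Check the n:p >= 10:1 rule of thumb cited in the text."""
    return n / p >= minimum_ratio


# Hypothetical example: 6 observed variables, 13 free parameters.
print(counting_rule(6, 13))
print(meets_sample_size_rule(100, 13))
print(meets_sample_size_rule(130, 13))
```

A status of "possibly overidentified" means only that the model has not failed the counting rule; each parameter must still be expressible as a function of the observed variances and covariances.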

Once parameter estimation is complete, SEM users often brace themselves with great anxiety when examining the various available fit indices (footnote 8). Sometimes, this inspection translates, at least initially, into either the biggest thrill ("Yes! My model fits! The hypothesized theory is confirmed!") or disappointment ("Oh no! The model does not fit! I better change it until it does!"). These reactions are fueled, respectively, by the still somewhat popular beliefs that (a) finding a well-fitting model is equivalent to discovering and/or confirming the underlying theory that is consistent with reality and (b) modifying and reestimating an initially ill-fitting model eventually will lead to the right structure. In the following, I try to dispel both myths regarding fit assessment and model modification during SEM analyses.

The interpretive weight that is placed on the many alternative fit indices of recent years (see footnote 8) seems to depend to a large extent on the purpose of the SEM analysis. Assuming that a model was conceptualized and hypothesized to capture as accurately as possible some slice of reality (by carefully balancing the principle of parsimony with the complexity of the social science phenomenon being studied), the analysis' purpose can be clarified by reflecting on the relative importance of two key questions:

How and to what degree do certain variables or factors affect each other?

Why should certain theories be retained as plausible reflections of reality?

The researcher's focus on the first query suggests a predictive purpose, whereas an emphasis on the second question points toward an explanatory aim. Now, if the purpose of a particular analysis is mainly prediction, the interpretation of overall fit indices might be secondary to the interpretation of the estimated strengths and directions of the structural paths. Here, assessment of the "fit" or match between data and model may be focused more on questions dealing with individual parameter estimates: Do coefficients and their standard errors have theory-contradicting signs or magnitudes? Are any of the variance estimates negative? Are coefficients of determination (R²) negative, close to zero, or greater than unity, and so forth? If, on the other hand, the investigator seeks information on the tenability of a theory, that is, the primary purpose of the analysis is explanation, then the evaluation of overall fit indices might become the primary (but clearly not the only) concern (note the parallel between the previous arguments and the relative emphasis on the interpretation of regression weights and coefficients of determination in traditional regression analyses).
Here, the implicit question often is whether or not the hypothesized model or theory is consistent with reality ("Is this the model?"). An unfortunate truth, however, is that empirical fit indices cannot confirm the model-reality link; they can only address the data-model consistency question. Figure 1 illustrates this major nonstatistical limitation (footnote 9) of the available overall measures of fit. The relation between Sets A, B, and C shown in Figure 1 is not the only one possible; it is used here to clarify the following points:

Evidence of data-model consistency does not necessarily imply that the chosen model represents a valid, albeit simplified, reflection of reality.

Data-model inconsistency reflects a mismatch between the proposed model and reality (unless, of course, a Type I error was committed).

Fit assessment mainly is a disconfirmatory activity: A model or theory could be disconfirmed after concluding that observed data do not fit the hypothesized structure. But a model cannot be confirmed as being the best (or even a good) approximation to reality after discovering that the model happens to fit the observed data, possibly by chance alone. All that can be expected from "good" fit values is some indication that the model, as specified, could be a viable representation of the true relations between the included variables.

[FIGURE 1. Data-model versus model-reality consistency. Set B: models consistent with data. Set C: models consistent with reality. Labeled regions: data-model inconsistency, data-model consistency, model-reality consistency, model-reality inconsistency.]

A semantic change from interpreting measures of "model fit" to interpreting indices of "data fit" might be helpful to those researchers who wonder whether "various models fit the data" or whether "the collected data fit a particular model." Although the former query can lead the investigator toward an exploratory search for any model that (by chance?) fits a particular data set, the latter question might direct the researcher more toward a disconfirmatory assessment of whether there is evidence to conclude that the collected data do not fit the a priori specified model(s). That is, to some investigators, assessing the "data fit" might convey the largely disconfirmatory idea of judging the consistency between collected data and an a priori theoretically conceptualized model better than attempting to improve the "model fit," which might be interpreted as a more exploratory approach to identifying models that happen to fit a particular data set.

If any data-model inconsistencies are identified, the researcher has several options on how to proceed, depending on where on the deduction-induction continuum the main goal of the just conducted analysis is placed: (a) model rejection, (b) model comparison, or (c) model generation (adapted from Jöreskog, 1993). In the first instance, the aim is a mainly deductive and disconfirmatory one: The model or theory might be rejected based on poor data-model fit and deemed an invalid representation of reality (see Figure 1). Now, a new model could be conceptualized, new data collected, and the SEM process repeated from the start.

If the goal of the SEM analysis is to select a "best" model from an a priori conceptualized and specified set of competing alternative models, we can utilize some of the obtained statistical information, in addition to substantive considerations, in the comparison of these structures. For example, if the model with unsatisfactory fit results was conceptualized as a nested submodel of a more general, complex structure, an easy-to-perform difference in chi-square test allows us to evaluate whether or not the latter model results in a significantly more favorable data-model fit result than the former. If the to-be-compared models are not nested, an alternative form of comparison is to evaluate the relative predictive validity of the models with some version of a regression-like cross-validation index (CVI or ECVI; see Cudeck & Browne, 1983; and Browne & Cudeck, 1989, 1993).

By far the most common, but probably the least admitted, of the aforementioned goals seems to be that of model generation. The search for the best fitting, yet substantively meaningful, model led several authors to suggest various model modification strategies (see, e.g., Bollen & Long, 1993; Kaplan, 1990). All of these usually more inductive and somewhat exploratory approaches are focused on using the obtained analysis results to modify, then reestimate with the same data, and thus hopefully improve the initially conceptualized model with regard to data-model consistency. Stated another way, during model modification, an aspect of a null hypothesis (the specified structural equation model) is being changed based on an in-depth "peek" at the observed data (e.g., fit indices, modification indices, expected parameter change statistics, Wald tests; see footnote 10) and then reevaluated to obtain the preferred results (a now better fitting model).
Note that a "good" model emerging from repeated modifications could fit the data just by chance rather than being a truly better approximation to reality than the initially hypothesized model. Overall, it might be helpful to consider the following points before reaching conclusions from observed data-model inconsistencies and embarking on a model modification mission to improve the fit results (also, see Steiger, 1990, and the other responses to Kaplan, 1990):

Model Rejection: A Type I error might have been committed. As mentioned previously, violations of distributional assumptions underlying parameter estimation can lead to falsely rejecting a correctly specified model that is a valid approximation of reality. In addition, the effect of a chosen sample size on the model rejection decision must be taken into account: Because desirable properties of ML and ADF estimators are asymptotic, analyses based on large samples lead to more trustworthy results than those conducted with smaller samples. On the other hand, increasing the sample size also increases the power of the chi-square-based tests used to evaluate data-model fit and, hence, increases the chance of committing a Type I error, that is, flat-out rejecting a model even though it could serve as a "good" approximation to reality.

Model Comparison: An interpretation of results from a selected model that favorably compares with other competing structures cannot be made outside the context of such alternative models. That is, favorable results are relative, not absolute. Statistical model comparisons must, of course, be interpreted in light of the usual concerns of Type I and Type II errors. Finally, note that models that are not nested cannot be directly compared with statistical means.

Model Generation: Here, an attempted search for the structure that best fits a particular data set leads to a mostly exploratory view of SEM that, in my view, has the potential of doing more harm than good. Substantially changing a null hypothesis (i.e., the initially hypothesized model) based on the observed data (i.e., the values of various fit or modification indices), and then retesting the modified hypothesis with the same data, is an unacceptable practice in most applications of traditional statistical thinking. Of course, a lack of resources and other circumstances might provide sufficient justification for responsible model modification and a reanalysis of the data. However, a researcher should be aware of the possibility of capitalizing on chance when various not carefully conceptualized models are being fitted to the same data. When a modified model is presented as a possible (but certainly not the) representation of the true structure, a Type II error might have been committed.
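
For readers who have not carried out the difference in chi-square test mentioned earlier for nested models, the following Python sketch shows the arithmetic. It is a minimal illustration under stated assumptions: the function names and the example fit statistics are hypothetical, and the small `chi2_sf` helper is written here (using only the standard library) rather than taken from any statistical package. The test statistic is the difference between the chi-square values of the more restricted and the more general model, referred to a chi-square distribution with degrees of freedom equal to the difference in the models' degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1).

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / gamma(a + 1)
    with h = x / 2, starting from Q(1/2, h) = erfc(sqrt(h)) for odd df
    or Q(1, h) = exp(-h) for even df.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Compare a nested (restricted) model with a more general model."""
    delta_chi2 = chi2_restricted - chi2_general
    delta_df = df_restricted - df_general
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit results, invented for illustration only.
delta_chi2, delta_df, p_value = chi2_difference_test(30.0, 10, 20.0, 8)
print(delta_chi2, delta_df, round(p_value, 4))
```

A small p value here would suggest that the more general model fits significantly better than its nested submodel. The usual cautions about Type I and Type II errors, and about the sensitivity of chi-square-based tests to sample size, apply to this comparison as well.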
Irrespective of which of the various post hoc model modification strategies is chosen, cross-validating obtained results with a new and independent sample is very desirable because it gives some protection against the capitalization on chance and specification errors that are internal (footnote 11) to the model. If the original sample is deemed large enough, it can be split into calibration and validation subsamples and the model's predictive validity could be judged by interpreting Cudeck and Browne's (1983) cross-validation index (also see Browne & Cudeck, 1989, 1993, for extensions of their work and a discussion of benefits and shortcomings of the various approaches to cross-validation in SEM).
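
To make the cross-validation idea more concrete, the sketch below computes a discrepancy between a validation-sample covariance matrix and the covariance matrix implied by a model fitted to the calibration sample. Several assumptions are made for illustration: the ML discrepancy function, F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − k, is used as the measure of discrepancy; the function names are hypothetical; the implied matrix is assumed to have been obtained beforehand from SEM software applied to the calibration subsample; and the matrices shown are invented. The code uses only the Python standard library and is not an implementation from any SEM package.

```python
import math


def _logdet_and_inverse(a):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (log of |det A|, inverse of A) for a small square matrix
    given as a list of lists.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        logdet += math.log(abs(piv))
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return logdet, [row[n:] for row in m]


def ml_discrepancy(s, sigma):
    """ML discrepancy between sample matrix S and implied matrix Sigma."""
    k = len(s)
    logdet_sigma, sigma_inv = _logdet_and_inverse(sigma)
    logdet_s, _ = _logdet_and_inverse(s)
    trace = sum(s[i][j] * sigma_inv[j][i] for i in range(k) for j in range(k))
    return logdet_sigma + trace - logdet_s - k


def cross_validation_index(s_validation, sigma_calibration):
    """Discrepancy of validation data from the calibration-based model."""
    return ml_discrepancy(s_validation, sigma_calibration)


# Invented 2 x 2 matrices, for illustration only.
s_validation = [[1.0, 0.5], [0.5, 1.0]]
sigma_calibration = [[1.0, 0.0], [0.0, 1.0]]
print(round(cross_validation_index(s_validation, sigma_calibration), 4))
```

When several competing models are fitted to the same calibration subsample, the one with the smallest index on the validation subsample would be preferred on predictive grounds, subject to the shortcomings of cross-validation discussed in the cited sources.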

The recent rapid developments in commercially available SEM software packages and growth in SEM applications in the social sciences provide new challenges and responsibilities to instructors, potential users, and reviewers of SEM applications: Whereas computer programs become more user-friendly and accessible to novices, the increasing number of technical reports on advances in SEM theory becomes more difficult to synthesize for many applied social science researchers. Earlier, I reviewed issues important to the satisfactory completion of each step in a typical SEM analysis: (a) careful, theory-based model conceptualization; (b) identification and parameter estimation; (c) data-model fit assessment; and, if justifiable, (d) possible model modification. For the new generation of SEM users and consumers, a clearer understanding of the purposes, advantages, limitations, and as-of-yet unanswered questions is becoming more critical than the previously needed knowledge of technical computing issues in SEM. Basically: The central purpose of SEM can be conceptualized as assisting researchers in (a) predicting how and to what degree variables and/or latent factors structurally affect each other within the context of a particular model and (b) explaining why a particular theoretical model should be rejected or could be viewed as one possible representation of the true causal mechanisms underlying the observed associations. A structural equation model is nothing more than an oversimplified approximation of reality, no matter how carefully conceptualized. A good model can be characterized as featuring an appropriate balance between efforts to represent a complex phenomenon in the simplest possible way and to retain enough complexity to allow the most meaningful interpretations possible. Whenever possible, a set of competing alternative models should be conceptualized, each exhibiting only those features of a perceived reality that relate directly to the analysis purposes, be they the prediction or explanation of structural relations. That is, simply fitting (sometimes grossly misspecified) models to data seems not to address adequately the "ultimate" aim of science: causal explanation (Mulaik, 1993; Shaffer, 1992, p. x).
In most straightforward applications, theoretical (but not empirical) underidentification can be avoided by following a set of simple rules. In addition, modern SEM software usually can aid in detecting the source of a potential underidentification. Parameter estimation should be conducted with appropriate estimation methods. All available methods depend on the (mostly unrealistic) structural assumption that the correct model was specified. Furthermore, it remains difficult to assess whether or not the data meet the multivariate normality assumption underlying some commonly used estimation methods (but see Fan, 1996; Thompson, 1990; or the recommendations in the most recent SEM software manuals). If the structural or distributional assumptions are violated, especially in analyses with small sample sizes, obtained results might not be as accurate as previously thought and, thus, the validity of inferential interpretations might be compromised (for a review of the problems associated with violated assumptions and proposed solutions, see Bentler & Dudgeon, 1996).

Renewed attempts have surfaced to reexamine the usefulness of statistical hypothesis testing in the social and behavioral sciences (e.g., Cohen, 1994; Kirk, 1996; Schmidt, 1996; Thompson, 1993, 1996). Implications and outcomes from these developments and discussions could have profound effects on the interpretation of future inferential SEM analyses. A "best" way to assess data-model fit has not yet been identified due to many unanswered questions regarding the statistical behavior of the various fit indices. Recently, a confidence interval approach to data-model fit assessment has been proposed (MacCallum et al., 1996) that allows for power estimations (and associated sample size calculations) for "not-good-fit" statistical tests. When a modified structure is reanalyzed and reevaluated using the same data set that was utilized for the initial analysis, data-model fit results usually will improve, not necessarily due to a truly "better" model (a structure that better reflects the true causal processes in the population that generated the data) but simply because a model has been fitted to a particular sample data set. Finally, more research reports utilizing a SEM analysis should include (a) sufficient theoretical justification for the proposed models; (b) a brief definition or explanation of the concept of causality, if causal links between variables are investigated; (c) a justification for the chosen estimation method and a demonstration of the viability of the underlying distributional assumptions, if applicable; (d) multiple data-model fit assessments based on different approaches or indices from different categories (see MacCallum et al., 1996; Tanaka, 1993); (e) a theoretical and statistical justification for model modifications, if applicable; and, of course, (f) sufficient descriptive statistics (e.g., an appropriate covariance matrix) so that the reader can verify the reported results.
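The power estimation mentioned above in connection with MacCallum et al. (1996) can be sketched as follows. The sketch assumes that the test statistic follows a noncentral chi-square distribution with noncentrality parameter (N - 1) x df x RMSEA^2 under both the null and the alternative value of the root mean square error of approximation (RMSEA). The function name `rmsea_power`, the RMSEA values of .05 and .01, the 50 degrees of freedom, and the sample sizes are illustrative assumptions, not recommendations taken from this article.

```python
from scipy.stats import ncx2

def rmsea_power(n, df, rmsea_null, rmsea_alt, alpha=0.05):
    """Approximate power of an RMSEA-based test of data-model fit.

    Hypothetical helper: the noncentrality parameter under each hypothesis
    is assumed to equal (n - 1) * df * rmsea**2.
    """
    nc_null = (n - 1) * df * rmsea_null**2
    nc_alt = (n - 1) * df * rmsea_alt**2
    if rmsea_alt > rmsea_null:
        # Fit is worse under the alternative: reject for large statistics.
        critical = ncx2.ppf(1 - alpha, df, nc_null)
        return ncx2.sf(critical, df, nc_alt)
    # "Not-good-fit" test: fit is better under the alternative,
    # so the null hypothesis is rejected for small statistics.
    critical = ncx2.ppf(alpha, df, nc_null)
    return ncx2.cdf(critical, df, nc_alt)

# Illustrative values only: null RMSEA = .05, alternative RMSEA = .01, df = 50.
power_by_n = {n: rmsea_power(n, df=50, rmsea_null=0.05, rmsea_alt=0.01)
              for n in (100, 200, 500)}
for n, power in power_by_n.items():
    print(f"N = {n:4d}  power = {power:.3f}")
```

Evaluating the function over a range of sample sizes and choosing the smallest N that reaches the desired power gives a rough sample size calculation under the same assumptions.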

A previous version of this article was delivered as an invited address to the Special Interest Group on Structural Equation Modeling during the meeting of the American Educational Research Association, New York, April 1996.

