
Factor Analysis

Content

I. Introductory characterisation of Factor Analysis (FA)
II. Stages and assumptions of FA
   1. Deciding whether your data is amenable to FA
      i. Correlation matrix
      ii. Sample size
      iii. SPSS tests of factorability
   2. Initial statistics and unrotated factor matrix
   3. Deciding the number of factors
      i. Kaiser's criterion
      ii. Cattell's Scree plot technique
      iii. By conceptual analysis
   4. Factor rotation
   5. Final statistics and rotated factor matrix
III. Practice example

Reference: The outline of Factor Analysis and the SPSS practice example follow closely Pallant, J. (2007). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS for Windows (3rd ed.). Maidenhead: Open University Press.

I. Introductory characterisation of Factor Analysis (FA)


Factor analysis (FA) is a form of multivariate analysis and represents a data reduction technique. The core objective of FA is to simplify multivariate data by reducing it to a smaller number of underlying dimensions or components (factors). Since a large number of items can be reduced to a smaller set of dimensions or components, FA is widely applied in questionnaire/test development (exploratory FA) and subsequent validation (confirmatory FA).

FA is based on correlation, with the initial correlation matrix determining the position of questionnaire/test items in relation to each other. This item positioning can be represented geometrically on a number of axes. The items load on and along an axis, with positive items at one end and negative or reverse-scored items at the other. The axes, which pivot at a common point, can be correlated (oblique) or uncorrelated (orthogonal). To obtain a more clear-cut and more readily interpretable solution, the axes (factors) are rotated. Interpretation of factors (i.e. ascribing a label to a factor) is derived from looking at the items that load onto a factor and determining whether there is a common theme in the battery of items which form a potential factor (dimension or construct). Hence, the labelling of factors (which represent unobservable, i.e. latent, variables) is a subjective judgement of the researcher.

Distinction between Factor Analysis (FA) and Principal Components Analysis (PCA)

The term Factor Analysis (FA) is an umbrella or generic term for a collection of data reduction techniques, with the consequence that different variants of the technique can produce varying results. FA is similar to Principal Components Analysis (PCA) in that both techniques seek to simplify large amounts of multivariate data by reducing the number of variables required for their description. PCA involves a mathematical transformation of the observed variables: the idea is to replace the original n variables by a smaller set of m (m < n) uncorrelated variables (components), each of which is a linear combination of the original variables. By means of such item/variable reduction to a smaller set of factors, the bulk of the variance can be accounted for by a small number of explanatory factors. FA is a more complex and sophisticated approach than PCA in that it proposes a particular model to explain the structure of, and correlations between, the observed (manifest) variables. The proposed model involves the derivation of a small number of more fundamental underlying constructs or dimensions. Since these constructs or dimensions cannot be observed and measured directly, they are called latent variables or factors.

Both FA and PCA solutions can be subjected to the process of rotation, which aims at making the solution more interpretable conceptually. Rotated solutions yield what Cattell referred to as simple structure, and the solutions may be orthogonal or oblique (i.e. the factors are assumed to be uncorrelated or correlated, respectively).

PCA and FA can furthermore be distinguished in terms of the variance they take into account. Both account for the shared variance: the variance which is shared by all items/variables, also referred to as common variance. This requires distinction from specific (unique) variance, which is not shared with any other item/variable, and from error variance, which captures random error.

The total variance in the scores of an item/variable used to assess a particular factor can therefore be partitioned into common, specific and error variance:

Total Variance = Common Variance + Specific Variance + Error Variance

PCA analyses the total variance of a score or item/variable, including its unique variance, which implies the assumption that the questionnaire/test used to assess the latent variables (factors) is without measurement error and perfectly reliable. In contrast to PCA, FA only accounts for and analyses the common (shared) variance and thus excludes unique variance from the analysis. As PCA examines the total variance of an item, its communality is fixed at 1.0 (reflecting perfect reliability and the absence of measurement error), whereas for FA communalities range from 0.0 to 1.0. PCA is more commonly used as an exploratory technique (i.e. at the stage of questionnaire or test development), whereas FA is used as a confirmatory technique (i.e. at the stage of questionnaire or test validation).

II. Stages and assumptions of FA


The term FA will subsequently be used in its broader sense thus including PCA.

1. Deciding whether your data is amenable to FA

i) Correlation matrix

The initial step is the computation of the correlation matrix comprising the inter-correlations between all items/variables. Only if there are significant correlations between items, i.e. if items are related, would one expect them to form one or more factors. In order to proceed with FA the researcher has to ensure that there is evidence of correlation coefficients greater than 0.3 (Tabachnick & Fidell, 2007).
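As a toy illustration of this first check, the inter-item correlation matrix and the 0.3 rule of thumb can be inspected with a few lines of Python/NumPy. The dataset below is simulated for illustration (it is not the PANAS data used later):

```python
import numpy as np

# Simulated questionnaire data (hypothetical): 100 respondents, 6 items.
# Items 1-3 share one latent factor and items 4-6 another, so
# correlations > 0.3 are expected within each triplet of items.
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 100))            # two latent factors
items = np.column_stack(
    [f1 + 0.5 * rng.normal(size=100) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=100) for _ in range(3)]
)

R = np.corrcoef(items, rowvar=False)          # 6 x 6 correlation matrix
off_diag = R[np.triu_indices_from(R, k=1)]    # the 15 unique inter-correlations
n_large = int(np.sum(np.abs(off_diag) > 0.3))
print(f"{n_large} of {off_diag.size} correlations exceed |0.3|")
```

With this structure the six within-triplet correlations comfortably exceed the 0.3 threshold, so such data would be considered amenable to FA.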

ii) Sample size

The reliability of factors emerging from a factor analysis depends on the size of the sample. As a rule of thumb, Tabachnick & Fidell (2007) suggest five cases per questionnaire/test item as a minimum.



iii) SPSS tests of factorability

The aforementioned crude inspection by means of correlation coefficients and a sufficiently large sample size is followed by two SPSS measures for assessing the factorability of the data: Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (range 0-1), with the following cut-off values representing factorability:

KMO > 0.7
Bartlett's test of sphericity significant (p < 0.05)
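Both tests can also be sketched outside SPSS. The functions below follow the standard definitions (Bartlett's chi-square approximation based on the determinant of the correlation matrix, and the KMO ratio of correlations to partial correlations); the dataset is again simulated purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)

def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (0-1)."""
    R = np.corrcoef(data, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Anti-image (negative partial) correlations
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    A = -inv_R / d
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(R, 0.0)
    r2, a2 = (R ** 2).sum(), (A ** 2).sum()
    return r2 / (r2 + a2)

# Hypothetical two-factor data (100 cases, 6 items), for illustration only
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 100))
data = np.column_stack(
    [f1 + 0.5 * rng.normal(size=100) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=100) for _ in range(3)]
)
stat, df, p_value = bartlett_sphericity(data)
print(f"Bartlett chi2({df:.0f}) = {stat:.1f}, p = {p_value:.4f}")
print(f"KMO = {kmo(data):.3f}")
```

For well-structured data such as this, Bartlett's test is clearly significant and the KMO value lands comfortably above 0.5.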

2. Initial statistics and unrotated factor matrix

Factor extraction involves determining the most parsimonious set of factors, i.e. the smallest number of factors that can be used to best represent the inter-relations among the set of variables/items. A plethora of approaches and techniques can be used to identify or extract the number of underlying factors or dimensions using SPSS (with PCA and principal factors analysis as the most commonly used approaches):

Principal components analysis (PCA): takes the total variance into account (shared + unique + error variance)
Principal factors analysis (FA): takes only the shared variance into account (i.e. excludes unique and error variance)
Image factoring
Maximum likelihood factoring
Alpha factoring
Unweighted least squares
Generalised least squares

The initial statistics provided by SPSS are Communality, Eigenvalue, Variance (accounted for by a factor, and cumulative) and Item loadings (the extent to which individual items correlate with a factor).


Communality

Communality refers to the proportion of variance of a test item that has been accounted for by the extracted factors; for PCA this is set at 1.0, whereas for FA it varies between 0.0 and 1.0. The first component (factor/axis) extracted accounts for the largest amount of variance shared by the items/variables. The second factor accounts for the second largest amount of variance not explained by the first factor, i.e. it is assumed that these two factors are unrelated or orthogonal to each other. This iterative process is continued until all the variance is accounted for. The initial statistics comprise as many factors as variables, although the amount of variance explained by successive factors decreases.

Eigenvalue

The amount of variance a factor accounts for is its Eigenvalue, and the initial solution reports the Eigenvalue of each factor (which at this stage is represented by a single item or variable). The factors (here still single items or variables) are arranged in order of decreasing Eigenvalues. The sum of the Eigenvalues of all original variables (i.e. of all factors in the initial solution) equals the number of variables. The SPSS output Total Variance Explained also reports the (more readily interpretable) % of Variance and Cumulative %. The loadings of all variables on all factors are shown in the unrotated factor matrix.
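The relationship between Eigenvalues, % of Variance and Cumulative % can be reproduced from any correlation matrix by eigen-decomposition; the 4-item correlation matrix below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical 4-item correlation matrix (items 1-3 inter-correlate,
# item 4 is only weakly related to the rest).
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]        # sorted in descending order
pct = 100 * eigenvalues / eigenvalues.sum()      # % of variance per factor
print("Eigenvalues:  ", np.round(eigenvalues, 3))
print("% of variance:", np.round(pct, 1))
print("Cumulative %: ", np.round(np.cumsum(pct), 1))
print("Sum of eigenvalues:", round(eigenvalues.sum(), 1))  # equals p = 4
```

The sum of the eigenvalues equals the trace of the correlation matrix, i.e. the number of variables, which is why the initial solution always accounts for 100% of the variance across all p factors.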

3. Deciding the number of factors

i) Kaiser's Criterion

The Kaiser criterion refers to the commonly applied rule to retain only those factors which have an Eigenvalue > 1. A factor with an Eigenvalue < 1 accounts for only a very small amount of the variance; in fact, less than can be accounted for by a single variable. In light of the core aim of FA, to account for a large number of variables by fewer factors (components) which reflect the underlying structure, factors with Eigenvalues < 1 are not substantial and frequently occur merely due to error. Most FA programmes by default only retain factors with Eigenvalues > 1. Yet, in the following exceptional cases this is not an appropriate selection criterion for a potential factor:

The vast majority of factors have Eigenvalues > 1 (reflecting pronounced noise in the system), or, conversely,
The vast majority of factors have Eigenvalues < 1.

In these exceptional cases Cattell's scree technique offers an alternative criterion for deciding the number of factors to be retained in the FA.

ii) Cattell's scree technique

The Scree plot is a graphical presentation of the Eigenvalues in descending order; in other words, the graph plots the decreasing variance accounted for by each factor. The plot typically shows a break between the steep slope of the initial factors and the gentle slope of subsequent factors. The factors to be retained are those that lie above the point (factor) at which the Eigenvalues appear to level off. In the figure below, Factor 3 can be identified as the knee, the point at which the Eigenvalues level off, thereby suggesting a two-factor solution.

Example: Factor Scree Plot

[Figure: scree plot of Eigenvalues (y-axis, 0 to 4.5) against Factor Number 1 to 7 (x-axis); the curve levels off from Factor 3 onwards.]
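The levelling-off logic of the scree inspection can also be mimicked without plotting by looking at the successive drops between eigenvalues; the eigenvalues and the 0.5 cut-off below are purely illustrative and mirror the shape of the plot described above:

```python
import numpy as np

# Hypothetical eigenvalues shaped like the scree plot described above:
# a steep initial slope, then a gentle one from factor 3 onwards.
eigenvalues = np.array([4.1, 2.6, 0.9, 0.7, 0.6, 0.5, 0.4])

drops = -np.diff(eigenvalues)            # size of each successive drop
flat = drops < 0.5                       # 0.5 is an illustrative cut-off
knee = int(np.argmax(flat)) + 1          # first factor on the gentle slope
print(f"Knee at factor {knee}; retain {knee - 1} factors")
```

For these values the knee falls at factor 3, so two factors would be retained, in line with the visual inspection.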

Above and beyond SPSS, more sophisticated methods and programmes can be used to determine the optimal number of factors (e.g. Parallel Analysis using Vista-Paran).
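A bare-bones sketch of parallel analysis (retaining only those components whose observed eigenvalues exceed the mean eigenvalues of random data of the same dimensions) might look as follows; the function name, dataset and seeds are hypothetical, for illustration only:

```python
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: count components whose eigenvalues
    exceed the mean eigenvalues of same-sized random-normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.normal(size=(n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    return int(np.sum(observed > random_eigs.mean(axis=0)))

# Hypothetical two-factor dataset (100 cases, 6 items)
rng = np.random.default_rng(7)
f1, f2 = rng.normal(size=(2, 100))
data = np.column_stack(
    [f1 + 0.5 * rng.normal(size=100) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=100) for _ in range(3)]
)
print("Components to retain:", parallel_analysis(data))
```

Because the simulated data were built from two latent factors, the comparison against random-data eigenvalues recovers exactly two components.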


iii) By Conceptual Analysis:

Another guide to determining the number of factors for the optimal solution is the conceptual analysis of the meanings of the factors extracted. In other words, the number of expected factors should be based upon a sound theoretical framework of the structural model under investigation.

Further, less commonly used techniques include breaking the sample into sub-groups and examining the degree of similarity/distinction between the separate FAs. Evidently, this technique requires a very large initial sample.

4. Factor rotation

Factor rotation makes the different factor patterns more distinct and the solution more amenable to meaningful interpretation, which applies to both exploratory and confirmatory solutions. Although factor rotation does not change the underlying solution (factor structure), it results in more pronounced patterns of factor loadings.

VARIMAX rotation (a type of orthogonal rotation): the derived factors are assumed to be independent of each other, and so are drawn at right angles to each other on the graph. The orthogonal VARIMAX method aims to identify a small number of independent but powerful factors; it therefore aims at, and results in, the identification of parsimonious factor structures. Where the data do not fit an orthogonal solution, DIRECT OBLIMIN rotation (a type of oblique rotation) can be used; in this case it is assumed that the factors are not entirely independent but conceptually related to some extent. As opposed to orthogonal methods, oblique methods aim to identify a larger number of less powerful, but intercorrelated, factors.

5. The final statistics and rotated factor matrix

The final statistics to be reported comprise:



Communalities (quantifying the amount of variance in the variables accounted for by the extracted factors)
The number of factors extracted
The Eigenvalue of each extracted factor, i.e. the amount of variance a single extracted factor accounts for
The cumulative variance the total factor solution accounts for.

After rotation the interpretability of the factors will usually have improved. The most important final SPSS output is the Rotated Factor (Component) Matrix, which simplifies the previous (unrotated) Factor Matrix in that it minimises the number of factors on which variables have high loadings. Whereas in the Unrotated Factor Matrix variables potentially load on a number of factors (cross-loadings), in the Rotated Factor Matrix for an orthogonal rotation each variable will (more clearly) load on one distinct factor.
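Programmatically, reading such a rotated matrix amounts to assigning each item to the component on which it has its largest absolute loading, with small loadings suppressed for display (mirroring the SPSS option used later in the practice example). The items and loadings below are a small hypothetical subset, not a full output:

```python
import numpy as np

# Hypothetical rotated loadings for four items on two components
items = ["enthusiastic", "alert", "nervous", "afraid"]
rotated = np.array([
    [0.82, 0.05],
    [0.74, 0.12],
    [0.08, 0.79],
    [0.10, 0.73],
])

# Suppressed view: loadings below .3 are blanked out (shown as NaN)
display = np.where(np.abs(rotated) >= 0.3, rotated, np.nan)

# Assign each item to the component with its largest absolute loading
assignment = {item: int(np.argmax(np.abs(rotated[i])) + 1)
              for i, item in enumerate(items)}
print(assignment)  # → {'enthusiastic': 1, 'alert': 1, 'nervous': 2, 'afraid': 2}
```

In a clean rotated solution each item survives the suppression on exactly one component, which is what "loading on one distinct factor" looks like numerically.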


Practice Example Factor Analysis


Reference: The SPSS practice example can be found in: Pallant, J. (2007) SPSS Survival Manual: A Step by Step Guide to Data Analysis using SPSS for Windows (3rd Ed.). Maidenhead: Open University Press

The SPSS file FactorAnalysis.sav contains twenty items measuring positive and negative mood states (PA = Positive Affect, NA = Negative Affect), labelled pn1 to pn20. They were measured by means of the PANAS scale/questionnaire (Watson, Clark & Tellegen, 1988). Ten adjectives refer to PA (e.g. active, determined and proud), whereas the other ten adjectives refer to NA (e.g. nervous, upset and irritable). The PANAS questionnaire utilises a five-point Likert scale from not at all (1) to extremely (5). The authors of the PANAS suggest a two-factor structure in light of the two psychological dimensions assessed: PA and NA. To explore whether the community sample scored in line with the expected two-component structure, the task is to subject the scale items to PCA (Principal Components Analysis as an exploratory technique). The PCA will be run in two steps: Step 1: Checking factorability and initial solution, and Step 2: Rotated solution.

Step 1: Factorability and initial solution

Firstly, it has to be checked whether the sample size is sufficient to run FA (are there at least 5 times more participants than questionnaire items?). If yes, then:

Click on Analyze → Data Reduction → Factor.
In the Factor Analysis box enter all the questionnaire items (i.e. pn1 to pn20).
Still in the Factor Analysis box, click on Descriptives. To see whether there is sufficient correlation between items and to test factorability: under Correlation Matrix tick Significance Levels and Reproduced, and tick KMO and Bartlett's test of Sphericity.
Still in Descriptives, under Statistics tick Initial Solution (this will calculate the initial Communalities, Eigenvalues and Percentage of variance explained). Click Continue.
In Extraction: Method → Principal Components Analysis [default] (as the analysis is exploratory rather than confirmatory); Analyse → Correlation Matrix [default]; Extract → Eigenvalues over one [default] (Kaiser Criterion); Display → Screeplot. Click Continue.
In Options: under Missing Values click Exclude cases pairwise.
Still in Options: under Coefficient Display Format click Sorted by size and, in Suppress values less than __, type in .3 (with the effect that only loadings above 0.3 will be displayed).
Finally, click OK to run the analysis.

Relevant outputs (before rotation)

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .874
Bartlett's Test of Sphericity: Approx. Chi-Square = 3966.539, df = 190, Sig. = .000

The assumptions/criteria for factorability are clearly met, since KMO > .70 (here .874) and Bartlett's test is significant (p < .05, here even p < .001). Hence, factor analysis can be conducted with the data.

Subsequently, a decision has to be made about the number of factors (components) to retain; the SPSS outputs presented below provide guidance: using the Kaiser criterion (components with Eigenvalues > 1) would suggest a four-factor solution (see the Total Variance Explained table). However, the Scree plot supports the theoretically grounded two-factor solution. This is also supported by the Component Matrix, showing that most items load strongly on the first two factors and only very few items load on factors three and four.

Total Variance Explained

Component | Initial Eigenvalues: Total, % of Variance, Cumulative % | Extraction Sums of Squared Loadings: Total, % of Variance, Cumulative %
1  | 6.250, 31.249, 31.249  | 6.250, 31.249, 31.249
2  | 3.396, 16.979, 48.228  | 3.396, 16.979, 48.228
3  | 1.223,  6.113, 54.341  | 1.223,  6.113, 54.341
4  | 1.158,  5.788, 60.130  | 1.158,  5.788, 60.130
5  |  .898,  4.490, 64.619
6  |  .785,  3.926, 68.546
7  |  .731,  3.655, 72.201
8  |  .655,  3.275, 75.476
9  |  .650,  3.248, 78.724
10 |  .601,  3.004, 81.728
11 |  .586,  2.928, 84.656
12 |  .499,  2.495, 87.151
13 |  .491,  2.456, 89.607
14 |  .393,  1.964, 91.571
15 |  .375,  1.875, 93.446
16 |  .331,  1.653, 95.100
17 |  .299,  1.496, 96.595
18 |  .283,  1.414, 98.010
19 |  .223,  1.117, 99.126
20 |  .175,   .874, 100.000

Extraction Method: Principal Component Analysis.

The first four components all have Eigenvalues > 1 (6.250, 3.396, 1.223, 1.158). Together these four components explain a total of 60.13% of the variance.

The corresponding Scree plot, suggesting a two-factor (rather than a four-factor) solution, is presented below.


The Scree plot levels off at factor/component 3, leading to the decision to retain two factors.

The Component Matrix (before rotation!) below shows the loadings of each of the items on the four components (the degree to which they correlate with the factors/components). Most items load strongly on the first two components, whereas only very few items load on components three and four. Hence, the Component Matrix supports the two-factor solution, in line with the Scree plot inspection.

Bear in mind that the Component Matrix only shows loadings > .3 (because we selected the option to suppress values less than .3).


Component Matrix

Component 1 loadings (values < .3 suppressed): enthusiastic .678, alert .639, active .625, distressed -.615, strong .608, attentive .606, interested .599, upset -.591, scared -.584, afraid -.582, inspired .580, nervous -.569, irritable -.552, jittery -.544, proud .473, excited .476, determined .429, hostile -.415, ashamed -.429, guilty -.473.

[Table: the items' additional loadings on components 2 to 4 are not reproduced here; substantial loadings are largely confined to the first two components.]

Extraction Method: Principal Component Analysis. a. 4 components extracted.

To achieve a more interpretable solution, the second step requires conducting the factor rotation, which produces the rotated solution.


Step 2: Rotated solution

The Rotated Solution will produce the Rotated Component Matrix.


SPSS retains the previously entered variables (items) and options. To achieve the rotated solution some modifications have to be made, e.g.:

Remove the tick in the Initial Solution box (in Descriptives)
Remove the tick in the KMO and Bartlett's box
Remove the ticks from Screeplot and Unrotated factor solution
Select Number of Factors (= 2)!
Specify the Rotation Method (here: Varimax)

Alternative: to reset, close and reopen the SPSS file FactorAnalysis.sav and run the rotated solution (with only the default settings).

Rotated Component Matrix

Item          Component 1   Component 2
enthusiastic      .818
inspired          .764
alert             .741
attentive         .723
interested        .696
excited           .679
strong            .663
active            .620
determined        .612
proud             .540
nervous                         .787
afraid                          .732
scared                          .728
distressed                      .728
jittery                         .708
upset                           .704
irritable                       .647
hostile                         .595
guilty                          .585
ashamed                         .492


Total Variance Explained (rotated solution)

Component | Rotation Sums of Squared Loadings: Total, % of Variance, Cumulative %
1 | 4.881, 24.405, 24.405
2 | 4.757, 23.786, 48.191

Component 1 (Positive Affect) contributes 24.41% and Component 2 (Negative Affect) 23.79% of the variance, i.e. the two-component model accounts for 48.19% of the variance in Affect/Mood.
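The Rotation Sums of Squared Loadings can be reproduced from any rotated loading matrix: a component's SS loadings are the column sum of its squared loadings, and % of variance = SS / p x 100, where p is the number of items. The 5-item loading matrix below is hypothetical, purely to show the arithmetic:

```python
import numpy as np

# Hypothetical rotated loadings: 5 items, 2 components
rotated = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.12, 0.78],
    [0.08, 0.70],
    [0.05, 0.65],
])
p = rotated.shape[0]

ss_loadings = (rotated ** 2).sum(axis=0)   # SS loadings per component
pct_variance = 100 * ss_loadings / p       # % of variance per component
print("SS loadings:  ", np.round(ss_loadings, 4))
print("% of variance:", np.round(pct_variance, 2))
```

Summing the per-component percentages gives the cumulative % of the rotated solution, exactly as in the SPSS table above.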

Reportage of the FA:

The 20 items comprising the PANAS instrument were subjected to Principal Components Analysis (PCA) using SPSS/PASW Version 19. Prior to conducting the PCA, the factorability of the data was assessed. Inspection of the correlation matrix revealed many (significant) correlations > .3. The KMO value of .87 exceeded the recommended minimum of .70, and Bartlett's test of sphericity was clearly significant. Hence, overall the data were suitable for PCA. Four components with Eigenvalues > 1 emerged from the subsequent PCA, explaining 31.2%, 17.0%, 6.1% and 5.8% of the variance, respectively. Inspection of the Scree plot revealed a clear break after the second component; using Cattell's (1966) scree test, this suggested retaining two components (factors). The two-component solution explained 48.2% of the variance in total, with Component 1 accounting for 31.2% and Component 2 for 17.0% of the variance. A subsequently conducted oblimin rotation supported the presence of a simple structure (Thurstone, 1947), with both components comprising a number of items with strong and distinctive loadings. Overall, the interpretation of the components was in line with previous research on the PANAS instrument, with positive affect items clearly loading on Component 1 and negative affect items strongly loading on Component 2. There was a weak to moderate intercorrelation between the two factors (r = -.28). In their entirety, the data supported the hypothesis that positive and negative affect items form two distinct subscales.

