User’s Guide : Advanced Multivariate Analysis : Factor Analysis : Background
  
Background
 
The Model
Number of Factors
Kaiser-Guttman, Minimum Eigenvalue
Fraction of Total Variance
Minimum Average Partial
Broken Stick
Standard Error Scree
Parallel Analysis
Bai and Ng
Ahn and Horenstein
Estimation Methods
Minimum Discrepancy (ML, GLS, ULS)
Principal Factors
Communality Estimation
Iteration
Partitioned Covariance (PACE)
Model Evaluation
Absolute Fit
Discrepancy and Chi-Square Tests
Information Criteria
Other Measures
Incremental Fit
Rotation
Types of Rotation
Standardization
Starting Values
Scoring
Score Estimation
Score Evaluation
Indeterminacy Indices
Validity, Univocality, Correlational Accuracy
We begin with a brief sketch of the basic features of the common factor model. Our notation parallels the discussion in Johnston and Wichtern (1992).
The Model
The factor model assumes that for individual , the observable multivariate -vector is generated by:
(59.1)
where is a vector of variable means, is a matrix of coefficients, is a vector of standardized unobserved variables, termed common factors, and is a vector of errors or unique factors.
The model expresses the observable variables in terms of unobservable common factors , and unobservable unique factors . Note that the number of unobservables exceeds the number of observables.
The factor loading or pattern matrix links the unobserved common factors to the observed data. The j-th row of represents the loadings of the j-th variable on the common factors. Alternately, we may view the row as the coefficients for the common factors for the j-th variable.
To proceed, we must impose additional restrictions on the model. We begin by imposing moment and covariance restrictions so that and , , , and where is a diagonal matrix of unique variances. Given these assumptions, we may derive the fundamental variance relationship of factor analysis by noting that the variance matrix of the observed variables is given by:
(59.2)
The variances of the individual variables may be decomposed into:
(59.3)
for each , where the are taken from the diagonal elements of , and is the corresponding diagonal element of . represents common portion of the variance of the j-th variable, termed the communality, while is the unique portion of the variance, also referred to as the uniqueness.
Furthermore, the factor structure matrix containing the correlations between the variables and factors may be obtained from:
(59.4)
Initially, we make the further assumption that the factors are orthogonal so that (we will relax this assumption shortly). Then:
(59.5)
Note that with orthogonal factors, the communalities are given by the diagonal elements of (the row-norms of ).
The primary task of factor analysis is to model the observed variances and covariances of the as functions of the factor loadings in , and specific variances in . Given estimates of and , we may form estimates of the fitted total variance matrix, , and the fitted common variance matrix, . If is the observed dispersion matrix, we may use these estimates to define the total variance residual matrix and the common variance residual .
Number of Factors
Choosing the number of factors is generally agreed to be one of the most important decisions one makes in factor analysis (Preacher and MacCallum, 2003; Fabrigar, et al., 1999; Jackson, 1993; Zwick and Velicer, 1986). Accordingly, there is a large and varied literature describing methods for determining the number of factors, of which the references listed here are only a small subset.
Kaiser-Guttman, Minimum Eigenvalue
The Kaiser-Guttman rule, commonly termed “eigenvalues greater than 1,” is by far the most commonly used method. In this approach, one computes the eigenvalues of the unreduced dispersion matrix, and retains as many factors as the number eigenvalues that exceed the average (for a correlation matrix, the average eigenvalue is 1, hence the commonly employed description). The criterion has been sharply criticized by many on a number of grounds (e.g., Preacher and MacCallum, 2003), but remains popular.
Fraction of Total Variance
The eigenvalues of the unreduced matrix may be used in a slightly different fashion. You may choose to retain as many factors are required for the sum of the first eigenvalues to exceed some threshold fraction of the total variance. This method is used more often in principal components analysis where researchers typically include components comprising 95% of the total variance (Jackson, 1993).
Minimum Average Partial
Velicer’s (1976) minimum average partial (MAP) method computes the average of the squared partial correlations after components have been partialed out (for ). The number of factor retained is the number that minimizes this average. The intuition here is that the average squared partial correlation is minimized where the residual matrix is closest to being the identity matrix.
Zwick and Velicer (1986) provide evidence that the MAP method outperforms a number of other methods under a variety of conditions.
Broken Stick
We may compare the relative proportions of the total variance that are accounted for by each eigenvalue to the expected proportions obtained by chance (Jackson, 1993). More precisely, the broken stick method compares the proportion of variance given by j-th largest eigenvalue of the unreduced matrix with the corresponding expected value obtained from the broken stick distribution. The number of factors retained is the number of proportions that exceed their expected values.
Standard Error Scree
The Standard Error Scree (Zoski and Jurs, 1996) is an attempt to formalize the visual comparisons of slopes used in the visual scree test. It is based on the standard errors of sets of regression lines fit to later eigenvalues; when the standard error of the regression through the later eigenvalues falls below the specified threshold, the remaining factors are assumed to be negligible.
Parallel Analysis
Parallel analysis (Horn, 1965; Humphreys and Ilgen, 1969; Humphreys and Montanelli, 1975) involves comparing eigenvalues of the (unreduced or reduced) dispersion matrix to results obtained from simulation using uncorrelated data.
The parallel analysis simulation is conducted by generating multiple random data sets of independent random variables with the same variances and number of observations as the original data. The Pearson covariance or correlation matrix of the simulated data is computed and an eigenvalue decomposition performed for each data set. The number of factors retained is then based on the number of eigenvalues that exceed their simulated counterpart. The threshold for comparison is typically chosen to be the mean values of the simulated data as in Horn (1965), or a specific quantile as recommended by Glorfeld (1995).
Bai and Ng
Bai and Ng (2002) propose a model selection approach to determining the number of factors in a principal components framework. The technique involves least squares regression using different numbers of eigenvalues obtained from a principal components decomposition. See “Bai and Ng” for details.
Ahn and Horenstein
Ahn and Horenstein (AH, 2013) provide a method for obtaining the number of factors that exploits the fact that the largest eigenvalues of a given matrix grow without bounds as the rank of the matrix increases, whereas the other eigenvalues remain bounded. The optimization strategy is then simply to find the maximum of the ratio of two adjacent eigenvalues. See “Ahn and Horenstein” for discussion.
Estimation Methods
There are several methods for extracting (estimating) the factor loadings and specific variances from an observed dispersion matrix.
EViews supports estimation using maximum likelihood (ML), generalized least squares (GLS), unweighted least squares (ULS), principal factors and iterated principal factors, and partitioned covariance matrix estimation (PACE).
Minimum Discrepancy (ML, GLS, ULS)
One class of extraction methods involves minimizing a discrepancy function with respect to the loadings and unique variances (Jöreskog, 1977). Let represent the observed dispersion matrix and let the fitted matrix be . Then the discrepancy functions for ML, GLS, and ULS are given by:
(59.6)
Each estimation method involves minimizing the appropriate discrepancy function with respect to the loadings matrix and unique variances . An iterative algorithm for this optimization is detailed in Jöreskog. The functions all achieve an absolute minimum value of 0 when , but in general this minimum will not be achieved.
The ML and GLS methods are scale invariant so that rescaling of the original data matrix or the dispersion matrix does not alter the basic results. The ML and GLS methods do require that the dispersion matrix be positive definite.
ULS does not require a positive definite dispersion matrix. The solution is equivalent to the iterated principal factor solution.
Principal Factors
The principal factor (principal axis) method is derived from the notion that the common factors should explain the common portion of the variance: the off-diagonal elements of the dispersion matrix and the communality portions of the diagonal elements. Accordingly, for some initial estimate of the unique variances , we may define the reduced dispersion matrix , and then fit this matrix using common factors (see, for example, Gorsuch, 1993).
The principal factor method fits the reduced matrix using the first eigenvalues and eigenvectors. Loading estimates, are be obtained from the eigenvectors of the reduced matrix. Given the loading estimates, we may form a common variance residual matrix, . Estimates of the uniquenesses are obtained from the diagonal elements of this residual matrix.
Communality Estimation
The construction of the reduced matrix is often described as replacing the diagonal elements of the dispersion matrix with estimates of the communalities. The estimation of these communalities has received considerable attention in the literature. Among the approaches are (Gorsuch, 1993):
Fraction of the diagonals: use a constant fraction of the original diagonal elements of . One important special case is to use ; the resulting estimates may be viewed as those from a truncated principal components solution.
Largest correlation: select the largest absolution correlation of each variable with any other variable in the matrix.
Squared multiple correlations (SMC): by far the most popular method; uses the squared multiple correlation between a variable and the other variables as an estimate of the communality. SMCs provide a conservative communality estimate since they are a lower bound to the communality in the population. The SMC based communalities are computed as , where is the i-th diagonal element of the inverse of the observed dispersion matrix. Where the inverse cannot be computed we may employ instead the generalized inverse.
Iteration
Having obtained principal factor estimates based on initial estimates of the communalities, we may repeat the principal factors extraction using the row norms of as updated estimates of the communalities. This step may be repeated for a fixed number of iterations, or until the results are stable.
While the approach is a popular one, some authors are strongly opposed to iterating principal factors to convergence (e.g., Gorsuch, 1983, p. 107–108). Performing a small number of iterations appears to be less contentious.
Partitioned Covariance (PACE)
Ihara and Kano (1986) provide a closed-form (non-iterative) estimator for the common factor model that is consistent, asymptotically normal, and scale invariant. The method requires a partitioning of the dispersion matrix into sets of variables, leading Cudeck (1991) to term this the partitioned covariance matrix estimator (PACE).
Different partitionings of the variables may lead to different estimates. Cudeck (1991) and Kano (1990) independently propose an efficient method for determining a desirable partioning.
Since the PACE estimator is non-iterative, it is especially well suited for estimation of large factor models, or for providing initial estimates for iterative estimation methods.
Model Evaluation
One important step in factor analysis is evaluation of the fit of the estimated model. Since a factor analysis model is necessarily an approximation, we would like to examine how well a specified model fits the data, taking account the number of parameters (factors) employed and the sample size.
There are two general classes of indices for model selection and evaluation in factor analytic models. The first class, which may be termed absolute fit indices, are evaluated using the results of the estimated specification. Various criteria have been used for measuring absolute fit, including the familiar chi-square test of model adequacy. There is no reference specification against which the model is compared, though there may be a comparison with the observed dispersion of the saturated model.
The second class, which may be termed relative fit indices, compare the estimated specification against results for a reference specification, typically the zero common factor (independence model).
Before describing the various indices we first define the chi-square test statistic as a function of the discrepancy function, , and note that a model with variables and factors has free parameters ( factor loadings and uniqueness elements, less implicit zero correlation restrictions on the factors). Since there are distinct elements of the dispersion matrix, there are a total of remaining degrees-of-freedom.
One useful measure of the parsimony of a factor model is the parsimony ratio: , where is the degrees of freedom for the independence model.
Note also that the measures described below are not reported for all estimation methods.
Absolute Fit
Most of the absolute fit measures are based on number of observations and conditioning variables, the estimated discrepancy function, , and the number of degrees-of-freedom.
Discrepancy and Chi-Square Tests
The discrepancy functions for ML, GLS, and ULS are given by Equation (59.6). Principal factor and iterated principal factor discrepancies are computed using the ULS function, but will generally exceed the ULS minimum value of .
Under the multivariate normal distributional assumptions and a correctly specified factor specification estimated by ML or GLS, the chi-square test statistic is distributed as an asymptotic random variable with degrees-of-freedom (e.g., Hu and Bentler, 1995). A large value of the statistic relative to the indicates that the model fits the data poorly (appreciably worse than the saturated model).
It is well known that the performance of the statistic is poor for small samples and non-normal settings. One popular adjustment for small sample size involves applying a Bartlett correction to the test statistic so that the multiplicative factor in the definition of is replaced by (Johnston and Wichern, 1992).
Note that two distinct sets of chi-square tests that are commonly performed. The first set compares the fit of the estimated model against a saturated model; the second set of tests examines the fit of the independence model. The former are sometimes termed tests of model adequacy since they evaluate whether the estimated model adequately fits the data. The latter tests are sometimes referred to as test of sphericity since they test the assumption that there are no common factors in the data.
Information Criteria
Standard information criteria (IC) such as Akaike (AIC), Schwarz (SC), Hannan-Quinn (HQ) may be adapted for use with ML and GLS factor analysis. These indices are useful measures of fit since they reward parsimony by penalizing based on the number of parameters.
Construction of the EViews factor analysis information criteria measure employ a scaled version of the discrepancy as the log-likelihood, , and begins by forming the standard IC. Following Akaike (1987), we re-center the criteria by subtracting off the value for the saturated model, and following Cudeck and Browne (1983) and EViews convention, we further scale by the number of observations to eliminate the effect of sample size. The resulting factor analysis form of the information criteria are given by:
(59.7)
You should be aware that these statistics are often quoted in unscaled form, sometimes without adjusting for the saturated model. Most often, if there are discrepancies, multiplying the EViews reported values by will line up results. Note also that the current definition uses the adjusted number of observations in the numerator of the leading term.
When using information criteria for model selection, bear in mind that the model with the smallest value is considered most desirable.
Other Measures
The root mean square residual (RMSR) is given by the square root of the mean of the unique squared total covariance residuals. The standardized root mean square residual (SRMSR) is a variance standardized version of this RMSR that scales the residuals using the diagonals of the original dispersion matrix, then computes the RMSR of the scaled residuals (Hu and Bentler, 1999).
There are a number of other measures of absolute fit. We refer you to Hu and Bentler (1995, 1999) and Browne and Cudeck (1993), McDonald and Marsh (1990), Marsh, Balla and McDonald (1988) for details on these measures and recommendations on their use. Note that where there are small differences in the various descriptions of the measures due to degree-of-freedom corrections, we have used the formulae provided by Hu and Bentler (1999).
Incremental Fit
Incremental fit indices measure the improvement in fit of the model over a more restricted specification. Typically, the restricted specification is chosen to be the zero factor or independence model.
EViews reports up to five relative fit measures: the generalized Tucker-Lewis Nonnormed Fit Index (NNFI), Bentler and Bonnet’s Normed Fit Index (NFI), Bollen’s Relative Fit Index (RFI), Bollen’s Incremental Fit Index (IFI), and Bentler’s Comparative Fit Index (CFI). See Hu and Bentler (1995)for details.
Traditionally, the rule of thumb was for acceptable models to have fit indices that exceed 0.90, but recent evidence suggests that this cutoff criterion may be inadequate. Hu and Bentler (1999) provide some guidelines for evaluating values of the indices; for ML estimation, they recommend use of two indices, with cutoff values close to 0.95 for the NNFI, RFI, IFI, CFI.
Rotation
The estimated loadings and factors are not unique; we may obtain others that fit the observed covariance structure identically. This observation lies behind the notion of factor rotation, in which we apply transformation matrices to the original factors and loadings in the hope of obtaining a simpler factor structure.
To elaborate, we begin with the orthogonal factor model from above:
(59.8)
where . Suppose that we pre-multiply our factors by a rotation matrix where . Then we may re-write the factor model Equation (59.1) as:
(59.9)
which is an observationally equivalent common factor model with rotated loadings and factors , where the correlation of the rotated factors is given by:
(59.10)
See Browne (2001) and Bernaards and Jennrich (2005) for details.
Types of Rotation
There are two basic types of rotation that involve different restrictions on . In orthogonal rotation, we impose constraints on the transformation matrix so that , implying that the rotated factors are orthogonal. In oblique rotation, we impose only constraints on , requiring the diagonal elements of equal 1.
There are a large number of rotation methods. The majority of methods involving minimizing an objective function that measure the complexity of the rotated factor matrix with respect to the choice of , subject to any constraints on the factor correlation. Jennrich (2001, 2002) describes algorithms for performing orthogonal and oblique rotations by minimizing complexity objective.
For example, suppose we form the matrix where every element equals the square of a corresponding factor loading : . Intuitively, one or more measures of simplicity of the rotated factor pattern can be expressed as a function of these squared loadings. One such function defines the Crawford-Ferguson family of complexities:
(59.11)
for weighting parameter . The Crawford-Ferguson (CF) family is notable since it encompasses a large number of popular rotation methods (including Varimax, Quartimax, Equamax, Parsimax, and Factor Parsimony).
The first summation term in parentheses, which is based on the outer-product of the i-th row of the squared loadings, provides a measure of complexity. Those rows which have few non-zero elements will have low complexity compared to rows with many non-zero elements. Thus, the first term in the function is a measure of the row (variables) complexity of the loadings matrix. Similarly, the second summation term in parentheses is a measure of the complexity of the j-th column of the squared loadings matrix. The second term provides a measure of the column (factor) complexity of the loadings matrix. It follows that higher values for assign greater weight to factor complexity and less weight to variable complexity.
Along with the CF family, EViews supports the following rotation methods:
 
Method
Orthogonal
Oblique
Biquartimax
Crawford-Ferguson
Entropy
 
Entropy Ratio
 
Equamax
Factor Parsimony
Generalized Crawford-Ferguson
Geomin
Harris-Kaiser (case II)
 
Infomax
Oblimax
 
Oblimin
 
Orthomax
Parsimax
Pattern Simplicity
Promax
 
Quartimax/Quartimin
Simplimax
Tandem I
 
Tandem II
 
Target
Varimax
EViews employs the Crawford-Ferguson variants of the Biquartimax, Equamax, Factor Parsimony, Orthomax, Parsimax, Quartimax, and Varimax objective functions. For example, The EViews Orthomax objective for parameter is evaluated using the Crawford-Ferguson objective with factor complexity weight .
These forms of the objective functions yield the same results as the standard versions in the orthogonal case, but are better behaved (e.g., do not permit factor collapse) under direct oblique rotation (see Browne 2001, p. 118-119). Note that oblique Crawford-Ferguson Quartimax is equivalent to Quartimin.
The two orthoblique methods, the Promax and Harris-Kaiser both perform an initial orthogonal rotation, followed by a oblique adjustment. For both of these methods, EViews provides some flexibility in the choice of initial rotation. By default, EViews will perform an initial Orthomax rotation with the default parameter set to 1 (Varimax). To perform initial rotation with Quartimax, you should set the Orthomax parameter to 0. See Gorsuch (1993) and Harris-Kaiser (1964) for details.
Some rotation methods require specification of one or more parameters. A brief description and the default value(s) used by EViews is provided below:
 
Method
Parameter Description
Crawford-Ferguson
1
Factor complexity weight (default=0, Quartimax).
Generalized Crawford-Ferguson
4
Vector of weights for (in order): total squares, variable complexity, factor complexity, diagonal quartics (no default).
Geomin
1
Epsilon offset (default=0.01).
Harris-Kaiser (case II)
2
Power parameter (default=0, independent cluster solution).
Oblimin
1
Deviation from orthogonality (default=0, Quartimin).
Orthomax
1
Factor complexity weight (default=1, Varimax).
Promax
1
Power parameter (default=3).
Simplimax
1
Fraction of near-zero loadings (default=0.75).
Target
1
matrix of target loadings. Missing values correspond to unrestricted elements. (No default.)
Standardization
Weighting the rows of the initial loading matrix prior to rotation can sometimes improve the rotated solution (Browne, 2001). Kaiser standardization weights the rows by the inverse square roots of the communalities. Cureton-Mulaik standardization assigns weights between zero and one to the rows of the loading matrix using a more complicated function of the original matrix.
Both standardization methods may lead to instability in cases with small communalities.
Starting Values
Starting values for the rotation objective minimization procedures are typically taken to be the identity matrix (the unrotated loadings). The presence of local minima is a distinct possibility and it may be prudent to consider random rotations as alternate starting values. Random orthogonal rotations may be used as starting values for orthogonal rotation; random orthogonal or oblique rotations may be used to initialize the oblique rotation objective minimization.
Scoring
The factors used to explain the covariance structure of the observed data are unobserved, but may be estimated from the loadings and observable data. These factor score estimates may be used in subsequent diagnostic analysis, or as substitutes for the higher-dimensional observed data.
Score Estimation
We may compute factor score estimates as a linear combination of observed data:
(59.12)
where is a matrix of factor score coefficients derived from the estimates of the factor model. Often, we will construct estimates using the original data so that but this is not required; we may for example use coefficients obtained from one set of data to score individuals in a second set of data.
Various methods for estimating the score coefficients have been proposed. The first class of factor scoring methods computes exact or refined estimates of the coefficient weights . Generally speaking, these methods optimize some property of the estimated scores with respect to the choice of . For example, Thurstone’s regression approach maximizes the correlation of the scores with the true factors (Gorsuch, 1983). Other methods minimize a function of the estimated errors with respect to , subject to constraints on the estimated factor scores. For example, Anderson and Rubin (1956) and McDonald (1981) compute weighted least squares estimators of the factor scores, subject to the condition that the implied correlation structure of the scores , equals .
The second set of methods computes coarse coefficient weights in which the elements of are restricted to be (-1, 0, 1) values. These simplified weights are determined by recoding elements of the factor loadings matrix or an exact coefficient weight matrix on the basis of their magnitudes. Values of the matrices that are greater than some threshold (in absolute value) are assigned sign-corresponding values of -1 or 1; all other values are recoded at 0 (Grice, 2001).
Score Evaluation
There are an infinite number of factor score estimates that are consistent with an estimated factor model. This lack of identification, termed factor indeterminacy, has received considerable attention in the literature (see for example, Mulaik (1996); Steiger (1979)), and is a primary reason for the multiplicity of estimation methods, and for the development of procedures for evaluating the quality of a given set of scores (Gorsuch, 1983, p. 272).
See Gorsuch (1993) and Grice(2001) for additional discussion of the following measures.
Indeterminacy Indices
There are two distinct types of indeterminacy indices. The first set measures the multiple correlation between each factor and the observed variables, and its square . The squared multiple correlations are obtained from the diagonals of the matrix where is the observed dispersion matrix and is the factor structure matrix. Both of these indices range from 0 to 1, with high values being desirable.
The second type of indeterminacy index reports the minimum correlation between alternate estimates of the factor scores, . The minimum correlation measure ranges from -1 to 1. High positive values are desirable since they indicate that differing sets of factor scores will yield similar results.
Grice (2001) suggests that values for that do not exceed 0.707 by a significant degree are problematic since values below this threshold imply that we may generate two sets of factor scores that are orthogonal or negatively correlated (Green, 1976).
Validity, Univocality, Correlational Accuracy
Following Gorsuch (1983), we may define as the population factor correlation matrix, as the factor score correlation matrix, and as the correlation matrix of the known factors with the score estimates. In general, we would like these matrices to be similar.
The diagonal elements of are termed validity coefficients. These coefficients range from -1 to 1, with high positive values being desired. Differences between the validities and the multiple correlations are evidence that the computed factor scores have determinacies lower than those computed using the -values. Gorsuch (1983) recommends obtaining validity values of at least 0.80, and notes that values larger than 0.90 may be necessary if we wish to use the score estimates as substitutes for the factors.
The off-diagonal elements of allow us to measure univocality, or the degree to which the estimated factor scores have correlations with those of other factors. Off-diagonal values of that differ from those in are evidence of univocality bias.
Lastly, we obviously would like the estimated factor scores to match the correlations among the factors themselves. We may assess the correlational accuracy of the scores estimates by comparing the values of the with the values of .
From our earlier discussion, we know that the population correlation . may be obtained from moments of the estimated scores. Computation of is more complicated, but follows the steps outlined in Gorsuch (1983).