Technical Discussion
While the discussion to follow is expressed in terms of a balanced system of linear equations, the analysis carries forward in a straightforward way to unbalanced systems containing nonlinear equations.
Denote a system of $M$ equations in stacked form as:

$$ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_M \end{bmatrix}\beta + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_M \end{bmatrix} \tag{43.8} $$

where $y_m$ is a $T$ vector, $X_m$ is a $T \times k$ matrix, and $\beta$ is a $k$ vector of coefficients. The error terms $\epsilon$ have an $MT \times MT$ covariance matrix $V$. The system may be written in compact form as:

$$ y = X\beta + \epsilon \tag{43.9} $$
Under the standard assumptions, the residual variance matrix from this stacked system is given by:

$$ V = E(\epsilon\epsilon') = \sigma^2\left(I_M \otimes I_T\right) \tag{43.10} $$
Other residual structures are of interest. First, the errors may be heteroskedastic across the $M$ equations. Second, they may be heteroskedastic and contemporaneously correlated. We can characterize both of these cases by defining the $M \times M$ matrix of contemporaneous correlations, $\Sigma$, where the $(i,j)$-th element of $\Sigma$ is given by $\sigma_{ij} = E(\epsilon_{it}\epsilon_{jt})$ for all $t$. If the errors are contemporaneously uncorrelated, then $\sigma_{ij} = 0$ for $i \neq j$, and we can write:

$$ V = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2\right) \otimes I_T \tag{43.11} $$

More generally, if the errors are heteroskedastic and contemporaneously correlated:

$$ V = \Sigma \otimes I_T \tag{43.12} $$
Lastly, at the most general level, there may be heteroskedasticity, contemporaneous correlation, and autocorrelation of the residuals. The general variance matrix of the residuals may be written:

$$ V = \begin{bmatrix} \sigma_{11}\Sigma_{11} & \sigma_{12}\Sigma_{12} & \cdots & \sigma_{1M}\Sigma_{1M} \\ \sigma_{21}\Sigma_{21} & \sigma_{22}\Sigma_{22} & & \vdots \\ \vdots & & \ddots & \vdots \\ \sigma_{M1}\Sigma_{M1} & \cdots & \cdots & \sigma_{MM}\Sigma_{MM} \end{bmatrix} \tag{43.13} $$

where $\Sigma_{ij}$ is an autocorrelation matrix for the $i$-th and $j$-th equations.
Ordinary Least Squares
The OLS estimator of the estimated variance matrix of the parameters is valid under the assumption that $V = \sigma^2(I_M \otimes I_T)$. The estimator for $\beta$ is given by:

$$ b_{LS} = (X'X)^{-1}X'y \tag{43.14} $$

and the variance estimator is given by:

$$ \widehat{\mathrm{var}}(b_{LS}) = s^2(X'X)^{-1} \tag{43.15} $$

where $s^2$ is the residual variance estimate for the stacked system.
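As a concrete illustration, the stacked estimator in Equations (43.14)–(43.15) can be sketched in a few lines of NumPy. This is an illustrative sketch, not EViews code; the simulated data and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, M = 100, 3, 2                  # obs per equation, coefficients, equations

# Simulated regressors; a common beta links the equations of the system.
X_blocks = [rng.normal(size=(T, k)) for _ in range(M)]
beta_true = np.array([1.0, -0.5, 2.0])
y_blocks = [Xb @ beta_true + rng.normal(size=T) for Xb in X_blocks]

# Stack the system as in Equation (43.8).
X = np.vstack(X_blocks)              # (M*T) x k
y = np.concatenate(y_blocks)         # (M*T)

# OLS estimator, Equation (43.14): b = (X'X)^(-1) X'y.
b_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Variance estimator, Equation (43.15): var(b) = s^2 (X'X)^(-1).
resid = y - X @ b_ls
s2 = resid @ resid / (M * T - k)     # residual variance for the stacked system
var_b = s2 * np.linalg.inv(X.T @ X)
print(b_ls, np.sqrt(np.diag(var_b)))
```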
Weighted Least Squares
The weighted least squares estimator is given by:

$$ b_{WLS} = \left(X'\hat{V}^{-1}X\right)^{-1}X'\hat{V}^{-1}y \tag{43.16} $$

where $\hat{V} = \mathrm{diag}(s_1^2, \ldots, s_M^2) \otimes I_T$ is a consistent estimator of $V$, and $s_i^2$ is the residual variance estimator:

$$ s_{ij} = \frac{(y_i - X_i b_{LS})'(y_j - X_j b_{LS})}{\max(T_i, T_j)} \tag{43.17} $$

where the inner product is taken over the non-missing common elements of $i$ and $j$. The max function in Equation (43.17) is designed to handle the case of unbalanced data by down-weighting the covariance terms. Provided the missing values are asymptotically negligible, this yields a consistent estimator of the variance elements. Note also that there is no adjustment for degrees of freedom.
When specifying your estimation, you are given a choice of which coefficients to use in computing the $s_{ij}$. If you choose not to iterate the weights, the OLS coefficient estimates will be used to estimate the variances. If you choose to iterate the weights, the current parameter estimates (which may be based on the previously computed weights) are used in computing the $s_{ij}$. This latter procedure may be iterated until the weights and coefficients converge.
The estimator for the coefficient variance matrix is:

$$ \widehat{\mathrm{var}}(b_{WLS}) = \left(X'\hat{V}^{-1}X\right)^{-1} \tag{43.18} $$
The weighted least squares estimator is efficient, and the variance estimator consistent, under the assumption that there is heteroskedasticity, but no serial or contemporaneous correlation in the residuals.
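The feasible WLS iteration just described is easy to sketch. Below is a minimal NumPy version, assuming a balanced system (so $\max(T_i, T_j) = T$) and a common coefficient vector; the function name and interface are hypothetical.

```python
import numpy as np

def system_wls(X_blocks, y_blocks, iterate=True, max_iter=50, tol=1e-8):
    """Feasible weighted LS for a stacked system (Equations 43.16-43.18).

    Balanced data assumed: every equation has the same T observations."""
    X = np.vstack(X_blocks)
    y = np.concatenate(y_blocks)

    # Start from OLS coefficients (used for the weights when not iterating).
    b = np.linalg.solve(X.T @ X, X.T @ y)
    for _ in range(max_iter if iterate else 1):
        # Per-equation variances s_i^2 from current residuals (Eq. 43.17);
        # note the absence of a degrees-of-freedom adjustment.
        s2 = [np.mean((yb - Xb @ b) ** 2) for Xb, yb in zip(X_blocks, y_blocks)]
        # V-hat is diagonal, so GLS reduces to scaling each block by 1/s_i^2.
        XtVX = sum(Xb.T @ Xb / s for Xb, s in zip(X_blocks, s2))
        XtVy = sum(Xb.T @ yb / s for Xb, yb, s in zip(X_blocks, y_blocks, s2))
        b_new = np.linalg.solve(XtVX, XtVy)
        done = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if done:
            break
    return b, np.linalg.inv(XtVX)     # coefficients and variance (Eq. 43.18)
```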
It is worth pointing out that if there are no cross-equation restrictions on the parameters of the model, weighted LS on the entire system yields estimates that are identical to those obtained by equation-by-equation LS. Consider the following simple model:

$$ y_1 = X_1\beta_1 + \epsilon_1, \qquad y_2 = X_2\beta_2 + \epsilon_2 \tag{43.19} $$

If $\beta_1$ and $\beta_2$ are unrestricted, the WLS estimator given in Equation (43.16) yields:

$$ b_{WLS} = \begin{bmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \end{bmatrix} \tag{43.20} $$
The expression on the right is equivalent to equation-by-equation OLS. Note, however, that even without cross-equation restrictions, the standard errors are not the same in the two cases.
Seemingly Unrelated Regression (SUR)
SUR is appropriate when all the right-hand side regressors $X$ are assumed to be exogenous, and the errors are heteroskedastic and contemporaneously correlated so that the error variance matrix is given by $V = \Sigma \otimes I_T$. Zellner's SUR estimator of $\beta$ takes the form:

$$ b_{SUR} = \left(X'\left(\hat{\Sigma} \otimes I_T\right)^{-1}X\right)^{-1}X'\left(\hat{\Sigma} \otimes I_T\right)^{-1}y \tag{43.21} $$

where $\hat{\Sigma}$ is a consistent estimate of $\Sigma$ with typical element $s_{ij}$, for all $i$ and $j$.
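A minimal sketch of this two-step FGLS computation, again assuming balanced data and a common coefficient vector; purely illustrative, with names of our choosing.

```python
import numpy as np

def sur(X_blocks, y_blocks):
    """Zellner's SUR estimator (Equation 43.21), balanced data assumed."""
    M, T = len(X_blocks), len(y_blocks[0])
    X = np.vstack(X_blocks)
    y = np.concatenate(y_blocks)

    # Step 1: OLS residuals give a consistent estimate of Sigma.
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    E = np.column_stack([yb - Xb @ b_ols
                         for Xb, yb in zip(X_blocks, y_blocks)])   # T x M
    Sigma = E.T @ E / T                       # typical element s_ij

    # Step 2: FGLS with V^(-1) = Sigma^(-1) kron I_T.
    Vinv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    XtVX = X.T @ Vinv @ X
    b_sur = np.linalg.solve(XtVX, X.T @ Vinv @ y)
    return b_sur, np.linalg.inv(XtVX)
```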
If you include AR terms in equation $j$, EViews transforms the model (see “Specifying AR Terms”) and estimates the following equation:

$$ y_{jt} = f_{jt}(x_t, \beta) + \sum_{r=1}^{p_j}\rho_{jr}\left(y_{j,t-r} - f_{j,t-r}(x_{t-r}, \beta)\right) + \epsilon_{jt} \tag{43.22} $$

where $\epsilon_{jt}$ is assumed to be serially independent, but possibly correlated contemporaneously across equations. At the beginning of the first iteration, we estimate the equation by nonlinear LS and use the estimates to compute the residuals $\hat{\epsilon}$. We then construct an estimate of $\Sigma$ using $s_{ij} = \hat{\epsilon}_i'\hat{\epsilon}_j/\max(T_i, T_j)$ and perform nonlinear GLS to complete one iteration of the estimation procedure. These iterations may be repeated until the coefficients and weights converge.
Two-Stage Least Squares (TSLS) and Weighted TSLS
TSLS is a single equation estimation method that is appropriate when some of the variables in $X$ are endogenous. Write the $j$-th equation of the system as:

$$ y_j = Y_j\gamma_j + X_j\beta_j + e_j \tag{43.23} $$

or, alternatively:

$$ y_j = Z_j\delta_j + e_j \tag{43.24} $$

where $Z_j = \left(Y_j\;\; X_j\right)$, $\delta_j' = \left(\gamma_j'\;\; \beta_j'\right)$, and $e_j$ is the vector of errors. $Y$ is the matrix of endogenous variables and $X$ is the matrix of exogenous variables; $Y_j$ is the matrix of endogenous variables not including $y_j$.
In the first stage, we regress the right-hand side endogenous variables $Y_j$ on all exogenous variables $X$ and get the fitted values:

$$ \hat{Y}_j = X(X'X)^{-1}X'Y_j \tag{43.25} $$

In the second stage, we regress $y_j$ on $\hat{Y}_j$ and $X_j$ to get:

$$ \hat{\delta}_{2SLS} = \left(\hat{Z}_j'\hat{Z}_j\right)^{-1}\hat{Z}_j'y_j \tag{43.26} $$

where $\hat{Z}_j = (\hat{Y}_j\;\; X_j)$. The residuals from an equation using these coefficients are used to form weights.
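The two stages of Equations (43.25)–(43.26) translate directly into NumPy. An illustrative sketch (not EViews code); note that the final residuals are formed with the original regressors, not the first-stage fitted values.

```python
import numpy as np

def tsls(y_j, Y_j, X_j, X_all):
    """Two-stage least squares for one equation (Equations 43.25-43.26).

    y_j   : T vector, dependent variable
    Y_j   : T x g matrix of included endogenous regressors
    X_j   : T x kj matrix of included exogenous regressors
    X_all : T x k matrix of all exogenous variables (the instruments)"""
    # First stage (Eq. 43.25): fitted values of Y_j from all exogenous vars.
    first_stage, *_ = np.linalg.lstsq(X_all, Y_j, rcond=None)
    Y_hat = X_all @ first_stage

    # Second stage (Eq. 43.26): regress y_j on (Y_hat, X_j).
    Z_hat = np.hstack([Y_hat, X_j])
    delta, *_ = np.linalg.lstsq(Z_hat, y_j, rcond=None)

    # Residuals use the ORIGINAL regressors; these are used to form weights.
    resid = y_j - np.hstack([Y_j, X_j]) @ delta
    return delta, resid
```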
Weighted TSLS applies the weights in the second stage so that:

$$ \hat{\delta}_{W2SLS} = \left(\hat{Z}'\hat{V}^{-1}\hat{Z}\right)^{-1}\hat{Z}'\hat{V}^{-1}y \tag{43.27} $$

where the elements of the variance matrix are estimated in the usual fashion using the residuals from unweighted TSLS.
If you choose to iterate the weights, $V$ is estimated at each step using the current values of the coefficients and residuals.
Three-Stage Least Squares (3SLS)
Since TSLS is a single equation estimator that does not take account of the covariances between residuals, it is not, in general, fully efficient. 3SLS is a system method that estimates all of the coefficients of the model, then forms weights and reestimates the model using the estimated weighting matrix. It should be viewed as the endogenous variable analogue to the SUR estimator described above.
The first two stages of 3SLS are the same as in TSLS. In the third stage, we apply feasible generalized least squares (FGLS) to the equations in the system in a manner analogous to the SUR estimator.
SUR uses the OLS residuals to obtain a consistent estimate of the cross-equation covariance matrix $\Sigma$. This covariance estimator is not, however, consistent if any of the right-hand side variables are endogenous. 3SLS uses the 2SLS residuals to obtain a consistent estimate of $\Sigma$.
In the balanced case, we may write the estimator as:

$$ \hat{\delta}_{3SLS} = \left(Z'\left(\hat{\Sigma}^{-1} \otimes X(X'X)^{-1}X'\right)Z\right)^{-1}Z'\left(\hat{\Sigma}^{-1} \otimes X(X'X)^{-1}X'\right)y \tag{43.28} $$

where $\hat{\Sigma}$ has typical element:

$$ s_{ij} = \frac{\left(y_i - Z_i\hat{\delta}_{2SLS}\right)'\left(y_j - Z_j\hat{\delta}_{2SLS}\right)}{\max(T_i, T_j)} \tag{43.29} $$

If you choose to iterate the weights, the current coefficients and residuals will be used to estimate $\hat{\Sigma}$.
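A compact sketch of the balanced-case estimator in Equation (43.28): each equation's regressors are stacked block-diagonally, and the 2SLS residuals supply $\hat{\Sigma}$. Illustrative only; the interface is of our choosing.

```python
import numpy as np
from scipy.linalg import block_diag

def threesls(y_blocks, Z_blocks, X_all, delta_2sls_blocks):
    """Balanced-case 3SLS (Equations 43.28-43.29)."""
    T = len(y_blocks[0])
    y = np.concatenate(y_blocks)
    Z = block_diag(*Z_blocks)                  # (M*T) x sum(k_j)

    # Sigma-hat from the 2SLS residuals (Eq. 43.29; balanced, so max(.,.) = T).
    E = np.column_stack([yb - Zb @ db for yb, Zb, db
                         in zip(y_blocks, Z_blocks, delta_2sls_blocks)])
    Sigma = E.T @ E / T

    # Weighting matrix: Sigma^(-1) kron P_X, with P_X the projection onto X.
    P = X_all @ np.linalg.solve(X_all.T @ X_all, X_all.T)
    W = np.kron(np.linalg.inv(Sigma), P)

    ZtWZ = Z.T @ W @ Z
    delta = np.linalg.solve(ZtWZ, Z.T @ W @ y)
    return delta, np.linalg.inv(ZtWZ)
```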
Full Information Maximum Likelihood
Following the discussion in Amemiya (1977), recall that we have:

$$ f(y_t, x_t, \beta) = u_t \tag{43.30} $$

where $y_t$ is a vector of endogenous variables and $x_t$ is a vector of exogenous variables. The Full Information Maximum Likelihood (FIML) estimator finds the vector of parameters $\beta$ by maximizing the likelihood under the assumption that $u_t$ is a vector of i.i.d. multivariate normal random variables with covariance matrix $\Omega$.
Under the normality assumption, the log-likelihood is given by:

$$ \ell(\beta, \Omega) = -\frac{MT}{2}\log 2\pi - \frac{T}{2}\log|\Omega| + \sum_t \log\left|\det\left(\frac{\partial f_t}{\partial y_t'}\right)\right| - \frac{1}{2}\sum_t f_t'\Omega^{-1}f_t \tag{43.31} $$

where $f_t = f(y_t, x_t, \beta)$. Note that the log determinant of the derivatives of $f_t$ with respect to $y_t$ (the Jacobian term) captures the simultaneity in the system of equations.
For the unrestricted and diagonal restricted covariance variants of the model, we may use the first-order conditions for the variance parameters and rewrite the likelihood in concentrated form:

$$ \ell_c(\beta) = -\frac{MT}{2}\left(\log 2\pi + 1\right) + \sum_t \log\left|\det\left(\frac{\partial f_t}{\partial y_t'}\right)\right| - \frac{T}{2}\log\left|\det\left(\frac{1}{T}\sum_t f_t f_t'\right)\right| \tag{43.32} $$

The diagonal restricted estimator replaces the off-diagonal terms in the latter matrix with zeros. The corresponding FIML estimator maximizes the concentrated likelihood with respect to $\beta$ (or equivalently, the full likelihood with respect to $\beta$ and the free parameters of $\Omega$).
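For a linear simultaneous system $\Gamma y_t + Bx_t = u_t$, the Jacobian $\partial f_t/\partial y_t'$ is simply $\Gamma$, so the concentrated likelihood of Equation (43.32) is straightforward to evaluate. The sketch below uses this linear parameterization purely for illustration.

```python
import numpy as np

def concentrated_loglik(Gamma, B, Y, X):
    """Concentrated FIML log-likelihood (Equation 43.32) for the linear
    system  Gamma @ y_t + B @ x_t = u_t,  u_t ~ iid N(0, Omega).

    Y : T x M matrix of endogenous data (rows are y_t')
    X : T x k matrix of exogenous data  (rows are x_t')"""
    T, M = Y.shape
    F = Y @ Gamma.T + X @ B.T                  # rows are f_t'
    S = F.T @ F / T                            # (1/T) sum_t f_t f_t'
    # For the diagonal restricted variant use np.diag(np.diag(S)) instead.
    _, logabsdet_G = np.linalg.slogdet(Gamma)  # log|det(df_t/dy_t')|
    _, logdet_S = np.linalg.slogdet(S)
    return (-0.5 * M * T * (np.log(2.0 * np.pi) + 1.0)
            + T * logabsdet_G                  # Jacobian (simultaneity) term
            - 0.5 * T * logdet_S)
```

Maximizing this function over the free elements of $\Gamma$ and $B$ with any standard optimizer would complete the sketch.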
The FIML estimator for models with user-restricted covariances maximizes the full likelihood in Equation (43.31) with respect to $\beta$ given the user-specified value for $\Omega$.
The estimator for $\beta$ is asymptotically normally distributed with coefficient covariance typically computed using the partitioned inverse of the outer-product of the gradient of the full likelihood (OPG) or the inverse of the negative of the observed Hessian of the concentrated likelihood. EViews employs the OPG covariance by default, but there is evidence that one should take seriously the choice of method (Calzolari and Panattoni, 1988). In addition, EViews offers a QML covariance computation that employs a Huber-White sandwich using both the OPG and the inverse negative Hessian.
Over the years, a number of approaches for FIML estimation have been proposed (see, for example, Parke 1982, Belsley 1980, Dagenais 1978, or Amemiya 1977). EViews offers standard BFGS, Newton-Raphson, and OPG/BHHH algorithms with various step methods in trust region form, as well as a simple implementation of BHHH with Marquardt and line search steps (see “Optimization Algorithms”). See Calzolari and Panattoni (1987) and Weihs, Calzolari, and Panattoni (1986) for simulation results on the performance of various estimators.
Whichever method you select, we encourage you to perform sensitivity analysis.
Generalized Method of Moments (GMM)
The basic idea underlying GMM is simple and intuitive. We have a set of theoretical moment conditions that the parameters of interest $\beta$ should satisfy. We denote these moment conditions as:

$$ E\left(m(y_t, \beta)\right) = 0 \tag{43.33} $$

The method of moments estimator is defined by replacing the moment condition (43.33) by its sample analog:

$$ \frac{1}{T}\sum_t m(y_t, \beta) = 0 \tag{43.34} $$
However, condition (43.34) will not be satisfied for any $\beta$ when there are more restrictions $p$ than there are parameters $k$. To allow for such overidentification, the GMM estimator is defined by minimizing the following criterion function:

$$ J(\beta) = m(y, \beta)'\,A\,m(y, \beta) \tag{43.35} $$

which measures the “distance” between $m$ and zero. $A$ is a weighting matrix that weights each moment condition. Any symmetric positive definite matrix $A$ will yield a consistent estimate of $\beta$. However, it can be shown that a necessary (but not sufficient) condition to obtain an (asymptotically) efficient estimate of $\beta$ is to set $A$ equal to the inverse of the covariance matrix $\Omega$ of the sample moments $m$. This follows intuitively, since we want to put less weight on the conditions that are more imprecise.
To obtain GMM estimates in EViews, you must be able to write the moment conditions in Equation (43.33) as an orthogonality condition between the residuals of a regression equation, $u(y, x, \beta)$, and a set of instrumental variables, $Z$, so that:

$$ m(\beta, y, x, Z) = Z'u(\beta, y, x) \tag{43.36} $$

For example, the OLS estimator is obtained as a GMM estimator with the orthogonality conditions:

$$ E\left(X'(y - X\beta)\right) = 0 \tag{43.37} $$
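In this orthogonality form, the criterion of Equation (43.35) is a quadratic form in the stacked moments $Z'u(\beta)$. A minimal sketch with a linear residual function, for concreteness:

```python
import numpy as np

def gmm_criterion(beta, y, X, Z, A):
    """GMM criterion J(beta) = m' A m (Equation 43.35), with moments
    m = Z'u(beta) (Equation 43.36) and linear residuals u = y - X @ beta."""
    u = y - X @ beta          # residual function u(y, x, beta)
    m = Z.T @ u               # one sample moment per instrument
    return m @ A @ m

# Setting Z = X recovers the OLS orthogonality conditions of Equation (43.37).
```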
For the GMM estimator to be identified, there must be at least as many instrumental variables $Z$ as there are parameters $\beta$. See the section on “Generalized Method of Moments” for additional examples of GMM orthogonality conditions.
An important aspect of specifying a GMM problem is the choice of the weighting matrix $A$. EViews uses the optimal $A = \hat{\Omega}^{-1}$, where $\hat{\Omega}$ is the estimated long-run covariance matrix of the sample moments $m$. EViews uses the consistent TSLS estimates for the initial estimate of $\beta$ in forming the estimate of $\Omega$.
White’s Heteroskedasticity Consistent Covariance Matrix
If you choose the GMM-Cross section option, EViews estimates $\Omega$ using White's heteroskedasticity consistent covariance matrix:

$$ \hat{\Omega}_W = \hat{\Gamma}(0) = \frac{1}{T-k}\sum_{t=1}^{T} Z_t'u_t\,u_t'Z_t \tag{43.38} $$

where $u$ is the vector of residuals, and $Z_t$ is a matrix such that the $p$ moment conditions at $t$ may be written as $m(\beta, y_t, x_t, Z_t) = Z_t'u(\beta, y_t, x_t)$.
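The White estimator is just an outer product of the per-observation moments. A sketch for the single-residual case, using the $T-k$ divisor shown above; names are ours.

```python
import numpy as np

def omega_white(Z, u, k):
    """White heteroskedasticity-consistent estimate of Omega (Eq. 43.38).

    Z : T x p instrument matrix (row t yields the p moments Z_t' u_t)
    u : T vector of residuals
    k : number of estimated parameters (degrees-of-freedom divisor T - k)"""
    T = len(u)
    Zu = Z * u[:, None]           # row t is u_t * Z_t'
    return Zu.T @ Zu / (T - k)    # sum_t Z_t' u_t u_t' Z_t, scaled
```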
Heteroskedasticity and Autocorrelation Consistent (HAC) Covariance Matrix
If you choose the GMM-Time series option, EViews estimates $\Omega$ by:

$$ \hat{\Omega}_{HAC} = \hat{\Gamma}(0) + \sum_{j=1}^{T-1}\kappa(j, q)\left(\hat{\Gamma}(j) + \hat{\Gamma}'(j)\right) \tag{43.39} $$

where:

$$ \hat{\Gamma}(j) = \frac{1}{T-k}\sum_{t=j+1}^{T} Z_t'u_t\,u_{t-j}'Z_{t-j} \tag{43.40} $$
You also need to specify the kernel function $\kappa$ and the bandwidth $q$.
Kernel Options
The kernel function $\kappa$ is used to weight the covariances so that $\hat{\Omega}$ is ensured to be positive semi-definite. EViews provides two choices for the kernel, Bartlett and quadratic spectral (QS). The Bartlett kernel is given by:

$$ \kappa(j, q) = \begin{cases} 1 - \dfrac{j}{q+1} & 0 \le j \le q \\[4pt] 0 & \text{otherwise} \end{cases} \tag{43.41} $$

while the quadratic spectral (QS) kernel is given by:

$$ \kappa(j/q) = \frac{25}{12\pi^2 x^2}\left(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right) \tag{43.42} $$

where $x = j/q$. The QS has a faster rate of convergence than the Bartlett and is smooth and not truncated (Andrews 1991). Note that even though the QS kernel is not truncated, it still depends on the bandwidth $q$ (which need not be an integer).
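Both weight functions are a few lines of code. An illustrative sketch, taking the limit $\kappa = 1$ at $j = 0$ for the QS kernel:

```python
import numpy as np

def bartlett(j, q):
    """Bartlett kernel weight (Equation 43.41)."""
    return max(0.0, 1.0 - j / (q + 1.0))

def quadratic_spectral(j, q):
    """Quadratic spectral kernel weight (Equation 43.42), with x = j/q."""
    if j == 0:
        return 1.0                 # limit of the expression as x -> 0
    x = j / q                      # q need not be an integer
    z = 6.0 * np.pi * x / 5.0
    return 25.0 / (12.0 * np.pi**2 * x**2) * (np.sin(z) / z - np.cos(z))
```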
Bandwidth Selection
The bandwidth $q$ determines how the weights given by the kernel change with the lags in the estimation of $\Omega$. The Newey-West fixed bandwidth is based solely on the number of observations in the sample and is given by:

$$ q = \mathrm{int}\left(4\left(T/100\right)^{2/9}\right) \tag{43.43} $$

where $\mathrm{int}(\,\cdot\,)$ denotes the integer part of the argument.
EViews also provides two “automatic,” or data dependent, bandwidth selection methods that are based on the autocorrelations in the data. Both methods select the bandwidth according to the rule:

$$ \hat{q} = \begin{cases} \mathrm{int}\left(1.1447\left(\hat{\alpha}(1)\,T\right)^{1/3}\right) & \text{Bartlett kernel} \\[4pt] 1.3221\left(\hat{\alpha}(2)\,T\right)^{1/5} & \text{QS kernel} \end{cases} \tag{43.44} $$

The two methods, Andrews and Variable-Newey-West, differ in how they estimate $\hat{\alpha}(1)$ and $\hat{\alpha}(2)$.
Andrews (1991) is a parametric method that assumes the sample moments follow an AR(1) process. We first fit an AR(1) to each sample moment in (43.36) and estimate the autocorrelation coefficients $\hat{\rho}_i$ and the residual variances $\hat{\sigma}_i^2$ for each moment $i$. Then $\alpha(1)$ and $\alpha(2)$ are estimated by:

$$ \hat{\alpha}(1) = \frac{\displaystyle\sum_i \frac{4\hat{\rho}_i^2\hat{\sigma}_i^4}{(1-\hat{\rho}_i)^6(1+\hat{\rho}_i)^2}}{\displaystyle\sum_i \frac{\hat{\sigma}_i^4}{(1-\hat{\rho}_i)^4}}, \qquad \hat{\alpha}(2) = \frac{\displaystyle\sum_i \frac{4\hat{\rho}_i^2\hat{\sigma}_i^4}{(1-\hat{\rho}_i)^8}}{\displaystyle\sum_i \frac{\hat{\sigma}_i^4}{(1-\hat{\rho}_i)^4}} \tag{43.45} $$
Note that we weight all moments equally, including the moment corresponding to the constant.
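A sketch of these AR(1) plug-in estimates, weighting all moment series equally as noted; the function name is hypothetical.

```python
import numpy as np

def andrews_alphas(Mom):
    """Andrews (1991) AR(1) plug-in estimates of alpha(1), alpha(2) (Eq. 43.45).

    Mom : T x p matrix whose columns are the sample moment series."""
    num1 = num2 = den = 0.0
    for m in Mom.T:
        # Fit an AR(1) without intercept to each moment series by OLS.
        rho = (m[1:] @ m[:-1]) / (m[:-1] @ m[:-1])
        sig2 = np.mean((m[1:] - rho * m[:-1]) ** 2)
        num1 += 4.0 * rho**2 * sig2**2 / ((1 - rho)**6 * (1 + rho)**2)
        num2 += 4.0 * rho**2 * sig2**2 / (1 - rho)**8
        den += sig2**2 / (1 - rho)**4
    return num1 / den, num2 / den

# Bandwidth rule of Equation (43.44), e.g. for the Bartlett kernel:
#   q = int(1.1447 * (alpha1 * T) ** (1.0 / 3.0))
```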
Newey-West (1994) is a nonparametric method based on a truncated weighted sum of the estimated cross-moments $\hat{\Gamma}(j)$. $\alpha(1)$ and $\alpha(2)$ are estimated by:

$$ \hat{\alpha}(p) = \left(\frac{\hat{\sigma}^{(p)}}{\hat{\sigma}^{(0)}}\right)^2 \tag{43.46} $$

where $l$ is a vector of ones and:

$$ \hat{\sigma}^{(p)} = l'\hat{\Gamma}(0)\,l + 2\sum_{j=1}^{n} j^p\,l'\hat{\Gamma}(j)\,l \tag{43.47} $$

for $p = 0, 1, 2$.
One practical problem with the Newey-West method is that we have to choose a lag selection parameter $n$. The choice of $n$ is arbitrary, subject to the condition that it grow at a certain rate. EViews sets the lag parameter to:

$$ n = \mathrm{int}\left(4\left(T/100\right)^{r}\right) \tag{43.48} $$

where $r = 2/9$ for the Bartlett kernel and $r = 2/25$ for the quadratic spectral kernel.
Prewhitening
You can also choose to prewhiten the sample moments $m$ to “soak up” the correlations in $m$ prior to GMM estimation. We first fit a VAR(1) to the sample moments:

$$ m_t = Am_{t-1} + v_t \tag{43.49} $$

Then the variance $\hat{\Omega}$ of $m$ is estimated by $\hat{\Omega} = (I - \hat{A})^{-1}\hat{\Omega}^*(I - \hat{A}')^{-1}$, where $\hat{\Omega}^*$ is the long-run variance of the residuals $v_t$ computed using any of the above methods. The GMM estimator is then found by minimizing the criterion function:

$$ u'Z\,\hat{\Omega}^{-1}Z'u \tag{43.50} $$

Note that while Andrews and Monahan (1992) adjust the VAR estimates to avoid singularity when the moments are near unit root processes, EViews does not perform this eigenvalue adjustment.
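A sketch of the prewhitening and recoloring steps in Equations (43.49)–(43.50). Here `long_run_variance` is a placeholder for any of the HAC estimators described above, not a real library call.

```python
import numpy as np

def prewhitened_omega(Mom, long_run_variance):
    """VAR(1) prewhitening of the sample moments (Equation 43.49).

    Mom : T x p matrix of sample moment series m_t.
    long_run_variance : callable mapping a residual matrix to a HAC
        estimate (placeholder for any of the methods above)."""
    M0, M1 = Mom[1:], Mom[:-1]
    # Fit m_t = A m_{t-1} + v_t by multivariate least squares.
    A = np.linalg.lstsq(M1, M0, rcond=None)[0].T
    V = M0 - M1 @ A.T                          # VAR(1) residuals v_t
    Omega_star = long_run_variance(V)          # long-run variance of v_t
    # Recolor: Omega = (I - A)^(-1) Omega* (I - A')^(-1); no eigenvalue
    # adjustment is applied, mirroring the text.
    IA = np.linalg.inv(np.eye(A.shape[0]) - A)
    return IA @ Omega_star @ IA.T
```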
Multivariate ARCH
ARCH estimation uses maximum likelihood to jointly estimate the parameters of the mean and the variance equations.
Assuming multivariate normality, the log likelihood contributions for GARCH models are given by:

$$ l_t = -\frac{m}{2}\log(2\pi) - \frac{1}{2}\log|H_t| - \frac{1}{2}\epsilon_t'H_t^{-1}\epsilon_t \tag{43.51} $$

where $m$ is the number of mean equations, and $\epsilon_t$ is the $m$ vector of mean equation residuals. For Student's $t$-distribution, the contributions are of the form:

$$ l_t = \log\Gamma\left(\frac{\nu+m}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \frac{m}{2}\log\left(\pi(\nu-2)\right) - \frac{1}{2}\log|H_t| - \frac{\nu+m}{2}\log\left(1 + \frac{\epsilon_t'H_t^{-1}\epsilon_t}{\nu-2}\right) \tag{43.52} $$

where $\nu$ is the estimated degrees of freedom.
Given a specification for the mean equation and a distributional assumption, all that we require is a specification for the conditional covariance matrix. We consider, in turn, each of the three basic specifications: Diagonal VECH, Constant Conditional Correlation (CCC), and Diagonal BEKK.
Diagonal VECH
Bollerslev, et al. (1988) introduce a restricted version of the general multivariate VECH model of the conditional covariance with the following formulation:

$$ H_t = \Omega + A \bullet \epsilon_{t-1}\epsilon_{t-1}' + B \bullet H_{t-1} \tag{43.53} $$

where the coefficient matrices $A$, $B$, and $\Omega$ are $m \times m$ symmetric matrices, and the operator “$\bullet$” is the element by element (Hadamard) product. The coefficient matrices may be parameterized in several ways. The most general way is to allow the parameters in the matrices to vary without any restrictions, i.e., to parameterize them as indefinite matrices. In that case, the model may be written in single equation format as:

$$ H_{ijt} = \omega_{ij} + a_{ij}\,\epsilon_{i,t-1}\epsilon_{j,t-1} + b_{ij}\,H_{ij,t-1} \tag{43.54} $$

where, for instance, $\omega_{ij}$ is the element in the $i$-th row and $j$-th column of matrix $\Omega$.
Each matrix contains $m(m+1)/2$ parameters. This model is the most unrestricted version of a Diagonal VECH model. At the same time, it does not ensure that the conditional covariance matrix is positive semidefinite (PSD). As summarized in Ding and Engle (2001), there are several approaches for specifying coefficient matrices that restrict $H_t$ to be PSD, possibly by reducing the number of parameters. One example is:

$$ H_t = \tilde{\Omega}\tilde{\Omega}' + \tilde{A}\tilde{A}' \bullet \epsilon_{t-1}\epsilon_{t-1}' + \tilde{B}\tilde{B}' \bullet H_{t-1} \tag{43.55} $$

where the raw matrices $\tilde{\Omega}$, $\tilde{A}$, and $\tilde{B}$ may be any matrix of rank up to $m$. For example, one may use the rank $m$ Cholesky factorized matrix of the coefficient matrix. This method is labeled Full Rank Matrix in the coefficient Restriction selection of the system ARCH dialog. While this method contains the same number of parameters as the indefinite version, it does ensure that the conditional covariance is PSD.
A second method, which we term Rank One, reduces the number of parameters estimated to $m$ and guarantees that the conditional covariance is PSD. In this case, the estimated raw matrix is restricted, with all but the first column of coefficients equal to zero.
In both of these specifications, the reported raw variance coefficients are elements of $\tilde{\Omega}$, $\tilde{A}$, and $\tilde{B}$. These coefficients must be transformed to obtain the matrices of interest: $\Omega = \tilde{\Omega}\tilde{\Omega}'$, $A = \tilde{A}\tilde{A}'$, and $B = \tilde{B}\tilde{B}'$. These transformed coefficients are reported in the extended variance coefficient section at the end of the system estimation results.
There are two other covariance specifications that you may employ. First, the values in the coefficient matrix may be restricted to a common constant (the Scalar restriction), so that, for example:

$$ A = a\,\iota\iota' \tag{43.56} $$

where $a$ is a scalar and $\iota$ is an $m$ vector of ones. This specification implies that for a particular term, the parameters of the variance and covariance equations are restricted to be the same. Alternately, the matrix coefficients may be parameterized as Diagonal, so that all off-diagonal elements are restricted to be zero. In both of these parameterizations, the coefficients are not restricted to be positive, so that $H_t$ is not guaranteed to be PSD.
Lastly, for the constant matrix $\Omega$, we may also impose a Variance Target on the coefficients, which restricts the values of the coefficient matrix so that:

$$ \Omega = \bar{H} \bullet \left(\iota\iota' - A - B\right) \tag{43.57} $$

where $\bar{H}$ is the unconditional sample variance of the residuals. When using this option, the constant matrix is not estimated, reducing the number of estimated parameters.
You may specify a different type of coefficient matrix for each term. For example, if one estimates a multivariate GARCH(1,1) model with an indefinite matrix coefficient for the constant while specifying the coefficients of the ARCH and GARCH terms to be rank one matrices, then the number of parameters will be $m(m+1)/2 + 2m$, instead of $3m(m+1)/2$.
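The recursion in Equation (43.53) is a Hadamard-product update and is easy to sketch. Below, a minimal covariance filter given residuals and coefficient matrices; the names and the sample-covariance initialization are our illustrative choices.

```python
import numpy as np

def dvech_filter(E, Omega, A, B):
    """Diagonal VECH conditional covariance recursion (Equation 43.53).

    E : T x m matrix of mean-equation residuals
    Omega, A, B : m x m symmetric coefficient matrices; '*' below is the
    element by element (Hadamard) product."""
    T, m = E.shape
    H = np.empty((T, m, m))
    H[0] = np.cov(E.T)                        # initialize at sample covariance
    for t in range(1, T):
        ee = np.outer(E[t - 1], E[t - 1])     # eps_{t-1} eps_{t-1}'
        H[t] = Omega + A * ee + B * H[t - 1]  # Hadamard products
    return H

# For the Full Rank parameterization of Equation (43.55), pass in
#   Omega = Om_raw @ Om_raw.T, A = A_raw @ A_raw.T, B = B_raw @ B_raw.T.
```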
Constant Conditional Correlation (CCC)
Bollerslev (1990) specifies the elements of the conditional covariance matrix as follows:

$$ H_{iit} = c_i + a_i\,\epsilon_{i,t-1}^2 + b_i\,H_{ii,t-1}, \qquad H_{ijt} = \rho_{ij}\sqrt{H_{iit}H_{jjt}} \quad (i \neq j) \tag{43.58} $$

Restrictions may be imposed on the constant term using variance targeting so that:

$$ c_i = \sigma_i^2\left(1 - a_i - b_i\right) \tag{43.59} $$

where $\sigma_i^2$ is the unconditional variance.
When exogenous variables are included in the variance specification, the user may choose between individual coefficients and common coefficients. For common coefficients, exogenous variables are assumed to have the same slope, $g$, for every equation. Individual coefficients allow each exogenous variable effect $g_i$ to differ across equations:

$$ H_{iit} = c_i + a_i\,\epsilon_{i,t-1}^2 + b_i\,H_{ii,t-1} + g_i\,x_{it} \tag{43.60} $$
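A sketch of the CCC construction in Equation (43.58): univariate GARCH(1,1) variances combined through a constant correlation matrix. Illustrative only; exogenous variance regressors are omitted for brevity.

```python
import numpy as np

def ccc_filter(E, c, a, b, R):
    """Constant Conditional Correlation covariances (Equation 43.58).

    E : T x m residual matrix; c, a, b : m vectors of GARCH(1,1)
    coefficients; R : m x m constant correlation matrix (rho_ij)."""
    T, m = E.shape
    h = np.empty((T, m))                      # conditional variances H_iit
    h[0] = E.var(axis=0)                      # initialize at sample variances
    for t in range(1, T):
        h[t] = c + a * E[t - 1] ** 2 + b * h[t - 1]
    # Off-diagonals: H_ijt = rho_ij * sqrt(H_iit * H_jjt).
    sd = np.sqrt(h)
    return R[None, :, :] * sd[:, :, None] * sd[:, None, :]
```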
Diagonal BEKK
BEKK (Engle and Kroner, 1995) is defined as:

$$ H_t = \Omega\Omega' + A\,\epsilon_{t-1}\epsilon_{t-1}'A' + B\,H_{t-1}B' \tag{43.61} $$

EViews does not estimate the general form of BEKK in which $A$ and $B$ are unrestricted. However, a common and popular form, diagonal BEKK, may be specified that restricts $A$ and $B$ to be diagonal matrices. This Diagonal BEKK model is identical to the Diagonal VECH model where the coefficient matrices are rank one matrices. For convenience, EViews provides an option to estimate the Diagonal VECH model, but display the result in Diagonal BEKK form.
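Finally, the diagonal BEKK recursion is equivalent to a rank one Diagonal VECH update, since $\mathrm{diag}(a)\,M\,\mathrm{diag}(a) = (aa') \bullet M$. A brief illustrative sketch:

```python
import numpy as np

def diag_bekk_filter(E, Om_raw, a, b):
    """Diagonal BEKK recursion (Equation 43.61) with A = diag(a), B = diag(b).

    E : T x m residuals; Om_raw : m x m raw matrix, Omega = Om_raw @ Om_raw.T"""
    T, m = E.shape
    C = Om_raw @ Om_raw.T
    H = np.empty((T, m, m))
    H[0] = np.cov(E.T)
    for t in range(1, T):
        ee = np.outer(E[t - 1], E[t - 1])
        # diag(a) @ ee @ diag(a) equals outer(a, a) * ee: the rank one
        # Diagonal VECH form noted in the text.
        H[t] = C + np.outer(a, a) * ee + np.outer(b, b) * H[t - 1]
    return H
```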