Background

We offer a brief background for switching VAR models. Recall the standard $k$-dimensional VAR($p$) process

(50.1) | $y_t = v + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t$

where

• $y_t$ is a $k$-vector of endogenous variables,

• $A_1, \dots, A_p$ are $k \times k$ matrices of lag coefficients to be estimated,

• $v$ is a $k$-vector of intercepts,

• $\epsilon_t$ is a white noise innovation process, with $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, and $E(\epsilon_t \epsilon_s') = 0$ for $s \neq t$.

The innovations $\epsilon_t$ are contemporaneously correlated, with full rank covariance matrix $\Sigma$, but are uncorrelated with their own leads and lags, and are assumed to be uncorrelated with all of the right-hand side variables.

Switching Specification

Following Krolzig, we modify Equation (50.1) to allow for regime change so that $y_t$ follows a VAR process that depends on the value of an unobserved discrete state variable $s_t$. We assume there are $M$ possible regimes, and we are said to be in regime $m$ in period $t$ when $s_t = m$.

As in Krolzig, the VAR regime dependence is assumed to take one of two forms:

• switching intercept (SI):

(50.2) | $y_t = v(s_t) + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \epsilon_t$

• switching mean (SM):

(50.3) | $y_t - \mu(s_t) = A_1 \left(y_{t-1} - \mu(s_{t-1})\right) + \cdots + A_p \left(y_{t-p} - \mu(s_{t-p})\right) + \epsilon_t$

Regime change in the SI model produces a smooth transition of the time series toward its new long-run level, while the SM specification produces an immediate jump in the mean.
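To make the distinction concrete, the following is a minimal NumPy sketch that simulates a two-regime VAR(1) under both specifications with a single deterministic regime change; all parameter values are hypothetical and chosen only for illustration:

```python
import numpy as np

# Sketch: simulate two-regime switching VAR(1) paths under the SI and SM
# specifications to illustrate how a regime change propagates.
# All parameter values are hypothetical.
rng = np.random.default_rng(0)

k = 2                                   # number of endogenous variables
A1 = np.array([[0.5, 0.1],              # lag coefficient matrix
               [0.0, 0.4]])
mu = {1: np.zeros(k), 2: np.array([3.0, 3.0])}   # regime means (SM)
v  = {1: np.zeros(k), 2: np.array([3.0, 3.0])}   # regime intercepts (SI)
chol = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))

T = 200
s = np.where(np.arange(T) < 100, 1, 2)  # deterministic regime change at t = 100

y_si = np.zeros((T, k))
y_sm = np.zeros((T, k))
for t in range(1, T):
    eps = chol @ rng.standard_normal(k)
    # SI: the new intercept feeds in gradually through the lag dynamics
    y_si[t] = v[s[t]] + A1 @ y_si[t - 1] + eps
    # SM: the mean is subtracted from the lags, so the level jumps at once
    y_sm[t] = mu[s[t]] + A1 @ (y_sm[t - 1] - mu[s[t - 1]]) + eps
```

With these illustrative parameters the SM path jumps immediately to its new mean of 3, while the SI path drifts gradually toward the higher long-run level $(I - A_1)^{-1} v(2)$ implied by the new intercept.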

Further, we will assume that the errors are distributed as $\epsilon_t \sim N(0, \Sigma(m))$ when $s_t = m$, for $m = 1, \dots, M$, with density function

(50.4) | $f(\epsilon_t \mid s_t = m) = (2\pi)^{-k/2}\, \left|\Sigma(m)\right|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, \epsilon_t'\, \Sigma(m)^{-1} \epsilon_t\right)$

Common practice divides the parameters in the VAR specification into three groups: the intercept parameters $v$ (or means $\mu$), the endogenous variable parameters $A_1, \dots, A_p$, and the error variance parameters $\Sigma$. Typically, only a subset of the groups is allowed to vary across regimes. For example, a common restriction is that only the intercepts, or only the intercepts and the error variances, are regime specific.
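For concreteness, the regime-conditional density in Equation (50.4) can be evaluated directly; the two-regime covariance structure below is hypothetical:

```python
import numpy as np

# Sketch: evaluate the regime-conditional innovation density in (50.4)
# for a hypothetical two-regime covariance structure.
def mvn_density(eps, sigma):
    """(2*pi)^(-k/2) |Sigma|^(-1/2) exp(-eps' Sigma^{-1} eps / 2)."""
    k = eps.shape[0]
    quad = eps @ np.linalg.solve(sigma, eps)
    return (2 * np.pi) ** (-k / 2) * np.linalg.det(sigma) ** -0.5 * np.exp(-0.5 * quad)

sigmas = {1: np.eye(2), 2: 4.0 * np.eye(2)}       # hypothetical Sigma(m)
eps_t = np.array([0.5, -0.2])                      # hypothetical innovation
densities = {m: mvn_density(eps_t, sig) for m, sig in sigmas.items()}
```

Only $\Sigma(m)$ carries the regime index here, matching the common restriction in which the error variances (and perhaps the intercepts) are the regime-specific parameters.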

Lastly, we may allow for exogenous variables by defining the intercepts as functions of exogenous variables and coefficients:

(50.5) | $v_t(m) = \Gamma(m)\, X_t$

where

• $\Gamma(m)$ is a $k \times r$ matrix of exogenous variable coefficients to be estimated,

• $X_t$ is an $r$-vector of exogenous variables,

so the intercepts $v_t(m)$ are parameterized in terms of the exogenous variable parameters $\Gamma(m)$. The remainder of our discussion will be in terms of $v(m)$, but the analysis can extend to the underlying parameters $\Gamma(m)$.

Regime Dependence

Central to the analysis of a switching VAR model is the notion that the error term depends on an unobserved state variable. The nature of this state dependence differs dramatically between the switching intercept (SI) and switching mean (SM) specifications introduced earlier.

This difference creates some notational challenges. To facilitate discussion, the remainder of our treatment will be organized around a new variable $s_t^*$ that is defined in terms of the current and lagged states and has $M^*$ possible values.

We define $s_t^*$ for both specifications in the discussion below.

Switching Intercept (SI) Specification

We may use Equation (50.2) to obtain an expression for the switching intercept error in terms of the observed data and the current unobserved state:

(50.6) | $\epsilon_t(m) = y_t - v(m) - A_1 y_{t-1} - \cdots - A_p y_{t-p}$

Note that the expression for $\epsilon_t(m)$ depends only on the current state. Accordingly, we set $s_t^* = s_t$ and $M^* = M$. It follows that $s_t^* = m$ is equivalent to the statement $s_t = m$.

Switching Mean (SM) Specification

Similarly, we may use Equation (50.3) to obtain an expression for the error in terms of the observed data and a set of current and past unobserved states:

(50.7) | $\epsilon_t(m) = y_t - \mu(s_t(m)) - \sum_{j=1}^{p} A_j \left( y_{t-j} - \mu(s_{t-j}(m)) \right)$

where $S_t = (s_t, s_{t-1}, \dots, s_{t-p})$ is a $(p+1)$-dimensional state vector representing the current and $p$ previous regimes, with $M^* = M^{p+1}$ possible realizations.

To simplify notation, $s_t^* = m$ in switching mean specifications should be interpreted as shorthand for $S_t$ being equal to the $m$-th possible realization of the $(p+1)$-dimensional vector, as in

(50.8) | $S_t(m) = \left( s_t(m), s_{t-1}(m), \dots, s_{t-p}(m) \right)$

where $s_{t-j}(m)$ is the value of the $j$-th lagged state in the $m$-th possible realization, for $j = 0, 1, \dots, p$.
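The enumeration of joint state realizations can be sketched in a few lines of Python; the regime count and lag order below are hypothetical:

```python
from itertools import product

# Sketch: enumerate the M^(p+1) possible realizations of the joint state
# vector S_t = (s_t, s_{t-1}, ..., s_{t-p}) used by the switching mean
# specification.  M = 2 regimes and p = 2 lags are hypothetical choices.
M, p = 2, 2
realizations = list(product(range(1, M + 1), repeat=p + 1))
# realizations[m - 1] is the m-th realization (s_t(m), ..., s_{t-p}(m))
```

The rapid growth of $M^{p+1}$ with the lag order is the practical cost of the SM specification relative to SI.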

Log Likelihood

The likelihood contribution for a given observation may be formed by weighting the state-specific multivariate normal density Equation (50.4) by the one-step ahead prediction of the probability of being in the given state:

(50.9) | $L_t(\theta) = \sum_{m=1}^{M^*} f\!\left(y_t \mid s_t^* = m, \mathcal{I}_{t-1}, \theta\right) P\!\left(s_t^* = m \mid \mathcal{I}_{t-1}, \theta\right)$

where $\mathcal{I}_{t-1}$ denotes the information set through period $t-1$, and $f(y_t \mid s_t^* = m, \mathcal{I}_{t-1}, \theta)$ is obtained by evaluating the density Equation (50.4) at the regime-specific errors given by the specifications Equation (50.6) and Equation (50.7).

Here, $v$ (or $\mu$), the $A_j$, and $\Sigma$ are the VAR parameters, and $\delta$ are parameters that determine the regime probabilities.

Defining the full parameter vector $\theta = (v, A_1, \dots, A_p, \Sigma, \delta)$, we have the full normal mixture log-likelihood

(50.10) | $l(\theta) = \sum_{t=1}^{T} \log \left( \sum_{m=1}^{M^*} f\!\left(y_t \mid s_t^* = m, \mathcal{I}_{t-1}, \theta\right) P\!\left(s_t^* = m \mid \mathcal{I}_{t-1}, \theta\right) \right)$

which may be maximized with respect to $\theta$.

It is worth noting that the likelihood function for this normal mixture model is unbounded for certain parameter values. However, local optima have the usual consistency, asymptotic normality, and efficiency properties. See Maddala (1986) for discussion of this issue as well as a survey of different algorithms and approaches for estimating the parameters.

Given parameter point-estimates, coefficient covariances may be estimated using conventional methods, e.g., inverse negative Hessian, inverse outer-product of the scores, and robust sandwich.

Regime Probabilities

To finish our likelihood specification, we must specify the regime probability function $P(s_t^* = m \mid \mathcal{I}_{t-1})$.

There are two commonly employed forms: simple switching and Markov switching.

Simple Switching

The simple switching model features independent regime probabilities which do not depend on past states:

(50.11) | $P(s_t = m \mid \mathcal{I}_{t-1}) = P(s_t = m) = \pi_m$

More generally, we may allow for time-varying probabilities by assuming that $\pi_m$ is a function of a vector of exogenous observables $G_{t-1}$ and coefficient vectors $\delta_m$, parameterized using a multinomial logit specification:

(50.12) | $\pi_m(G_{t-1}, \delta) = \dfrac{\exp(G_{t-1}' \delta_m)}{\sum_{j=1}^{M} \exp(G_{t-1}' \delta_j)}$

for $m = 1, \dots, M$, with the identifying normalization $\delta_M = 0$. The special case of constant probabilities is handled by choosing $G_{t-1}$ to be identically equal to 1.
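A minimal sketch of the multinomial logit parameterization, with the last coefficient vector normalized to zero for identification; the regressor and coefficient values are hypothetical:

```python
import numpy as np

# Sketch of the multinomial logit in (50.12) with the identifying
# normalization delta_M = 0.  G and the delta_m values are hypothetical.
def logit_probs(G, deltas):
    """Return (pi_1, ..., pi_M) given one coefficient row per regime,
    with the last row fixed at zero for identification."""
    z = deltas @ G                      # linear indices G' delta_m
    z -= z.max()                        # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

G = np.array([1.0, 0.5])                # includes a constant term
deltas = np.array([[0.2, -0.4],         # delta_1 (hypothetical)
                   [0.0,  0.0]])        # delta_M = 0 (normalization)
pi = logit_probs(G, deltas)
```

Setting `G = np.array([1.0])` with scalar coefficients reproduces the constant-probability special case.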

Markov Switching

The first-order Markov assumption requires that the probability of being in a regime depends only on the previous state, so that

(50.13) | $P(s_t = j \mid s_{t-1} = i, \mathcal{I}_{t-1}) = p_{ij}(t)$

Typically, these transition probabilities are assumed to be time-invariant, so that $p_{ij}(t) = p_{ij}$ for all $t$, but this restriction is not required.

We may write these probabilities in a transition matrix

(50.14) | $p(t) = \begin{pmatrix} p_{11}(t) & \cdots & p_{1M}(t) \\ \vdots & \ddots & \vdots \\ p_{M1}(t) & \cdots & p_{MM}(t) \end{pmatrix}$

where the $ij$-th element $p_{ij}(t)$ represents the probability of transitioning from regime $i$ in period $t-1$ to regime $j$ in period $t$. (Note that some authors use the transpose of $p(t)$ so that all of their indices are reversed from those used here.)

As in the simple switching model, we may parameterize the probabilities in terms of a multinomial logit. Note that since each row of the transition matrix specifies a full set of conditional probabilities, we define a separate multinomial specification for each row of the matrix:

(50.15) | $p_{ij}(t) = \dfrac{\exp(G_{t-1}' \delta_{ij})}{\sum_{s=1}^{M} \exp(G_{t-1}' \delta_{is})}$

for $i, j = 1, \dots, M$, with the normalizations $\delta_{iM} = 0$ for each $i$.
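The row-wise logit construction of the transition matrix can be sketched as follows; the coefficient values are hypothetical, chosen to produce recognizable transition probabilities:

```python
import numpy as np

# Sketch: build an M x M Markov transition matrix from row-wise
# multinomial logit specifications as in (50.15); each row i has its own
# coefficient vectors delta_ij with delta_iM = 0.  Values are hypothetical.
def transition_matrix(G, deltas):
    """deltas has shape (M, M, len(G)); row i parameterizes p_i1..p_iM."""
    z = deltas @ G                          # shape (M, M): indices G' delta_ij
    z -= z.max(axis=1, keepdims=True)       # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

M = 2
G = np.array([1.0])                         # constant-only regressor
deltas = np.zeros((M, M, 1))
deltas[0, 0, 0] = np.log(9.0)               # chosen so that p_11 = 0.9
deltas[1, 1, 0] = np.log(4.0)               # chosen so that p_22 = 0.8
P = transition_matrix(G, deltas)
```

Each row sums to one by construction, so the normalization $\delta_{iM} = 0$ costs nothing in generality.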

Probability Prediction and Filtering

The likelihood function in Equation (50.10) depends on the one-step ahead predicted probabilities of being in a regime, $P(s_t^* = m \mid \mathcal{I}_{t-1})$. Obtaining these predicted probabilities is central to the evaluation of the likelihood.

Of related interest are the contemporaneous estimates of the regime probabilities, $P(s_t^* = m \mid \mathcal{I}_t)$. The observed value of the dependent variable provides information about which regime is in effect in a given period, and we may use this contemporaneous information to update our estimates of the regime probabilities. The process by which the predicted probability estimates are updated to form $P(s_t^* = m \mid \mathcal{I}_t)$ is commonly termed filtering.

In the following sections, we outline the basics of one-step ahead prediction and filtering for both the simple switching specification and Markov switching.

Simple Switching

One-step ahead prediction is straightforward for simple switching, since the one-step ahead predicted probabilities are simply the specified probability functions: $P(s_t = m \mid \mathcal{I}_{t-1}) = \pi_m(G_{t-1}, \delta)$.

In the switching intercept case, substituting the general form of the simple switching probability function Equation (50.12) into Equation (50.10), we get

(50.16) | $l(\theta) = \sum_{t=1}^{T} \log \left( \sum_{m=1}^{M} f\!\left(y_t \mid s_t = m, \mathcal{I}_{t-1}, \theta\right) \pi_m(G_{t-1}, \delta) \right)$

By Bayes’ theorem and the laws of conditional probability, we have the filtering expression

$P(s_t = m \mid \mathcal{I}_t) = P(s_t = m \mid y_t, \mathcal{I}_{t-1}) = \dfrac{f(y_t \mid s_t = m, \mathcal{I}_{t-1})\, P(s_t = m \mid \mathcal{I}_{t-1})}{f(y_t \mid \mathcal{I}_{t-1})}$

Substituting, we obtain the filtering update

(50.17) | $P(s_t = m \mid \mathcal{I}_t) = \dfrac{f(y_t \mid s_t = m, \mathcal{I}_{t-1})\, \pi_m(G_{t-1}, \delta)}{\sum_{j=1}^{M} f(y_t \mid s_t = j, \mathcal{I}_{t-1})\, \pi_j(G_{t-1}, \delta)}$

Note that in the switching mean setting, the state variable $s_t^*$ is $(p+1)$-dimensional, so the above relationship does not apply. We must instead treat this model as a restricted form of the Markov switching model (as described below) in which there is no state dependence in the probability function.
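For the switching intercept case, the filtering update in Equation (50.17) reduces to a one-line Bayes rule computation; the prior probabilities and conditional likelihood values below are hypothetical:

```python
import numpy as np

# Sketch of the Bayes filtering update in (50.17): combine prior regime
# probabilities with regime-conditional likelihoods of y_t.
# The numeric inputs are hypothetical.
def filter_update(prior, likelihoods):
    """P(s_t = m | I_t) from P(s_t = m | I_{t-1}) and f(y_t | s_t = m, I_{t-1})."""
    joint = likelihoods * prior         # f(y_t, s_t = m | I_{t-1})
    return joint / joint.sum()          # normalize by f(y_t | I_{t-1})

prior = np.array([0.5, 0.5])            # pi_m (hypothetical)
lik = np.array([0.30, 0.10])            # regime 1 fits y_t better
post = filter_update(prior, lik)
```

The denominator in the update is exactly the period-$t$ likelihood contribution, so filtering and likelihood evaluation share the same arithmetic.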

Markov Switching

The Markov property of the transition probabilities implies that the one-step ahead predicted probabilities on the right-hand side of Equation (50.10) must be evaluated recursively.

Briefly, each recursion step begins with filtered estimates of the regime probabilities for the previous period. Given the filtered probabilities $P(s_{t-1} = i \mid \mathcal{I}_{t-1})$, the recursion may be broken down into the following steps:

1. We first form the one-step ahead predictions of the regime probabilities using basic rules of probability and the Markov transition matrix:

(50.18) | $P(s_t = m \mid \mathcal{I}_{t-1}) = \sum_{i=1}^{M} p_{im}(t)\, P(s_{t-1} = i \mid \mathcal{I}_{t-1})$

2. Next, we use these one-step ahead probabilities to form the one-step ahead joint densities of the data and regimes in period $t$:

(50.19) | $f(y_t, s_t = m \mid \mathcal{I}_{t-1}) = f(y_t \mid s_t = m, \mathcal{I}_{t-1})\, P(s_t = m \mid \mathcal{I}_{t-1})$

3. The likelihood contribution for period $t$ is obtained by summing the joint densities across the unobserved states to obtain the marginal density of the observed data:

(50.20) | $f(y_t \mid \mathcal{I}_{t-1}) = \sum_{m=1}^{M} f(y_t, s_t = m \mid \mathcal{I}_{t-1})$

4. The final step is to filter the probabilities by using the results in Equation (50.19) to update the one-step ahead predictions of the probabilities:

(50.21) | $P(s_t = m \mid \mathcal{I}_t) = \dfrac{f(y_t, s_t = m \mid \mathcal{I}_{t-1})}{f(y_t \mid \mathcal{I}_{t-1})}$

These steps are repeated successively for each period, $t = 1, \dots, T$. All that we require for implementation are the initial filtered probabilities $P(s_0 = m \mid \mathcal{I}_0)$, or alternately, the initial one-step ahead regime probabilities $P(s_1 = m \mid \mathcal{I}_0)$. See “Initial Probabilities” for discussion.

The log likelihood obtained by summing the logs of the terms in Equation (50.20) across periods yields

(50.22) | $l(\theta) = \sum_{t=1}^{T} \log f(y_t \mid \mathcal{I}_{t-1}, \theta)$

The likelihood may be maximized with respect to the parameters using iterative methods. Coefficient covariances may be estimated using standard approaches.
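The filtering recursion and log likelihood described above can be sketched as a single loop; here the regime-conditional densities are filled with hypothetical values rather than computed from the VAR errors, and the transition matrix and initial probabilities are also hypothetical:

```python
import numpy as np

# Sketch of the filtering recursion (prediction, joint density, marginal
# likelihood, filter update) and log-likelihood accumulation for a Markov
# switching model.  cond_dens[t, m] stands in for f(y_t | s_t = m, I_{t-1});
# in practice it comes from evaluating the normal density at the
# regime-specific VAR errors.
def hamilton_filter(cond_dens, P, prob0):
    T, M = cond_dens.shape
    predicted = np.zeros((T, M))
    filtered = np.zeros((T, M))
    loglik = 0.0
    prev = prob0                                 # P(s_0 = m | I_0)
    for t in range(T):
        predicted[t] = prev @ P                  # one-step ahead prediction
        joint = cond_dens[t] * predicted[t]      # joint density of data and regime
        f_t = joint.sum()                        # marginal density of y_t
        filtered[t] = joint / f_t                # filter update
        loglik += np.log(f_t)                    # log-likelihood contribution
        prev = filtered[t]
    return predicted, filtered, loglik

P = np.array([[0.9, 0.1],                        # hypothetical transition matrix
              [0.2, 0.8]])
prob0 = np.array([2.0 / 3.0, 1.0 / 3.0])         # hypothetical initial probabilities
rng = np.random.default_rng(1)
cond_dens = rng.uniform(0.05, 0.4, size=(50, 2)) # hypothetical density values
pred, filt, ll = hamilton_filter(cond_dens, P, prob0)
```

In an estimation loop, `ll` would be handed to a numerical optimizer as the objective, with `cond_dens`, `P`, and `prob0` recomputed from the candidate parameter vector at each iteration.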

Initial Probabilities

In the switching intercept form, the Markov switching filter requires initialization of the filtered regime probabilities in period 0, $P(s_0 = m \mid \mathcal{I}_0)$.

There are a few ways to proceed. Most commonly, the initial regime probabilities are set to the ergodic (steady state) values implied by the Markov transition matrix (see, for example, Hamilton (1994, p. 192) or Kim and Nelson (1999, p. 70) for discussion and results). The values are thus treated as functions of the parameters that determine the transition matrix.

Alternately, we may use prior knowledge to specify regime probability values, or we can be agnostic and assign equal probabilities to regimes. Lastly, we may treat the initial probabilities as parameters to be estimated.

Note that the initialization to ergodic values using period 0 information is somewhat arbitrary in the case of time-varying transition probabilities.
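A sketch of the ergodic initialization: solve $\pi = \pi\, p$ subject to the probabilities summing to one, replacing one redundant balance equation with the adding-up constraint (the transition matrix below is hypothetical):

```python
import numpy as np

# Sketch: compute the ergodic (steady state) probabilities pi solving
# pi = pi @ P with sum(pi) = 1, by replacing one balance equation with
# the adding-up constraint.  P is a hypothetical transition matrix.
def ergodic_probs(P):
    M = P.shape[0]
    A = np.vstack([(P.T - np.eye(M))[:-1],   # M-1 balance equations
                   np.ones(M)])              # adding-up constraint
    b = np.zeros(M)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = ergodic_probs(P)
```

Because the rows of $p$ sum to one, one of the balance equations is redundant, which is why it can be dropped in favor of the constraint.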

In the switching mean setting, the Markov switching filter requires initialization of the vector of probabilities associated with the $(p+1)$-dimensional state vector. We may proceed as in the switching intercept case by setting initial single-state probabilities as described above, and recursively applying Markov transition updates to obtain the joint initial probabilities for the $(p+1)$-dimensional state vector in period 0.

Again note that the initialization to steady state values using the period 0 information is somewhat arbitrary in the case of time-varying transition probabilities.

Smoothing

For the Markov switching specification, estimates of the regime probabilities may be improved by using all of the information in the sample. The smoothed estimates for the regime probabilities in period $t$ use the information set in the final period, $P(s_t^* = m \mid \mathcal{I}_T)$, in contrast to the filtered estimates, which employ only contemporaneous information, $P(s_t^* = m \mid \mathcal{I}_t)$.

Intuitively, using information about future realizations of the dependent variable ($y_{t+1}, \dots, y_T$) improves our estimates of being in regime $m$ in period $t$ because the Markov transition probabilities link together the likelihood of the observed data in different periods.

Kim (1994) provides an efficient smoothing algorithm that requires only a single backward recursion through the data. Under the Markov assumption, Kim shows that the joint probability is given by

(50.23) | $P(s_t = i, s_{t+1} = j \mid \mathcal{I}_T) = P(s_{t+1} = j \mid \mathcal{I}_T)\, P(s_t = i \mid s_{t+1} = j, \mathcal{I}_T)$
$\qquad\qquad = \dfrac{P(s_{t+1} = j \mid \mathcal{I}_T)\, P(s_t = i \mid \mathcal{I}_t)\, p_{ij}(t+1)}{P(s_{t+1} = j \mid \mathcal{I}_t)}$

The key in moving from the first to the second line of Equation (50.23) is the fact that, under appropriate assumptions, if $s_{t+1}$ were known, there would be no additional information about $s_t$ in the future data $\{y_{t+1}, \dots, y_T\}$.

The smoothed probability in period $t$ is then obtained by marginalizing the joint probability with respect to $s_{t+1}$:

(50.24) | $P(s_t = i \mid \mathcal{I}_T) = \sum_{j=1}^{M} P(s_t = i, s_{t+1} = j \mid \mathcal{I}_T)$

Note that apart from the smoothed probability terms, $P(s_{t+1} = j \mid \mathcal{I}_T)$, all of the terms on the right-hand side of Equation (50.23) are obtained as part of the filtering computations. Given the set of filtered probabilities, we initialize the smoother using $P(s_T = m \mid \mathcal{I}_T)$, and iterate computation of Equation (50.23) and Equation (50.24) for $t = T-1, T-2, \dots, 1$ to obtain the smoothed values.
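The backward recursion can be sketched as follows; the filtered probabilities are hypothetical stand-ins for actual filter output, and the predicted probabilities are built from them so that the two are mutually consistent:

```python
import numpy as np

# Sketch of the Kim smoother: a single backward pass that converts
# filtered probabilities into smoothed ones.  The filtered probabilities
# below are hypothetical stand-ins for real filter output.
def kim_smoother(filtered, predicted, P):
    T, M = filtered.shape
    smoothed = np.zeros((T, M))
    smoothed[-1] = filtered[-1]                 # initialize with P(s_T = m | I_T)
    for t in range(T - 2, -1, -1):
        # joint[i, j] = P(s_t = i, s_{t+1} = j | I_T)
        ratio = smoothed[t + 1] / predicted[t + 1]
        joint = filtered[t][:, None] * P * ratio[None, :]
        smoothed[t] = joint.sum(axis=1)         # marginalize over s_{t+1}
    return smoothed

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
filtered = np.array([[0.6, 0.4], [0.7, 0.3], [0.5, 0.5]])
# predicted[t] = filtered[t-1] @ P, so filter and smoother are consistent
predicted = np.vstack([filtered[:1], filtered[:-1] @ P])
smoothed = kim_smoother(filtered, predicted, P)
```

Only quantities already stored during filtering (the filtered and predicted probabilities and the transition matrix) are consumed, which is what makes the single backward pass efficient.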