Variable Selection Estimation in EViews

To perform a variable selection procedure (VARSEL) in EViews, select Object/New Object/Equation, or press Estimate from the toolbar of an existing equation. From the Equation Specification dialog, choose Method: VARSEL - Variable selection and Stepwise Least Squares. EViews will display the variable selection estimation dialog.

The Specification page allows you to provide the basic VARSEL regression specification. In the upper edit field you should first specify the dependent variable followed by the always included variables you wish to use in the final regression. Note that the VARSEL equation must be specified by list.

You should enter a list of variables to be used as the set of potentially included variables in the second edit field.

The Selection method dropdown box specifies the type of selection procedure that will be used to determine the final model. By default, EViews will estimate the variable selection equation using the Stepwise method. To change the basic method, use the Selection method dropdown, which allows you to choose between: Uni-directional, Stepwise, Swapwise, Combinatorial, Auto-Search/GETS, and Lasso selection.

Next, you may use the Options tab to more finely tune the selection method. The options available will depend on the selection method chosen, and are described below.

Estimation Options

When you select one of the variable selection methods, the Options tab of the dialog will change to display the relevant settings.

Uni-directional and Stepwise Options

For the Uni-directional and Stepwise methods you may specify the direction of the method using the Forwards and Backwards radio buttons, and provide a Stopping Criteria using either a p-value or t-statistic tolerance for adding or removing variables.

You may also choose to stop the procedures once they have added or removed a specified number of regressors by selecting the Use number of regressors option and providing a number in the corresponding edit field.
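As an illustration, the forward variant of this kind of procedure can be sketched in Python. This is a minimal sketch of generic forward stepwise selection using a t-statistic tolerance as the stopping rule, not EViews' actual implementation; all function and variable names here are invented for the example:

```python
import numpy as np

def forward_stepwise(y, X, always, candidates, t_min=1.96):
    """Greedy forward selection: at each step add the candidate regressor
    with the largest absolute t-statistic, stopping when no remaining
    candidate exceeds the t-statistic tolerance t_min."""
    selected = list(always)   # always-included columns stay in the model
    remaining = list(candidates)
    while remaining:
        best, best_t = None, t_min
        for j in remaining:
            cols = selected + [j]
            Z = X[:, cols]
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ beta
            dof = len(y) - Z.shape[1]
            sigma2 = (resid @ resid) / dof
            cov = sigma2 * np.linalg.inv(Z.T @ Z)
            t = abs(beta[-1]) / np.sqrt(cov[-1, -1])  # t-stat of candidate j
            if t > best_t:
                best, best_t = j, t
        if best is None:      # no candidate clears the tolerance: stop
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The backwards variant mirrors this: start from the full set of regressors and repeatedly remove the variable with the smallest absolute t-statistic until every remaining variable clears the tolerance.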

The Weights portion of the dialog is the same as that used in other estimators such as least squares. For more information, please see “Weighted Least Squares”.

Swapwise Options

The Swapwise variable selection method lets you choose whether you wish to use Max R-squared Increment or Min R-squared Increment, and to enter the number of additional variables to be selected.

By default, the number of additional variables is set to one so that if you do not enter a value, EViews will select the single variable that will lead to the largest increase in R-squared.

The Weights portion of the dialog is the same as that used in other estimators such as least squares. For more information, please see “Weighted Least Squares”.

Combinatorial Options

The combinatorial options page simply prompts you to provide the number of additional variables.

By default, the number of additional variables is set to one so that if you do not enter a value, EViews will select the single variable that will lead to the largest increase in R-squared.
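The exhaustive search behind combinatorial selection can be sketched as follows. This is an illustrative Python sketch of the general technique, not EViews' implementation; the function names are invented for the example:

```python
import numpy as np
from itertools import combinations

def r_squared(y, Z):
    """R-squared from an OLS fit of y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def combinatorial_select(y, X, always, candidates, k):
    """Evaluate every size-k combination of candidate regressors (together
    with the always-included columns) and return the combination giving
    the highest R-squared."""
    best_combo, best_r2 = None, -np.inf
    for combo in combinations(candidates, k):
        r2 = r_squared(y, X[:, list(always) + list(combo)])
        if r2 > best_r2:
            best_combo, best_r2 = combo, r2
    return best_combo
```

Because every combination is evaluated, the cost grows combinatorially with the number of candidate variables, which is why this method is practical only for relatively small candidate sets.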

The Weights portion of the dialog is the same as that used in other estimators such as least squares. For more information, please see “Weighted Least Squares”.

Auto-Search / GETS Options

When Auto-Search/GETS is chosen as the Selection method on the Specification tab, the Options tab will change to display options specific to the Auto-Search/GETS algorithm:

The options are divided into several sections.

• The Model selection section offers options for the selection criteria used during the variable selection process. The Criteria dropdown specifies the information criteria used to select the final model from the candidates remaining in step 4) of the selection algorithm. The Include GUM and Include empty model checkboxes specify whether to include either the general model (i.e. the model with all possible search variables) or the empty model (the model with zero search variables) as candidates.

The Blocks edit field allows you to override EViews’ choice of the number of blocks into which to split the search variables when the number of search variables is greater than the number of observations.

• The Diagnostics section identifies the options for the diagnostic tests used in steps 1) and 3) of the algorithm (“Auto-Search / GETS”). The Terminal condition p-value sets the p-value against which the significance of the remaining variables is tested when determining whether to stop the selection process along a path in step 3).

The AR LM test, ARCH LM test, Normality test and PET test checkboxes and their corresponding p-value edit fields determine whether those tests are used when assessing the validity of both the GUM and each selected model, and the p-value associated with each test. For both the AR and ARCH LM tests, the number of lags used in the test may also be specified.

Lasso Options

For Lasso selection, the options dialog allows you to specify the data transformation options for the regressors and, potentially, the properties of the lambda path, optimization settings, and details of the cross-validation:

(All of the options available for elastic net are also available for Lasso selection with the exception of the @VW tag.)

There are several sections in the dialog.

Regularization specification

The Regularization specification section prompts you to specify the regressor transformation and settings for lambda.

• For the former, the Regressor transformation combo may be used to choose between None (the default), Standardization by sample and population, L1 and L2 normalizations, and Min-max transformation.

If multiple values of lambda are supplied in the Penalty section (or generated by default), the available options will expand to show additional lambda options:

• In the Min/max lambda ratio text box, you may supply the ratio of the minimum to the maximum value of lambda desired; the default is 0.0001.

• In the No. lambdas on path text box, you may supply the number of values of lambda on the path; the default is 100.
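A common way to construct such a path (used, for instance, by glmnet-style implementations; EViews' exact construction may differ) is a logarithmically spaced grid running from a maximum lambda down to that maximum times the min/max ratio. A minimal sketch under that assumption:

```python
import numpy as np

def lambda_path(lambda_max, ratio=0.0001, n_lambdas=100):
    """Log-spaced grid of n_lambdas penalty values, decreasing from
    lambda_max down to lambda_max * ratio (the min/max lambda ratio)."""
    return np.logspace(np.log10(lambda_max),
                       np.log10(lambda_max * ratio),
                       num=n_lambdas)
```

With the defaults above, the path spans four orders of magnitude, so models are estimated from nearly unpenalized (small lambda) up to heavily penalized (large lambda).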

Estimation algorithm

The familiar Estimation algorithm section controls the specification of starting values and iterative estimation controls.

Penalty

If the Lasso selection method is chosen, then elastic net estimation is applied with the mixing parameter fixed at one (i.e., a pure Lasso penalty) and with variable weights equal to zero for the always included variables.

You should specify information about the Lasso lambda in the Lambda edit field.

You are free to fill in the lambda field with a single value, multiple values separated by spaces, or a series object from the workfile. If multiple values of lambda are entered, either directly as a list or through a series object, cross-validation will be performed to determine the value giving the model with the lowest cross-validation error.

You may also leave the lambda field blank. If you choose to do this EViews will generate its own list of values based on the data.

Cross-validation options

If multiple values of lambda are supplied in the Penalty section (or generated by default), the available options will expand to include the Cross-validation options. Certain options within this section are available only for the particular method of cross-validation chosen (Folds for K-Fold cross-validation, for example).

The available Cross-validation options include:

• Shuffle: After the dataset is divided into training and test sets, the ordering is shuffled. The Random generator and Seed fields control the details of the shuffling. You may leave the Seed field blank, in which case EViews will use the clock to obtain a seed at the time of estimation, or you may provide an integer from 0 to 2,147,483,647. The Clear button may be used to clear the seed used by a previously estimated equation. By changing the value of the Shuffle reps field from the default of 1 you may create multiple datasets with different, shuffled orderings of training and test sets.

• K-Fold: The dataset is divided into K evenly spaced “folds” and the ordering is shuffled (the details of the shuffling are determined by the Random generator and Seed fields). One fold is held out as the test set while the remaining K-1 folds are combined into the training set. This process is then repeated, with each fold being held out in turn as the test set. After model estimation the statistics are averaged over all K folds.

• Leave One Out: This is the same as K-Fold, but with the number of folds equal to the number of observations.

• Leave P Out: This is similar to K-Fold, with the exception that a test set of size P is held out, with the remaining data forming the training set. This process is repeated over all remaining combinations.

• Rolling Window: After a window size is chosen for the dataset, the window is divided into training and test sets (the default test set size is 1, and you may also choose the test set size as a fraction), with the test set in each window always coming after the training set. The window “rolls” through the dataset until it reaches the end of the dataset. You may also choose how far ahead of the training set you want the test set to be (the horizon), as well as an initial period in the dataset to hold out of the cross-validation (the initial period).

• Expanding Window: The dataset is divided into training and test sets (the default test set size is 1, and only integer test set sizes are allowed), with the test set in each window always coming after the training set. With each iteration the training set size expands by one while the test set stays the same size. You may choose how far ahead of the training set you want the test set to be (the horizon), as well as an initial period in the dataset to hold out of the cross-validation (the initial period).
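The two windowed schemes can be sketched as index generators in Python. This is an illustrative sketch of the general technique, not EViews' code; here horizon is the number of steps between the end of the training set and the start of the test set, with horizon=1 meaning the test set immediately follows the training set:

```python
def rolling_window_splits(n, window, test_size=1, horizon=1):
    """Yield (train, test) index lists for rolling-window CV: a fixed-size
    training window followed, `horizon` steps ahead, by the test set.
    The window rolls forward one observation at a time."""
    start = 0
    while start + window + horizon - 1 + test_size <= n:
        train = list(range(start, start + window))
        t0 = start + window + horizon - 1
        yield train, list(range(t0, t0 + test_size))
        start += 1

def expanding_window_splits(n, initial, test_size=1, horizon=1):
    """Like rolling-window CV, but the training set grows by one
    observation each iteration while the test set size stays fixed."""
    end = initial
    while end + horizon - 1 + test_size <= n:
        yield (list(range(end)),
               list(range(end + horizon - 1, end + horizon - 1 + test_size)))
        end += 1
```

In both schemes the test set always lies after the training set in time, which is what makes these methods suitable for ordered (time series) data, in contrast to K-Fold and its variants.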

Weights