Basic Link Concepts
A link is a series-like object that exists in one workfile page, but “refers” to series data in another workfile page. At a basic level, a link is a description of how EViews should use data in a source workfile page to determine values of a series in the current, or destination, workfile page.
A link contains three fundamental components:
• First, there is the name of a source series. The source series identifies the series in the source workfile page that is used as a basis for obtaining values in the destination page.
• Second, the link contains the names of one or more link identifier (ID) series in both the source and destination pages. The source ID and destination ID series will be used to match observations from the two pages.
• Lastly, the link contains a description of how the source series should be used to construct link values for matching observations in the destination page.
The basic series link employs a method called match merging to determine the link values in the destination page. More advanced links combine match merging with automatic frequency conversion. We describe these two methods in detail below, in “Linking by general match merging” and “Linking by date with frequency conversion”.
As the name suggests, the series link object shares most of the properties of a series. You may, in fact, generally use a series link as though it were a series. You may examine series views, perform series procedures, or use the series link to generate new data, or you may use the link as a regressor in an equation specification.
Another important property of links is that they are “live”, in the sense that the values in the link change as its underlying data change. Thus, if you have a link in a given workfile page, the link values will automatically be updated when the source series or ID series values change.
Lastly, links are memory efficient. Since links are computed and updated as needed, the values of the series link are not held in memory unless they are in use. Thus, it is possible to create a page populated entirely by links that takes up only the minimum amount of memory required to perform all necessary operations.
Linking by general match merging
We begin our discussion of linking with a brief, and admittedly terse, description of how a basic link with match merging works. More useful, perhaps, will be the extended examples that follow.
The basic link first compares values for one or more source ID series with the values in the destination ID series. Observations in the two pages are said to match if they have identical ID values. When matches are observed, values from the source series are used to construct values of the link for the corresponding observations in the destination page.
Each link contains a description of how the source series should be used to construct link values in the destination page. Constructing values for a basic match merge link involves two steps:
• First, we perform a contraction of the source series to ensure that there is a single value associated with each distinct source ID value. The contraction method employed describes how the (possibly) multiple source series observations sharing a given ID value should be translated into a single value.
• Next, we take the distinct source IDs and contracted source series values, and perform a match merge in which each contracted value is repeated for all matching observations in the destination page.
This basic method is designed to handle the most general cases involving many-to-many match merging by first computing a many-to-one contraction (by-group summary) of the source series, and then performing a one-to-many match merge of the contracted data.
All other match merges are handled as special cases of this general method. For a many-to-one match merge, we first compute the contraction, then perform one-to-one matching of the contracted data into the destination page. In the more common one-to-many or one-to-one match merge, the contraction step typically has no practical effect since the standard contractions simply return the original source series values. The original values are then linked into the destination page using a simple one-to-one or one-to-many match merge.
While all of this may seem a bit abstract, a few simple examples should help to fix ideas. Suppose first that we have a state workfile page containing four observations on the series STATE1 and TAXRATE:
STATE1 | TAXRATE |
Arkansas | .030 |
California | .050 |
Texas | .035 |
Wyoming | .012 |
In the same workfile, we have a second workfile page containing individual level data, with a name, NAME, state of residence, STATE2, and SALES volume for six individuals:
NAME | STATE2 | SALES |
George | Arkansas | 300 |
Fred | California | 500 |
Karen | Arkansas | 220 |
Mark | Texas | 170 |
Paula | Texas | 120 |
Rebecca | California | 450 |
We wish to link the data between the two pages. Note that in this example, we have given the state series different names in the two pages to distinguish between the two. In practice there is no reason for the names to differ, and in most cases, the names will be the same.
One-to-many match merge
Our first task will be to create, in the page containing individual information, a series containing values of the TAXRATE faced by every individual. We will determine the individual rates by examining each individual’s state of residence and locating the corresponding tax rate. George, for example, who lives in Arkansas, will face that state’s tax rate of 0.030. Similarly, Mark, who lives in Texas, has a tax rate of 0.035.
We will use a series link to perform a one-to-many match merge in which we assign the TAXRATE values in our source page to multiple individuals in our destination page.
For the three basic components of this link, we define:
• the source series TAXRATE
• the source identifier STATE1 and destination identifier STATE2
• the merge rule that the values of TAXRATE will be repeated for every individual with a matching STATE2 value in the destination page
This latter merge rule is always used for basic links involving one-to-many match merges. Here, the rule leads to the natural result that each individual is assigned the TAXRATE value associated with his or her state.
After performing the link, the individual page will contain the merged values for the tax rate in TAXRATE2. We use the “2” in the TAXRATE2 name to denote the fact that these data are generated by merging data using STATE2 as the destination ID series:
NAME | STATE2 | SALES | TAXRATE2 |
George | Arkansas | 300 | .030 |
Fred | California | 500 | .050 |
Karen | Arkansas | 220 | .030 |
Mark | Texas | 170 | .035 |
Paula | Texas | 120 | .035 |
Rebecca | California | 450 | .050 |
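The one-to-many rule can be sketched outside EViews in a few lines of Python. This is only an illustration of the matching logic, not EViews syntax; the function name one_to_many_link and the use of None to stand in for NA are assumptions made for the example.

```python
# Source (state) page: one row per state.
state_page = [
    {"STATE1": "Arkansas", "TAXRATE": 0.030},
    {"STATE1": "California", "TAXRATE": 0.050},
    {"STATE1": "Texas", "TAXRATE": 0.035},
    {"STATE1": "Wyoming", "TAXRATE": 0.012},
]

# Destination (individual) page: several rows may share a state.
individual_page = [
    {"NAME": "George", "STATE2": "Arkansas", "SALES": 300},
    {"NAME": "Fred", "STATE2": "California", "SALES": 500},
    {"NAME": "Karen", "STATE2": "Arkansas", "SALES": 220},
    {"NAME": "Mark", "STATE2": "Texas", "SALES": 170},
    {"NAME": "Paula", "STATE2": "Texas", "SALES": 120},
    {"NAME": "Rebecca", "STATE2": "California", "SALES": 450},
]


def one_to_many_link(source, src_id, src_series, dest, dest_id):
    """Repeat each source value for every destination row with a matching ID."""
    lookup = {row[src_id]: row[src_series] for row in source}
    # Destination IDs with no match in the source get None (standing in for NA).
    return [lookup.get(row[dest_id]) for row in dest]


taxrate2 = one_to_many_link(state_page, "STATE1", "TAXRATE",
                            individual_page, "STATE2")
for row, rate in zip(individual_page, taxrate2):
    print(row["NAME"], row["STATE2"], row["SALES"], rate)
```

Because the source IDs are unique, no contraction is needed before the lookup; each individual simply receives the rate of his or her state.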
We mention one other issue in passing that will become relevant in later discussion. Recall that all basic links with match merging first contract the source series prior to performing the match merge. In this case, the specified merge rule implicitly defines a contraction of the source series TAXRATE that has no effect since it returns the original values of TAXRATE. It is possible, though generally not desirable, to define a contraction rule which will yield alternate source values in a one-to-many match merge. See “Link calculation settings”.
Many-to-one match merge
Alternatively, we may wish to link data in the opposite direction. We may, for example, choose to link the SALES data from the individual page to the destination state page, again matching observations using the two state IDs. This operation is a many-to-one match merge, since there are many observations with STATE2 ID values in the individual page for each of the unique values of STATE1 in the state page.
The components of this new link are easily defined:
• the source series SALES
• the source identifier STATE2 and destination identifier STATE1
• a merge rule stating that the values of SALES will first be contracted, and that the contracted values will be placed in matching observations in the destination page
Specifying the last component, the merge rule, is a bit more involved here since there are an unlimited number of ways that we may contract the individual data. EViews provides an extensive menu of contraction methods. Obvious choices include computing the mean, variance, sum, minimum, maximum, or number of observations for each source ID value. It is worth noting here that only a subset of the contraction methods are available if the source is an alpha series.
To continue with our example, suppose that we choose to take the sum of observations as our contraction method. Then contraction involves computing the sum of the individual observations in each state; the summary value for SALES in Arkansas is 520, the value in California is 950, and the value in Texas is 290. Wyoming is not represented in the individual data, so the corresponding contracted value is NA.
Given this link definition, the many-to-one match merge will result in a state page containing the match merged summed values for SALES1:
STATE1 | TAXRATE | SALES1 | SALES1CT |
Arkansas | .030 | 520 | 2 |
California | .050 | 950 | 2 |
Texas | .035 | 290 | 2 |
Wyoming | .012 | NA | 0 |
Similarly, we may define a second link to the SALES data containing an alternative contraction method, say the count of non-missing observations in each state. The resulting link, SALES1CT, shows that there are two individual observations for each of the first three states, and none for Wyoming.
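The contraction step for the many-to-one case can be illustrated with a short Python sketch. It is not EViews code; the helper name contract is hypothetical, and None stands in for NA.

```python
from collections import defaultdict

# Individual page: (NAME, STATE2, SALES).
individual_page = [
    ("George", "Arkansas", 300),
    ("Fred", "California", 500),
    ("Karen", "Arkansas", 220),
    ("Mark", "Texas", 170),
    ("Paula", "Texas", 120),
    ("Rebecca", "California", 450),
]

# Destination (state) page IDs, in STATE1 order.
states = ["Arkansas", "California", "Texas", "Wyoming"]


def contract(rows, method):
    """Group non-missing SALES values by state and reduce each group."""
    groups = defaultdict(list)
    for _name, state, sales in rows:
        if sales is not None:
            groups[state].append(sales)
    return {state: method(values) for state, values in groups.items()}


sums = contract(individual_page, sum)    # sum contraction
counts = contract(individual_page, len)  # count of non-missing observations

# Match merge the contracted values into the state page.
sales1 = [sums.get(state) for state in states]         # no match gives None (NA)
sales1ct = [counts.get(state, 0) for state in states]  # no match gives a count of 0

print(sales1)    # [520, 950, 290, None]
print(sales1ct)  # [2, 2, 2, 0]
```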
Many-to-many match merge
Lastly, suppose that we have a third workfile page containing a panel structure with state data observed over a two year period:
YEAR | STATE3 | TAXRATE |
1990 | Arkansas | .030 |
1991 | Arkansas | .032 |
1990 | California | .050 |
1991 | California | .055 |
1990 | Texas | .035 |
1991 | Texas | .040 |
1990 | Wyoming | .012 |
1991 | Wyoming | .035 |
Linking the SALES data from the individual page to the panel page using the STATE2 and STATE3 identifiers involves a many-to-many match merge since there are multiple observations for each state in both pages.
The components of this new link are easily defined:
• the source series SALES
• the source identifier STATE2 and destination identifier STATE3
• a merge rule stating that the values of SALES will first be contracted, and that the contracted values will be repeated for every observation with a matching STATE3 value in the destination page
This merge rule states that we perform a many-to-many merge by first contracting the source series, and then performing a one-to-many match merge of the contracted results into the destination. For example, linking the SALES data from the individual page into the panel state-year page using the sum and count contraction methods yields the link series SALES3 and SALES3A:
YEAR | STATE3 | TAXRATE | SALES3 | SALES3A |
1990 | Arkansas | .030 | 520 | 2 |
1991 | Arkansas | .032 | 520 | 2 |
1990 | California | .050 | 950 | 2 |
1991 | California | .055 | 950 | 2 |
1990 | Texas | .035 | 290 | 2 |
1991 | Texas | .040 | 290 | 2 |
1990 | Wyoming | .012 | NA | 0 |
1991 | Wyoming | .035 | NA | 0 |
It is worth noting that this many-to-many match merge is equivalent to first performing a many-to-one link from the individual page into the state page, and then constructing a one-to-many link of those linked values into the panel page. This two-step method may be achieved by first performing the many-to-one link into the state page, and then performing a one-to-many link of the SALES1 and SALES1CT links into the panel page.
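The equivalence between the direct many-to-many merge and the two-step route can be checked with a small Python sketch using the sum contraction. The code is illustrative only; contract_sum and the variable names are hypothetical, and None stands in for NA.

```python
from collections import defaultdict

# Individual page reduced to (STATE2, SALES) pairs.
individual_page = [
    ("Arkansas", 300), ("California", 500), ("Arkansas", 220),
    ("Texas", 170), ("Texas", 120), ("California", 450),
]
state_page = ["Arkansas", "California", "Texas", "Wyoming"]
# Panel page: two years per state, in the order shown in the table.
panel_page = [(year, state) for state in state_page for year in (1990, 1991)]


def contract_sum(pairs):
    """Many-to-one contraction: one summed value per distinct ID."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Direct route: contract, then repeat for every matching panel observation.
contracted = contract_sum(individual_page)
sales3 = [contracted.get(state) for _year, state in panel_page]

# Two-step route: many-to-one into the state page, then one-to-many into the panel.
sales1 = {state: contracted.get(state) for state in state_page}
sales3_two_step = [sales1[state] for _year, state in panel_page]

print(sales3)
print(sales3 == sales3_two_step)
```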
Linking by date match merging
To this point, we have primarily considered simple examples involving a single categorical link identifier series (states). You may, of course, construct more elaborate IDs using more than one series. For example, if you have data on multinational firms observed over time, both the firm and date identifiers may be used as the link ID series.
The latter example is of note since it points to the fact that dates may be used as valid link identifiers. The use of dates as identifiers requires special discussion, as the notion of a match may be extended to take account of the calendar.
We begin our discussion of merging using dates by noting that a date may be employed as an identifier in two distinct ways:
• First, an ID series containing date values or alphanumeric representations of dates may be treated like any other ID series. In this case, the value in one workfile page must be identical to the value in the other page for a match to exist.
• Alternatively, when we are working with regular frequency data, we may take advantage of our knowledge of the frequency and the calendar to define a broader notion of date matching. This broader form of matching, which we term date matching, involves comparing dates by first rounding the date ID values down to the lowest common regular frequency and then comparing the rounded values. Note that date matching requires the presence of at least one regular frequency for the rounding procedure to be well-defined.
In practical terms, date matching produces the outcomes that one would naturally expect. With date matching, for example, the quarterly observation “2002Q1” matches “2002” in a regular annual workfile, since we round the quarterly observation down to the annual frequency, and then match the rounded values. Likewise, we would match the date “March 3, 2001” to the year 2001 in an annual workfile, and to “2001Q1” in a quarterly workfile. Similarly, the date “July 10, 2001” also matches 2001 in the annual workfile, but matches “2001Q3” in the quarterly workfile.
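The rounding idea behind date matching can be sketched in Python as follows. The function names are hypothetical and the string labels ("2001Q1", "2001") are just a convenient representation for the example; EViews performs this comparison internally.

```python
from datetime import date


def round_to_quarter(d):
    """Round a calendar date down to its quarter label, e.g. 2001Q1."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def round_to_year(d):
    """Round a calendar date down to its year label."""
    return str(d.year)


def quarter_to_year(label):
    """Round a quarter label such as 2002Q1 down to its year."""
    return label[:4]


print(quarter_to_year("2002Q1"))             # 2002
print(round_to_year(date(2001, 3, 3)))       # 2001
print(round_to_quarter(date(2001, 3, 3)))    # 2001Q1
print(round_to_quarter(date(2001, 7, 10)))   # 2001Q3
```

Two observations date match when their labels agree after both have been rounded to the lower of the two frequencies.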
Basic links with date matching
Consider the following simple example of linking using date matching. Suppose that we have a workfile containing two pages. The first page is a regular frequency quarterly page containing profit data (PROFIT) for 2002 and 2003:
QUARTER | PROFIT |
2002Q1 | 120 |
2002Q2 | 130 |
2002Q3 | 150 |
2002Q4 | 105 |
2003Q1 | 100 |
2003Q2 | 125 |
2003Q3 | 200 |
2003Q4 | 170 |
while the second page contains irregular data on special advertising events (ADVERT):
DATE | ADVERT |
Jan 7, 2002 | 10 |
Mar 10, 2002 | 50 |
Apr 9, 2002 | 40 |
May 12, 2002 | 90 |
Mar 1, 2003 | 70 |
Dec 7, 2003 | 30 |
Dec 23, 2003 | 20 |
We would like to link the quarterly profit data to the irregular data in the advertising page. The quarterly values in the source page are unique so that we have a one-to-one or one-to-many match merge; accordingly, we may select any contraction method that leaves the original PROFIT data unchanged (mean, unique, etc.).
We first employ date matching by using the “@DATE” and “@DATE” keywords as our ID series. This specification instructs EViews to use the knowledge about the date structures in the page to perform a sophisticated matching across pages. Using this approach, we construct a PROFIT1 link containing the values:
DATE | ADVERT | PROFIT1 |
Jan 7, 2002 | 10 | 120 |
Mar 10, 2002 | 50 | 120 |
Apr 9, 2002 | 40 | 130 |
May 12, 2002 | 90 | 130 |
Mar 1, 2003 | 70 | 100 |
Dec 7, 2003 | 30 | 170 |
Dec 23, 2003 | 20 | 170 |
In evaluating the values in PROFIT1, we simply repeat the value of PROFIT for a given quarter for every matching observation in the advertising page. Since we are using date matching, we employ matching across pages that uses calendar knowledge to determine matches. For example, the observation for quarter “2002Q1” matches both “Jan 7, 2002” and “Mar 10, 2002” in the advertising page so that the latter observations are assigned the value of 120.
Conversely, using date matching to link the ADVERT series to the quarterly page, we have a many-to-one match merge since, after rounding down to the lower frequency, multiple observations in the advertising page have “@DATE” values that match the unique “@DATE” values in the quarterly page. If we choose to employ the mean contraction method in the link ADVERT1, we have:
QUARTER | PROFIT | ADVERT1 |
2002Q1 | 120 | 30 |
2002Q2 | 130 | 65 |
2002Q3 | 150 | NA |
2002Q4 | 105 | NA |
2003Q1 | 100 | 70 |
2003Q2 | 125 | NA |
2003Q3 | 200 | NA |
2003Q4 | 170 | 25 |
Here, the values of ADVERT1 contain the mean values over the observed days in the quarter. For example, the value for ADVERT1 in 2002Q1 is taken by averaging the values of ADVERT for “Jan 7, 2002” and “Mar 10, 2002”. Note that the value for quarter 2002Q3 is NA since there are no observations with matching DATE values, i.e., there are no observations in the advertising page that fall within the quarter.
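Both directions of this example can be reproduced with a brief Python sketch. It only mimics the matching and contraction logic; quarter_of and the dictionary layout are assumptions for the illustration, and None stands in for NA.

```python
from datetime import date
from statistics import mean

# Quarterly page: QUARTER -> PROFIT.
profit = {
    "2002Q1": 120, "2002Q2": 130, "2002Q3": 150, "2002Q4": 105,
    "2003Q1": 100, "2003Q2": 125, "2003Q3": 200, "2003Q4": 170,
}

# Irregular advertising page: (DATE, ADVERT).
advert = [
    (date(2002, 1, 7), 10), (date(2002, 3, 10), 50),
    (date(2002, 4, 9), 40), (date(2002, 5, 12), 90),
    (date(2003, 3, 1), 70), (date(2003, 12, 7), 30),
    (date(2003, 12, 23), 20),
]


def quarter_of(d):
    """Round a date down to its quarter label."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


# One-to-many: repeat the quarterly PROFIT for each matching advertising date.
profit1 = [profit.get(quarter_of(d)) for d, _value in advert]

# Many-to-one: average ADVERT over the dates falling in each quarter.
by_quarter = {}
for d, value in advert:
    by_quarter.setdefault(quarter_of(d), []).append(value)
advert1 = {q: (mean(by_quarter[q]) if q in by_quarter else None)
           for q in profit}

print(profit1)
print(advert1)
```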
It is worth noting that in both of these examples, had we employed exact matching by specifying our ID series as QUARTER and DATE, we would have observed no matches since the date numbers for the quarterly data do not match any of the irregular date numbers. As a result, all of the values in the resulting link would be assigned the value NA.
Panel links with date matching
When using date matching to link dated panel data to a page with a different frequency, you should pay particular attention to the behavior of the merge operation since the results may differ from expectations.
An example will illustrate the issue. Consider the following simple panel featuring quarterly revenue data from 2002Q1 to 2003Q4:
FIRM | QUARTER | REVENUE |
1 | 2002Q1 | 120 |
1 | 2002Q2 | 130 |
1 | 2002Q3 | 150 |
1 | 2002Q4 | 105 |
1 | 2003Q1 | 100 |
1 | 2003Q2 | 125 |
1 | 2003Q3 | 200 |
1 | 2003Q4 | 170 |
2 | 2002Q1 | 40 |
2 | 2002Q2 | 40 |
2 | 2002Q3 | 50 |
2 | 2002Q4 | 35 |
2 | 2003Q1 | 20 |
2 | 2003Q2 | 25 |
2 | 2003Q3 | 50 |
2 | 2003Q4 | 40 |
We will consider the results from linking the REVENUE data into an annual page using date matching of the QUARTER and the YEAR identifiers. Using date match merging (with the “@DATE” and “@DATE” keywords), and employing both the sum and number of observations contractions, we observe the results in REVENUE1 (sum) and REVENUE1A (obs):
YEAR | REVENUE1 | REVENUE1A |
2002 | 670 | 8 |
2003 | 730 | 8 |
The important thing to note here is that the sums for each year have been computed over all eight matching observations in the panel page.
The key to understanding the result is to bear in mind that date matching only changes the way that a match between observations in the two pages is defined; the remaining match merge operation remains unchanged. The outcome is simply the result of applying standard link behavior in which we first identify matches, compute a contraction over all matching observations, and perform the one-to-one match merge.
An alternative approach to obtaining annual revenue values from the panel data would be to first contract the panel data to a quarterly frequency by averaging across firms, and then to convert the quarterly data to an annual frequency by summing over quarters. This approach, which produces very different results from the first method, may be undertaken in two steps: by first linking the quarterly panel data into a quarterly page (using the mean contraction), and then frequency converting by linking the quarterly data to the annual frequency (summing over quarters).
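The difference between the two approaches is easy to see numerically. The following Python sketch is an illustration only (the variable names are hypothetical): it computes the pooled date match merge, which sums over all eight matching panel observations per year, and the two-step alternative, which averages across firms within each quarter and then sums over quarters.

```python
from collections import defaultdict

quarters = ["2002Q1", "2002Q2", "2002Q3", "2002Q4",
            "2003Q1", "2003Q2", "2003Q3", "2003Q4"]
revenue = {
    1: [120, 130, 150, 105, 100, 125, 200, 170],
    2: [40, 40, 50, 35, 20, 25, 50, 40],
}
# Stacked panel rows: (FIRM, QUARTER, REVENUE).
panel = [(firm, q, v) for firm, values in revenue.items()
         for q, v in zip(quarters, values)]

# Method 1: date matching pools every observation whose quarter rounds to the year.
revenue1 = defaultdict(int)   # sum contraction
revenue1a = defaultdict(int)  # number of observations
for _firm, q, v in panel:
    year = q[:4]
    revenue1[year] += v
    revenue1a[year] += 1

# Method 2: mean across firms within each quarter, then sum over quarters.
per_quarter = defaultdict(list)
for _firm, q, v in panel:
    per_quarter[q].append(v)
quarterly_mean = {q: sum(vs) / len(vs) for q, vs in per_quarter.items()}
two_step = defaultdict(float)
for q, m in quarterly_mean.items():
    two_step[q[:4]] += m

print(dict(revenue1), dict(revenue1a))  # {'2002': 670, '2003': 730} {'2002': 8, '2003': 8}
print(dict(two_step))                   # {'2002': 335.0, '2003': 365.0}
```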
In cases where you are linking between two panel pages at different frequencies, there are yet more complications. See “Panel frequency conversion” for a description of the issues involved in constructing these types of links.
Linking by date with frequency conversion
In the special case where we wish to link data between two regular frequency pages using dates as the sole identifier, EViews allows you to define your links in two ways. First, you may use the date match merging described in “Linking by date match merging”, or you can define special links that employ frequency conversion.
Basic frequency conversion
Links specified by date will primarily be used to perform automatic frequency conversion of simple regular frequency data. For example, you may choose to hold your quarterly frequency data in one page, your monthly frequency data in a second page, and to create links between pages which automatically perform the up or down frequency conversion as necessary.
You can instruct EViews to use the source series default methods for converting between frequencies, or you may use the link definition to specify the up and down conversion methods. Furthermore, the live nature of links means that changes in the source data will generate automatic updates of the frequency converted link values.
We divide our discussion of frequency conversion links into those that link data from high to low frequency pages and those that link from low to high frequency pages.
High to low frequency conversion
Frequency conversion linking from a simple regular high frequency page to a regular low frequency page is fundamentally the same as using a link with date matching to perform basic many-to-one match merging. In both cases, we match dates, compute a contraction of the source series, and then perform a one-to-one match merge.
Given the specialized nature of frequency conversion, links specified by date with frequency conversion offer a subset of the ordinary link contraction methods. All of the standard high to low frequency conversion methods (average, sum, first, last, maximum and minimum) are supported, but the match merge methods which do not preserve levels (such as the sum-of-squares or the variance) are not included.
Frequency conversion links also allow you to disable conversions for partially observed periods, so that a missing value for the source series in a given month generates a missing value for the corresponding quarterly observation. This option is not available for basic match merge links.
Low to high frequency conversion
In contrast, linking from low to high frequency pages using frequency conversion differs substantively from linking using basic date match merging.
When linking using general date match merging, the frequency conversion implied by the one-to-many match merge may only be performed by repeating the low frequency observation for every matching high frequency observation. Thus, in a one-to-many date match merge, an annual observation is always repeated for each matching quarter, month, or day.
In contrast, EViews provides additional up-conversion methods for frequency conversion links. In addition to the simple repeated-observation (constant-match average) method, frequency conversion links support all of the standard frequency conversion methods including constant-match sum, quadratic-match sum, quadratic-match average, linear-match sum, linear-match last, and cubic-match last.
Suppose that, in addition to our regular frequency quarterly PROFIT workfile page, we have a regular frequency monthly page containing observations spanning the period from August 2002 to March 2003. Linking the PROFIT data from the quarterly page into the monthly page by date, with frequency conversion, requires that we specify an up-conversion method. Here, we show results of a frequency conversion link using both the simple constant-match average (PROFIT2) and quadratic-match average (PROFIT3) methods:
MONTH | PROFIT2 | PROFIT3 |
Aug 2002 | 150 | 152.407 |
Sep 2002 | 150 | 144.630 |
Oct 2002 | 105 | 114.074 |
Nov 2002 | 105 | 103.519 |
Dec 2002 | 105 | 97.407 |
Jan 2003 | 100 | 97.222 |
Feb 2003 | 100 | 98.889 |
Mar 2003 | 100 | 103.889 |
Note that the PROFIT2 values are the same as those obtained by linking using simple date match merging, since the constant-match average method simply repeats the PROFIT observations for each matching month. Conversely, the PROFIT3 values are obtained using an interpolation method that is only available for linking by date with frequency conversion.
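The constant-match average values in PROFIT2 can be reproduced with a short Python sketch, shown here purely as an illustration of the repeat-the-observation rule. The helper quarter_label is hypothetical, and the quadratic-match interpolation behind PROFIT3 is deliberately not attempted.

```python
# Quarterly page: QUARTER -> PROFIT.
profit = {
    "2002Q1": 120, "2002Q2": 130, "2002Q3": 150, "2002Q4": 105,
    "2003Q1": 100, "2003Q2": 125, "2003Q3": 200, "2003Q4": 170,
}

# Monthly destination page: August 2002 through March 2003 as (year, month).
months = [(2002, 8), (2002, 9), (2002, 10), (2002, 11), (2002, 12),
          (2003, 1), (2003, 2), (2003, 3)]


def quarter_label(year, month):
    """Quarter containing the given month, e.g. (2002, 8) -> 2002Q3."""
    return f"{year}Q{(month - 1) // 3 + 1}"


# Constant-match average: each month takes the value of its quarter.
profit2 = [profit[quarter_label(y, m)] for y, m in months]
print(profit2)  # [150, 150, 105, 105, 105, 100, 100, 100]
```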
Panel frequency conversion
There are additional issues to consider when performing frequency conversion links in panel workfile settings:
• When working with two regular frequency panel pages, each defined by a single ID, frequency conversion links construct values by performing the frequency conversion separately for each value of the panel ID.
• If the source page is a regular frequency panel and the destination is an ordinary regular frequency page, we contract the source series by computing means across the panel identifiers to form a single time series (note that “mean” is the only contraction allowed). The resulting time series, which is in the source frequency, is then frequency converted to the destination frequency. Finally, the resulting series is date match merged to the destination page.
Note that this behavior is the same as performing a general match merge using “@DATE” as the identifiers, with a mean contraction.
• If the source page is an ordinary regular frequency page, and the destination is a regular frequency panel, we frequency convert to the destination frequency, then date match merge to the corresponding dates in each ID in the destination. Thus a given source value is repeated for all matching dates in the destination page (i.e., all IDs will have the same time series).
In all three of these cases, all of the high-to-low conversion methods are supported, but low-to-high frequency conversion only offers the constant-match average method (repeating of the low frequency observations).
Lastly, frequency conversion involving a panel page with more than one dimension or an undated page will be performed using raw data copy unless you elect to employ general match merging, as described in “Panel links with date matching”.
An example will illustrate the general approach. Suppose again that we are working with the regular frequency, quarterly panel REVENUE data. For convenience, we repeat the data here:
FIRM | QUARTER | REVENUE |
1 | 2002Q1 | 120 |
1 | 2002Q2 | 130 |
1 | 2002Q3 | 150 |
1 | 2002Q4 | 105 |
1 | 2003Q1 | 100 |
1 | 2003Q2 | 125 |
1 | 2003Q3 | 200 |
1 | 2003Q4 | 170 |
2 | 2002Q1 | 40 |
2 | 2002Q2 | 40 |
2 | 2002Q3 | 50 |
2 | 2002Q4 | 35 |
2 | 2003Q1 | 20 |
2 | 2003Q2 | 25 |
2 | 2003Q3 | 50 |
2 | 2003Q4 | 40 |
We now wish to use frequency conversion to link these data into an annual panel, using the sum frequency conversion method to go from high-to-low frequency. Then the panel-to-panel frequency conversion will simply perform a frequency conversion for each firm.
FIRM | YEAR | REVENUE (sum) |
1 | 2002 | 505 |
1 | 2003 | 595 |
2 | 2002 | 165 |
2 | 2003 | 135 |
Here we have performed an ID specific sum as our high-to-low conversion method. The observation for FIRM 1 in YEAR 2002 in the annual page is simply the sum of the quarterly data for that firm in 2002. Similarly, the value for FIRM 2 in 2003 is the sum of the four quarterly values for FIRM 2 in 2003.
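The ID specific conversion can be sketched in Python by keying the sum on both the firm and the year. This is an illustration of the arithmetic only; the tuple layout and variable names are assumptions for the example.

```python
from collections import defaultdict

quarters = ["2002Q1", "2002Q2", "2002Q3", "2002Q4",
            "2003Q1", "2003Q2", "2003Q3", "2003Q4"]
revenue = {
    1: [120, 130, 150, 105, 100, 125, 200, 170],
    2: [40, 40, 50, 35, 20, 25, 50, 40],
}
# Stacked panel rows: (FIRM, QUARTER, REVENUE).
panel = [(firm, q, v) for firm, values in revenue.items()
         for q, v in zip(quarters, values)]

# Sum the quarterly values separately for each (FIRM, YEAR) pair.
annual = defaultdict(int)
for firm, quarter, value in panel:
    annual[(firm, quarter[:4])] += value

for (firm, year), total in sorted(annual.items()):
    print(firm, year, total)
```

Keeping the firm in the grouping key is what distinguishes this result from a merge that matches on dates alone.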
It is important to note that these results differ significantly from the results obtained by general match merging using date processing of matches (with “@DATE” and “@DATE” as the two identifiers). Using the latter approach, we would have obtained:
FIRM | YEAR | REVENUE (sum) |
1 | 2002 | 670 |
1 | 2003 | 730 |
2 | 2002 | 670 |
2 | 2003 | 730 |
While these results may at first seem a bit odd, they simply follow the logic of the discussion in “Panel links with date matching”. Note that our link which matches dates between the two panel workfile pages is an example of a many-to-many match merge, since there are multiple IDs with the same dates in each page. Thus, we will first contract across IDs to obtain a unique time series in the original frequency, then frequency convert, then one-to-many match in the destination page. In this case, the initial contraction involves summing over firms to obtain a quarterly time series, then frequency converting (summing) to the destination frequency to obtain annual values for 2002 (670) and 2003 (730). The final step match merges these converted values into the annual panel using a one-to-many match merge rule.