Duplicates Analysis
One thankless task in working with data is checking it for miscoding and other errors. An important approach to data cleaning is identifying and examining duplicate observations.
EViews provides easy-to-use tools for analyzing your series or group data to identify duplicate observations. Specialized tools make it easy for you to work with and edit groups of repeated observations. A newly developed interactive display lets you jump from looking at observations in a single duplicate group to the observation in workfile context, and vice versa. Thus, clicking on a duplicate in the spreadsheet view will jump to a display of all of the observations that share the duplicated value. Similarly, clicking on an observation in the individual duplicate group view will jump to the corresponding observation in the full spreadsheet.
To display the new duplicates view, click on View/Duplicate Observations from the main menu of either a series or a group object. EViews will display the duplicates summary associated with your data in the current workfile sample:
Here we see the summary associated with the series GPA. The summary shows that there are 26 unique observations and 3 sets of non-unique observations, each set consisting of a pair of observations, for a total of 6 non-unique observations. Also displayed are the percentages in each category.
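The counts in the summary can be reproduced outside of EViews with a simple tally. The following is a minimal Python sketch, not the EViews implementation: the function name `duplicates_summary` and the sample values are hypothetical, the percentages are assumed to be shares of the total number of observations, and the handling of missing values is not addressed.

```python
from collections import Counter

def duplicates_summary(values):
    """Tally unique and non-unique observations in a sequence of values."""
    counts = Counter(values)                      # value -> number of occurrences
    total = len(values)
    unique_obs = sum(1 for c in counts.values() if c == 1)
    dup_sets = sum(1 for c in counts.values() if c > 1)
    dup_obs = sum(c for c in counts.values() if c > 1)

    def pct(n):
        # Share of all observations (assumed definition of the percentages).
        return 100.0 * n / total if total else 0.0

    return {
        "total": total,
        "unique_obs": unique_obs,
        "dup_sets": dup_sets,
        "dup_obs": dup_obs,
        "pct_unique": pct(unique_obs),
        "pct_dup": pct(dup_obs),
    }

# Hypothetical data with the same shape as the GPA example:
# 26 values that occur once, plus 3 values that each occur twice.
sample = list(range(26)) + [100, 100, 101, 101, 102, 102]
summary = duplicates_summary(sample)
print(summary)
```

With data of this shape the tally gives 26 unique observations and 3 duplicate sets covering 6 observations, out of 32 in total.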
While the summary view is useful for obtaining an overview of the duplicates in the data, the real power of the duplicates view comes from clicking on one of the other items in the left-hand side tree structure.
Clicking on the Graph node displays a graph of the data showing the group sizes associated with each observation:
Here we see which six observations are in duplicate groups. There does not appear to be a pattern to the location of these observations.
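The quantity plotted in the graph, the size of the group each observation belongs to, is straightforward to compute by hand. The sketch below is an illustration in Python under the assumption that a unique observation has a group size of 1; `group_sizes` and the sample values are hypothetical names and data, not part of EViews.

```python
from collections import Counter

def group_sizes(values):
    """Return, for each observation, how many observations share its value."""
    counts = Counter(values)
    return [counts[v] for v in values]

# Hypothetical series: the first and third observations share a value.
sizes = group_sizes([3.1, 2.8, 3.1, 3.5])
print(sizes)
```

Observations with a size greater than 1 are the ones that belong to a duplicate group.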
Clicking on the Spreadsheet node shows the data in a spreadsheet:
Observations which have duplicates are colored and shaded, with the intensity of both determined by the number of observations in the corresponding group. In this case, all of the duplicates have 2 observations in their group, but in cases where there is variation in group sizes, the spreadsheet will identify which observations are in larger groups.
As with the standard EViews spreadsheet display, you may click on the Edit +/– button to enable editing of the observations.
Clicking on the Duplicates/Count node opens up the tree to show all of the duplicate groups. Clicking on a specific group takes you to a display of the observations in that group:
Again, clicking on the Edit +/– button will enable editing of the observations.
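The grouping shown under the Duplicates/Count node amounts to collecting the observations that share each repeated value. As a rough Python sketch of that idea (the function `duplicate_groups`, the zero-based positions, and the sample values are all assumptions made for illustration):

```python
from collections import defaultdict

def duplicate_groups(values):
    """Map each repeated value to the positions of the observations that hold it."""
    positions = defaultdict(list)
    for i, v in enumerate(values):
        positions[v].append(i)
    # Keep only the values that occur more than once.
    return {v: idx for v, idx in positions.items() if len(idx) > 1}

# Hypothetical series with two duplicate groups of two observations each.
groups = duplicate_groups([3.1, 2.8, 3.1, 3.5, 2.8])
print(groups)
```

Each key corresponds to one duplicate group, and its list gives the observations you would see when selecting that group.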
After examining the observations in a specific group, you may wish to see that observation in workfile context. If you move your pointer over the observation row identifier or value you will see the pointer arrow change to a target, indicating that clicking will target that observation. Simply click on the observation data to jump to the duplicates spreadsheet display with that observation selected.
Similarly, if you are in the spreadsheet display, clicking on the row identifier or value of a duplicate observation will jump to that observation’s specific group display.
Note that the same tools are available for group objects. In this case, duplicates refers to observations for which the values of all of the series in the group are identical.
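For a group, the comparison is made on the whole row rather than on a single value. The Python sketch below illustrates this rule by treating each observation as a tuple of the values of all of the series; `row_group_sizes` and the two sample series are hypothetical, and the sketch is not a description of how EViews performs the comparison internally.

```python
from collections import Counter

def row_group_sizes(*series):
    """For each observation, count the observations whose values match in every series."""
    if len({len(s) for s in series}) > 1:
        raise ValueError("all series must have the same number of observations")
    rows = list(zip(*series))        # one tuple per observation
    counts = Counter(rows)
    return [counts[r] for r in rows]

# Hypothetical group of two series. The fourth observation repeats the GPA value
# of the first, but its AGE differs, so it is not a duplicate.
gpa = [3.1, 2.8, 3.1, 3.1]
age = [20, 21, 20, 22]
row_sizes = row_group_sizes(gpa, age)
print(row_sizes)
```

Only the first and third observations form a duplicate group here, since they agree in both series.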