When you add Column Sample Size (or another Sample Size statistic) to a table via Object Inspector > Data > Statistics > Below or Statistics > Right, you may get an unexpected result, such as 0 or a number much smaller than expected. This happens because these statistics are effectively a NET of the sample sizes for the cells in that row or column. That is, the sample size only counts cases that are included in all cells in the row or column, so any case with missing data is excluded from the final value. This article describes how to interpret Column Sample Size with missing data, as seen in the table below, which shows a different statement in each row, cut by different work statuses and dates in the columns.
The bottom row of the table above shows Column Sample Size selected from Object Inspector > Data > Statistics > Below. At first glance, the numbers appear to be wrong. There is clearly data in April for the second and third rows, but the table shows a Column Sample Size of 0, which, at face value, makes no sense.
The Column Sample Size statistic shown in the cells (i.e., in Object Inspector > Data > Statistics > Cells) reveals the cause of the problem. The statement in the first row was not asked in April, so its base size (i.e., Column Sample Size) is 0. The second statement was asked of 522 people in April, and the third statement was asked of 506 people. Thus, in this example, there is no single base size that accurately reflects all the data in the first column: each of 0, 522, and 506 is correct for exactly one row.
The fourth row shows the NET, which has a Column % of NaN and a Column Sample Size of 0. The NET represents all people who have data in one or more of the rows above, but excludes those with missing data. As everybody has missing data for the first row, the base is 0, so it is not possible to compute a percentage. The Column Sample Size in the bottom row is different again. It is 0 in the same place as the NET row, but has a lower value in June. This is because, in this example, some rows of the table have been hidden, and the NET has been computed using only the non-hidden rows.
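The counting logic described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Displayr's own implementation. The respondent data, the statement names (`statement_1` and so on), and the function names are all hypothetical.

```python
# Hypothetical answers for one column of the table (e.g., April).
# 1 = selected, 0 = not selected, None = missing (the statement was not asked).
responses = [
    {"statement_1": None, "statement_2": 1, "statement_3": 0},
    {"statement_1": None, "statement_2": 0, "statement_3": 1},
    {"statement_1": None, "statement_2": 1, "statement_3": None},
]

def row_sample_size(data, row):
    """Count the cases that have non-missing data for a single row."""
    return sum(1 for case in data if case[row] is not None)

def net_sample_size(data, rows):
    """Count only the cases that have non-missing data in every listed row."""
    return sum(1 for case in data if all(case[row] is not None for row in rows))

rows = ["statement_1", "statement_2", "statement_3"]

# Sample size for each row on its own.
per_row = {row: row_sample_size(responses, row) for row in rows}

# NET-style sample size across all three rows.
net_n = net_sample_size(responses, rows)

# The same calculation restricted to the two statements that were asked.
net_asked_only = net_sample_size(responses, ["statement_2", "statement_3"])

print(per_row)
print(net_n, net_asked_only)
```

Because nobody has data for `statement_1`, no case survives the "non-missing in every row" filter, so the NET-style sample size is 0 even though the other two rows have data. Changing which rows are included in the calculation changes the result.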
Method
There are a number of "remedies" to this problem:
- Do not show Column Sample Size in Statistics > Below on tables that contain missing values. This is the recommended approach, as Column Sample Size is misleading in this context.
- Apply a rule to your table that displays the maximum sample size for each column via Object Inspector > Data > Rules > Plus (+) > Calculation > Calculate Maximum Column Sample Size in 'Statistics - Below'.
- Create a Custom Rule using whatever logic is appropriate for computing the column n in your given situation.
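As a rough illustration of the second and third remedies, the Python sketch below computes the maximum sample size for a column, and a range of base sizes as one possible custom logic. It uses the April values from the example (0, 522, and 506). It is not the code of the Displayr rule itself, and the function names are hypothetical.

```python
# Column Sample Size shown in the cells of the April column in the example:
# one value per statement (row).
april_cell_sample_sizes = [0, 522, 506]

def max_column_sample_size(cell_sample_sizes):
    """Return the largest per-row sample size in a column (0 if it is empty)."""
    return max(cell_sample_sizes, default=0)

def base_size_range(cell_sample_sizes):
    """Describe the column base as a range, one possible custom logic."""
    if not cell_sample_sizes:
        return "0"
    lowest, highest = min(cell_sample_sizes), max(cell_sample_sizes)
    return str(lowest) if lowest == highest else f"{lowest} to {highest}"

april_max = max_column_sample_size(april_cell_sample_sizes)
april_range = base_size_range(april_cell_sample_sizes)

print(april_max)
print(april_range)
```

The maximum reports the largest base in the column (522 for April) instead of 0, while the range makes it visible that the rows do not share a single base.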
Next
How to Impute Missing Values in Displayr