How to Interpret Column Sample Size with Missing Data

This article explains why Column Sample Size can appear wrong on tables that contain missing data, and how to deal with it.

The bottom row of the table above shows Column Sample Size selected from Statistics > Below. At first glance, the numbers appear to be wrong. There is clearly data in April, but the table shows a Column Sample Size of 0, which, taken at face value, makes no sense.

The Column Sample Size statistic shown in the cells (i.e., from Statistics > Cells) reveals the cause of the problem. The first statement was not asked in April, so its base size (i.e., Column Sample Size) is 0. The second statement was asked of 522 people in April, and the third statement was asked of 506 people. Thus, in this example, no single base size accurately reflects all of the data in the first column: each of 0, 522, and 506 is correct for one and only one row.
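The idea that each row has its own base size can be sketched in a few lines of Python. This is an illustration only, not Displayr's implementation: the toy data, the `april` dictionary, and the `row_base_sizes` helper are all hypothetical, and `None` is assumed to stand for a respondent who was not asked the statement.

```python
# Hypothetical toy data for one column (a single month).
# Each list holds one entry per respondent:
# 1 = selected, 0 = not selected, None = missing (not asked).
april = {
    "Statement 1": [None, None, None, None, None],
    "Statement 2": [1, 0, 1, 1, 0],
    "Statement 3": [1, 0, None, 1, 0],
}


def row_base_sizes(column):
    """Count the respondents with non-missing data in each row."""
    return {
        row: sum(value is not None for value in values)
        for row, values in column.items()
    }


bases = row_base_sizes(april)
print(bases)  # {'Statement 1': 0, 'Statement 2': 5, 'Statement 3': 4}
```

Because the three rows have three different bases, any single number reported for the whole column can match at most one of them.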

The fourth row shows the NET, which has a Column % of NaN and a Column Sample Size of 0. The NET is computed from the people who have data in the rows above, excluding everyone who has missing data in any of those rows. As everybody has missing data for the first row, the base is 0, so it is not possible to compute a percentage. The Column Sample Size in the bottom row is different again. It is 0 in the same place as the NET row, but has a lower value in June. This is because, in this example, some of the rows of the table have been hidden, and the NET has been computed using only the non-hidden rows.
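The NET logic described above can be sketched as a complete-case calculation. This is a minimal illustration under stated assumptions, not Displayr's own code: the data, the row names, and the `net_base_and_percent` helper are hypothetical, `None` marks missing data, and the base is taken to be the respondents with no missing values in any of the rows passed in.

```python
import math


def net_base_and_percent(column, rows):
    """Complete-case NET over the given rows.

    Base: respondents with non-missing data in every listed row.
    Percent: share of that base who selected at least one listed row.
    """
    respondents = list(zip(*(column[row] for row in rows)))
    complete = [r for r in respondents if all(v is not None for v in r)]
    base = len(complete)
    if base == 0:
        return 0, math.nan  # no valid base, so no percentage
    selected_any = sum(any(v == 1 for v in r) for r in complete)
    return base, 100.0 * selected_any / base


# Hypothetical toy data: 1 = selected, 0 = not selected, None = missing.
april = {
    "Statement 1": [None, None, None, None, None],
    "Statement 2": [1, 0, 1, 1, 0],
    "Statement 3": [1, 0, 0, 1, 0],
}
june = {
    "Statement 1": [1, 0, 0, 1, 0],
    "Statement 2": [1, 0, 1, 1, 0],
    "Statement 3": [1, 0, 0, 1, 0],
    "Hidden statement": [1, None, 0, 0, None],
}
visible = ["Statement 1", "Statement 2", "Statement 3"]

print(net_base_and_percent(april, visible))      # (0, nan)
print(net_base_and_percent(june, visible))       # (5, 60.0)
print(net_base_and_percent(june, list(june)))    # (3, 100.0)
```

Adding rows to a complete-case calculation can only keep the base the same or shrink it, which is why a figure computed over all rows, including hidden ones, can be lower than one computed over the visible rows alone.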

Method

There are three remedies for this problem:

Do not show Column Sample Size in Statistics > Below on tables that contain missing values. This is the recommended approach, as Column Sample Size is misleading with such data.
Apply a rule to your table that displays the maximum sample size for each column via Data > Rules > Plus (+) > Calculation > Calculate Maximum Column Sample Size in 'Statistics - Below'.
Create a Custom Rule using whatever logic is appropriate for computing the Column Sample Size in your situation.
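The arithmetic behind the maximum-sample-size remedy can be sketched as follows. This is plain Python for illustration, not the code of the built-in rule or of a Displayr Custom Rule; the `max_column_sample_size` helper is hypothetical, and the input is the list of per-row base sizes quoted above for April.

```python
def max_column_sample_size(row_bases):
    """Return the largest per-row base size in a column (0 if no rows)."""
    return max(row_bases, default=0)


# Per-row Column Sample Size values for April, as given in this article.
april_row_bases = [0, 522, 506]

april_max = max_column_sample_size(april_row_bases)
print(april_max)  # 522
```

Reporting the maximum avoids the misleading 0, but it still describes only the row with the largest base, so the other rows in the column rest on fewer respondents than the figure suggests.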

How to Impute Missing Values in Displayr

How to Replace Missing Values with Their Average

How to Rebase Multiple Response Data in Variable(s) to NET
