## Introduction

This article describes how to interpret the **Column Sample Size** statistic on tables that contain missing data.

The bottom row of the table above shows **Column Sample Size**, selected from STATISTICS > Below. At first glance, the numbers appear to be wrong. There is clearly data in `April`, yet the table shows a Column Sample Size of 0, which at face value makes no sense.

The **Column Sample Size** statistic shown in the cells (i.e., selected from STATISTICS > Cells) reveals the cause of the problem. The first statement was not asked in April, so its base size (i.e., its **Column Sample Size**) is 0. The second statement was asked of 522 people in April, and the third of 506 people. Thus, in this example, no single base size accurately reflects all of the data in the first column: each of 0, 522, and 506 is correct for one and only one row.
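
To make the per-cell base sizes concrete, here is a minimal Python sketch. It assumes hypothetical respondent-level data for a single column, where each respondent has one value per statement and `None` marks a statement that was not asked. The data and the function name `row_base_sizes` are illustrative only and are not part of Q; the function simply counts the non-missing responses in each row, which mirrors how the per-cell base sizes behave in this example.

```python
from typing import Optional

# One respondent's answers to three statements:
# 1 = selected, 0 = not selected, None = not asked (missing).
Response = list[Optional[int]]

# Hypothetical data for one column (e.g. April): statement 1 was never asked.
april: list[Response] = [
    [None, 1, 0],
    [None, 0, None],
    [None, 1, 1],
    [None, 1, None],
]

def row_base_sizes(responses: list[Response]) -> list[int]:
    """Count, for each row, the respondents with non-missing data in that row."""
    n_rows = len(responses[0])
    return [
        sum(1 for r in responses if r[i] is not None)
        for i in range(n_rows)
    ]

print(row_base_sizes(april))  # [0, 4, 2]
```

As in the article's table, each row gets its own base, and no one of these numbers describes the whole column.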

The fourth row shows the NET. It has a Column % of NaN and a Column Sample Size of 0. The NET represents all of the people who have data in one or more of the rows above, but it excludes anybody with missing data. Because everybody has missing data for the first row, the base is 0, so no percentage can be computed. The Column Sample Size in the bottom row is different again: it is 0 in the same column as the NET row, but it is lower than the NET's base in `June`. This is because, in this example, some rows of the table have been hidden, and the NET is computed using only the non-hidden rows.
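
The NET's base can be sketched the same way. This minimal Python example assumes the rule described above: a respondent counts toward the NET base only if they have no missing data in any of the visible rows. The data and the helpers `in_net_base`, `net_base`, `net_count`, and `net_percent` are hypothetical and are not how Q implements the NET internally. Passing a shorter list of visible rows shows how hiding rows can raise the NET's base, and a base of 0 yields NaN, just as in the `April` column.

```python
import math
from typing import Optional

# None marks missing data (e.g. a statement that was not asked).
Response = list[Optional[int]]

# Hypothetical data: every April respondent is missing the first statement.
april: list[Response] = [[None, 1, 0], [None, 0, None], [None, 1, 1]]
# Hypothetical June data with scattered missing values.
june: list[Response] = [[1, 0, None], [0, 1, 1], [1, None, 0], [0, 0, 1]]

def in_net_base(r: Response, visible: list[int]) -> bool:
    """A respondent is in the NET base only if no visible row is missing."""
    return all(r[i] is not None for i in visible)

def net_base(responses: list[Response], visible: list[int]) -> int:
    return sum(1 for r in responses if in_net_base(r, visible))

def net_count(responses: list[Response], visible: list[int]) -> int:
    """Respondents in the base who selected at least one visible row."""
    return sum(
        1 for r in responses
        if in_net_base(r, visible) and any(r[i] == 1 for i in visible)
    )

def net_percent(responses: list[Response], visible: list[int]) -> float:
    base = net_base(responses, visible)
    if base == 0:
        return math.nan  # nobody left in the base, so the NET % is undefined
    return 100 * net_count(responses, visible) / base

print(net_percent(april, [0, 1, 2]))  # nan
print(net_base(june, [0, 1, 2]))      # 2: two respondents have a missing value
print(net_base(june, [0, 1]))         # 3: hiding row 2 readmits one respondent
```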

## Method

There are a number of "remedies" to this problem:

- Do not show **Column Sample Size** in STATISTICS > Below on tables that contain missing values. This is the recommended approach, as **Column Sample Size** is misleading with such data.
- Apply a rule to your table that displays the maximum sample size for each column via Properties > RULES > Plus (+) > Modify Whole Table or Plot > Show Maximum Column Sample Size in Statistics Below.
- Create a Custom Rule using whatever logic is appropriate for computing the column n in your situation.
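
The second and third remedies both reduce a column's per-row base sizes to a single n. The minimal Python sketch below illustrates that idea using the April figures from the example in the Introduction (0, 522, and 506). It is not Q's rule engine or API: `column_n` is a hypothetical helper, taking the maximum only approximates what the built-in rule's name describes, and `smallest_nonzero` is an invented example of the kind of logic a custom rule might use.

```python
from typing import Callable

# Per-row base sizes for the April column in the example:
# statement 1 was not asked (0), statement 2 had 522, statement 3 had 506.
april_row_bases = [0, 522, 506]

def column_n(row_bases: list[int], rule: Callable[[list[int]], int]) -> int:
    """Reduce a column's per-row base sizes to a single n using `rule`."""
    if not row_bases:
        return 0
    return rule(row_bases)

def smallest_nonzero(row_bases: list[int]) -> int:
    """Hypothetical custom rule: the smallest base among rows that were asked."""
    asked = [b for b in row_bases if b > 0]
    return min(asked) if asked else 0

print(column_n(april_row_bases, max))               # 522
print(column_n(april_row_bases, smallest_nonzero))  # 506
```

Whichever logic is chosen, the single number still hides the row-to-row differences, which is why not showing the statistic remains the recommended option.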

## See Also

How to Impute Missing Values in Q