How to Perform Column Comparisons with Missing Repeated Measures Data

This article describes alternative approaches to statistical testing with missing data when repeated measures are tested against one another. Repeated measures data is data where a respondent has provided two or more evaluations and there is a need to compare them (e.g., ratings of satisfaction at different points in time, ratings of the appeal of two different products). See examples:

Method - Example 1: When there is lots of missing data
Method - Example 2: When there is some missing data

After reviewing the examples, read the Understanding the logic of Displayr section to see why the tests in these examples behave as they do.

Requirements

A data file containing repeated measures data in which some of the measures are missing for some respondents. Typically, such data appears in Displayr in a Nominal - Multi question, a Binary - Grid, or a Numeric - Grid.

Method - Example 1: When there is lots of missing data

In this example, there is a lot of missing data, and no respondents have seen both A and B. Consequently, because Displayr defaults to testing only respondents with complete data, no tests are performed (as indicated by the -). In this example, conducting an independent-samples test is likely the correct approach (see the instructions below).

Independent samples tests

When there are large amounts of missing data, as in this example, a generally better approach is to have Displayr ignore the repeated-measures nature of the data and directly compare the percentages, assuming they come from independent samples. There are a variety of ways of doing this.

Note that applying independent-samples tests assumes that the data are missing completely at random (this assumption is discussed in more detail later in this article). It also ignores any dependence in the data, which reduces statistical power.

The most straightforward approach to conducting independent samples tests in Displayr is to:

1. Select the table or tables in which you wish to modify the statistical testing assumptions.
2. Select Object Inspector > Appearance > Significance > Advanced.
3. On the Test Type tab, change the Proportions test > Survey Reporter Proportions or, if testing means, Survey Reporter Means.
4. On the Column Comparisons tab, change Overlaps > Independent.
5. Click Apply to Selection.

An alternative approach is to use the Rule called Significance Testing in Tables - Independent Samples Column Means and Proportions Tests. This approach offers greater flexibility, but it is more complex to use, increasing the risk of user errors.
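To make the idea of "directly comparing the percentages" concrete, here is a minimal sketch of a textbook pooled two-proportion z-test. It is not Displayr's Survey Reporter test, whose exact formula is not given in this article, and the function name and counts are hypothetical and chosen only for illustration.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample z-test for a difference in proportions (two-sided)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pool the two samples to estimate the common proportion under the null.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 40 of 150 respondents chose Buy for Product A,
# and 60 of a different 150 respondents chose Buy for Product B.
z, p_value = two_proportion_z_test(40, 150, 60, 150)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Because each respondent contributes to only one column, the two columns are treated as separate groups, which is the situation in Example 1.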

Method - Example 2: When there is some missing data

The table below shows that a higher proportion of respondents said they would buy Product B (20%) than Product A (16%). But the column comparisons tell the opposite story, with Product A shown to have a significantly higher Buy score than Product B. While this result seems to be a paradox, it is not an error and is instead a consequence of a serious missing data problem.

The Column n at the bottom of the table shows that the sample size in Product A is 142, compared to 191 for B. When the data is filtered to include only respondents who have complete data, we get the table below. This is the table that Displayr has used in the "background" when performing the test of columns A and B. Note that with this table, we can see that the first column's score is higher than that of the second column (16% versus 12%).
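The filtering step can be sketched outside Displayr as follows. The respondent-level counts are hypothetical: they were chosen only so that the rounded percentages and sample sizes match those quoted for Example 2, and the real data may break down differently.

```python
# Each respondent is (buy_a, buy_b): 1 = Buy, 0 = Not buy, None = not shown.
# Hypothetical counts chosen to reproduce the rounded figures in Example 2.
respondents = (
    [(1, 1)] * 16 + [(1, 0)] * 7 + [(0, 1)] * 1 + [(0, 0)] * 116  # saw both
    + [(0, None)] * 2                                              # saw A only
    + [(None, 1)] * 21 + [(None, 0)] * 30                          # saw B only
)

def buy_percent(rows, column):
    """Return (percentage saying Buy, n) among rows with an answer in column."""
    answers = [row[column] for row in rows if row[column] is not None]
    return 100 * sum(answers) / len(answers), len(answers)

# Keep only respondents who evaluated both products.
complete = [row for row in respondents if None not in row]

for label, rows in (("All respondents", respondents),
                    ("Complete data only", complete)):
    (pct_a, n_a), (pct_b, n_b) = buy_percent(rows, 0), buy_percent(rows, 1)
    print(f"{label}: A = {pct_a:.0f}% (n = {n_a}), B = {pct_b:.0f}% (n = {n_b})")
```

With these assumed counts, Product B leads in the unfiltered data, but Product A leads once the data is restricted to respondents who rated both products.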

Dependent samples tests

Dependent samples tests are statistical tests explicitly designed for the problem in this example. Two notes of caution about dependent samples tests:

They are not recognized in the statistical literature. That is, while a number of market research software programs provide these tests, there is no body of published work supporting their validity.
The tests assume that the data is missing completely at random and, as this assumption is often not appropriate in survey research, these tests should generally not be applied without first checking this assumption. A way to check the assumption is to see if the responses of people with missing data are systematically different to those without, as done above (and, in the case of Example 2, applying a dependent test is not appropriate).
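One rough way to carry out the check described above is to test whether respondents with missing data answer differently from those without. The sketch below uses a textbook two-proportion z-test on Product B's Buy responses. The counts (21 of 51 and 17 of 140) are assumptions reconstructed from the rounded percentages reported for Example 2, and the function name is hypothetical.

```python
import math

def compare_groups(buy_missing, n_missing, buy_complete, n_complete):
    """Two-sided pooled z-test: do the two groups differ in their Buy rate?"""
    p_missing = buy_missing / n_missing
    p_complete = buy_complete / n_complete
    pooled = (buy_missing + buy_complete) / (n_missing + n_complete)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_missing + 1 / n_complete))
    z = (p_missing - p_complete) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumed counts for Product B in Example 2:
# respondents with some missing data vs. respondents with complete data.
z, p_value = compare_groups(21, 51, 17, 140)
print(f"z = {z:.2f}, p = {p_value:.6f}")
if p_value < 0.05:
    print("Groups differ: missing completely at random looks doubtful.")
```

A clear difference between the two groups, as here, is evidence against the missing-completely-at-random assumption. A non-significant result does not prove that the assumption holds.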

Dependent samples tests are run in Displayr as follows:

1. Select the table or tables in which you wish to modify the statistical testing assumptions.
2. Select Object Inspector > Appearance > Significance > Advanced.
3. On the Test Type tab, change the Proportions test > Survey Reporter Proportions or, if testing means, Survey Reporter Means.
4. On the Column Comparisons tab, change Overlaps > Dependent.
5. Click Apply to Selection.

Technical details

Understanding the logic of Displayr

When faced with missing repeated measures data, some people prefer to conduct testing using the numbers shown in the tables. That is, in the case of Examples 1 and 2 above, they would prefer that Displayr simply compare the numbers shown. There are a number of reasons why Displayr does not do this and instead filters the data:

Consistency
Filtering the data is the orthodox solution in statistical testing
It increases the user's chance of detecting a problem

Consistency

When there is no missing data, Displayr performs the standard repeated measures tests. Consequently, it would be confusing if Displayr did something different when some missing data exists.

Filtering the data is the orthodox solution in statistical testing

With repeated measures data such as this, the orthodox statistical treatment of the data is to perform testing only using respondents that have no missing data. Examining the data from Example 2 gives some insight into why this is the orthodox approach.
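As an illustration of a repeated measures test on complete cases, the sketch below applies McNemar's test, a standard textbook test for paired proportions. This article does not state which test Displayr runs, so treat this as an example of the general approach rather than a description of Displayr's calculation. The counts are hypothetical, chosen to be consistent with 16% versus 12% among 140 respondents.

```python
import math

def mcnemar_test(only_a, only_b):
    """McNemar's test (no continuity correction) from the discordant counts.

    only_a: respondents who said Buy for A but not for B.
    only_b: respondents who said Buy for B but not for A.
    """
    chi2 = (only_a - only_b) ** 2 / (only_a + only_b)
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical complete cases (n = 140): 16 Buy both, 7 Buy A only,
# 1 Buy B only, 116 Buy neither. That gives A = 23/140 and B = 17/140.
chi2, p_value = mcnemar_test(only_a=7, only_b=1)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Only respondents who answered differently for the two products affect the result, which is why the test needs respondents with data for both.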

Looking at the data for Product B in Example 2, 20% of 191 Buy, whereas in the second table 12% of the 140 Buy. From this, we can deduce that the 51 respondents with missing data were, on average, much more likely to have said they would buy. The following table compares the Buy and Not buy data for product B according to whether or not there is any missing data, and shows that 41% of those with some missing data said Buy.
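The deduction can be checked with a few lines of arithmetic. This sketch works from the rounded percentages quoted above, so the respondent counts it derives are approximations rather than figures taken from the data file.

```python
# Figures quoted for Product B in Example 2.
n_all, pct_all = 191, 0.20            # everyone who evaluated Product B
n_complete, pct_complete = 140, 0.12  # those who evaluated both products

# Approximate numbers of respondents saying Buy, recovered from the percentages.
buyers_all = round(pct_all * n_all)
buyers_complete = round(pct_complete * n_complete)

# The remaining respondents are those with some missing data.
n_missing = n_all - n_complete
buyers_missing = buyers_all - buyers_complete
share_missing = buyers_missing / n_missing

print(f"{buyers_missing} of {n_missing} = {share_missing:.0%} said Buy")

# Recombining the two groups recovers the overall percentage.
overall = (buyers_complete + buyers_missing) / n_all
print(f"Overall: {overall:.0%}")
```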

Thus, the 20% shown on the original table is a weighted average of two very different groups' data: the group with missing data who have given much higher Buy ratings, and the people who evaluated both products, who gave lower ratings for Product B. By default, Displayr ignores the data from those with missing data. There are a number of methodological justifications for this:

The data from respondents who have seen both products may be more reliable than data from respondents who have only seen one. This is, of course, only a conjecture.
There are no widely-recognized statistical tests that take into account the missing data. This is discussed in more detail below.
Filtering the data and employing a repeated measures test provides more statistical power than using an independent samples test (independent samples tests are discussed above).
Where there is missing data, a valid analysis requires assumptions to be made about the causes of the missing data. Some possible assumptions and their relevant implications are:
- Data that is missing can be considered to be missing completely at random. If this assumption is correct, then it is safe to ignore the respondents with missing data (although statistical power may be reduced), as is done by default by Displayr.
- Respondents with missing data are intrinsically different (i.e., the missing data is not ignorable). In surveys, this is often the case (e.g., respondents may have missing data because they are less experienced in a category). If this is the case, a valid analysis would need to estimate how the 51 respondents with some missing data would have evaluated each of Product A and Product B. That is, if the missing data is not ignorable, it is invalid to compare the original numbers in the first table (20% versus 16%) because the 16% is a biased estimate due to not taking into account the missing data.

It increases the user's chance of detecting a problem

A criticism of the way that Displayr performs the testing is that "effectively the numbers show one thing but the significance testing shows another". This is an accurate description of how Displayr works, but Displayr's testing was deliberately designed to achieve this outcome. The initial table in Example 2 shows the results of 20% and 16% because these are the numbers in the data. However, the test that is shown is Displayr's attempt to produce the best test possible given the data. Were Displayr to instead provide a test consistent with the numbers shown, the test would, more often than not, be invalid. Further, there would be no way for anybody reading the table to identify the problem, as the numbers would appear to make sense. By contrast, the paradoxical result lets the user see that something is wrong and investigate the problem further.

A second cue to the existence of the missing data problem is that Displayr shows the range of sample sizes in the caption of the table (i.e., base n = from 142 to 191).

How to Apply Significance Testing in Displayr

How to Investigate Your Statistical Significance Testing
