## Comparing groups where a group has been over-recruited

Consider a simple example. Suppose a survey is designed to compare the attitudes of indigenous and non-indigenous Australians, who represent 5% and 95% of the Australian population, respectively. Such a study would generally employ *non-proportional stratification*, over-recruiting the indigenous Australians. For example, the study may be designed so that indigenous Australians make up 50% of a sample of 1,000 (i.e., 500 respondents).

Such a non-proportional sample design is used because we are more likely to find a significant difference when comparing a sample of 500 indigenous Australians with a sample of 500 non-indigenous Australians than when comparing a sample of 50 indigenous Australians with a sample of 950 non-indigenous Australians.

In such an analysis, the effective sample size will be greater than 100% of the actual sample size because, due to the non-proportional sampling, the sampling error is smaller than it would have been under simple random sampling (which would have yielded a sample of about 50 indigenous Australians). Note that this is the intuitively sensible result: it is consistent with the motivation for over-recruiting indigenous Australians in the sample.
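The following is a minimal sketch of the arithmetic behind this example, not the calculation performed by any particular program. It assumes that the statistic being compared is a proportion, that the true proportion is 0.5 in both groups (a hypothetical value chosen only for illustration), and that the effective sample size is expressed as the actual sample size multiplied by the ratio of the two sampling variances.

```python
import math


def se_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Assumption for illustration only: the true proportion is 0.5 in both groups.
P = 0.5
N_TOTAL = 1000

# Non-proportional design: 500 indigenous, 500 non-indigenous respondents.
se_equal = se_difference(P, 500, P, 500)

# What simple random sampling would yield on average: 50 versus 950.
se_prop = se_difference(P, 50, P, 950)

# Ratio of sampling variances (simple random sampling / over-recruited design).
variance_ratio = (se_prop / se_equal) ** 2

# Effective sample size of the over-recruited design for this comparison.
effective_n = N_TOTAL * variance_ratio

print(f"SE with 500 vs 500: {se_equal:.4f}")
print(f"SE with 50 vs 950:  {se_prop:.4f}")
print(f"Effective sample size: {effective_n:.0f} "
      f"({100 * effective_n / N_TOTAL:.0f}% of the actual sample)")
```

Under these assumptions the standard error of the difference is less than half as large with the 500 versus 500 design, so the effective sample size for the comparison is well above 100% of the 1,000 respondents.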

## Where *strata* have different variances

Where different *strata* of a sample have different variances for a statistic that is being estimated, it is *optimal* to over-recruit respondents in the groups with the higher variances (this is referred to as *Neyman allocation* in the statistics literature). Thus, where a sample is recruited such that groups with higher variances are over-recruited, this can lead to an effective sample size of more than 100%.
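As a rough illustration, the sketch below applies Neyman allocation to two hypothetical strata and compares the resulting variance of the estimated mean with that of a simple random sample of the same size. The stratum shares and standard deviations are made-up values, the stratum means are assumed to be equal, and the finite population correction is ignored.

```python
def neyman_allocation(n, shares, sds):
    """Allocate n respondents to strata in proportion to share * standard deviation."""
    products = [w * s for w, s in zip(shares, sds)]
    total = sum(products)
    return [n * p / total for p in products]


def variance_of_stratified_mean(shares, sds, sizes):
    """Sampling variance of the stratified estimate of the population mean."""
    return sum(w ** 2 * s ** 2 / m for w, s, m in zip(shares, sds, sizes))


# Hypothetical inputs: two strata of equal population share, with the second
# stratum three times as variable (in standard deviation) as the first.
n = 1000
shares = [0.5, 0.5]
sds = [1.0, 3.0]

sizes = neyman_allocation(n, shares, sds)
var_neyman = variance_of_stratified_mean(shares, sds, sizes)

# With equal stratum means, the population variance is the share-weighted
# average of the stratum variances, so a simple random sample has variance:
population_variance = sum(w * s ** 2 for w, s in zip(shares, sds))
var_srs = population_variance / n

effective_n = n * var_srs / var_neyman

print("Neyman allocation:", sizes)
print(f"Effective sample size: {effective_n:.0f} ({100 * effective_n / n:.0f}%)")
```

With these inputs the higher-variance stratum receives 750 of the 1,000 interviews, and the effective sample size works out to 125% of the actual sample size.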

## Comparison to other programs

Other software designed for taking sampling designs into account will also produce effective sample sizes that exceed 100% of the actual sample size (e.g., IBM's *SPSS Complex Samples* and the `survey` package for *R*).

Many of the programs used within the market research industry for analyzing surveys, such as IBM's *Survey Reporter*, instead use Weight Calibration with Kish's Effective Sample Size Formula, which can never produce an effective sample size larger than the actual sample size. Kish's formula is also used in many analyses in Displayr (however, all crosstabs involving means and proportions in Displayr use *Taylor Series Linearization*). You can use Kish's Effective Sample Size Formula in Displayr by changing the Statistical Assumptions setting of **Weights and significance** to **Kish's approximation**.
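For comparison, here is a minimal sketch of Kish's formula, which computes the effective sample size as the squared sum of the weights divided by the sum of the squared weights. The weights are an assumption made for illustration: they are the ones that would restore the 5% and 95% population shares in the over-recruited sample of 500 indigenous and 500 non-indigenous Australians described earlier.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_squares = sum(w * w for w in weights)
    return total ** 2 / total_squares


# Illustrative weights: population share divided by sample share.
indigenous_weight = 0.05 / 0.50       # 0.1
non_indigenous_weight = 0.95 / 0.50   # 1.9

weights = [indigenous_weight] * 500 + [non_indigenous_weight] * 500

n = len(weights)
kish_n = kish_effective_sample_size(weights)

print(f"Actual sample size: {n}")
print(f"Kish effective sample size: {kish_n:.0f} ({100 * kish_n / n:.0f}%)")
```

Because the formula depends only on how variable the weights are, it reports an effective sample size of roughly 55% here, even though the design was chosen to make the comparison between the two groups more precise.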

## Next

How to Add Rows to a Table to Display Effective Column Sample Size