Often, the quality of a design is described in terms of its balance and overlap. Balance is a measure of consistency of the frequencies of the attribute levels. Overlap is a measure of repetition of attribute levels within the same question.
However, these measures produce many statistics that are difficult to interpret in isolation. To understand how good your design is, you must look at these statistics together as part of the bigger picture. I'll show you how to derive diagnostic metrics that provide a holistic measure of the quality of your design.
An Experimental Design
- Select your Experimental Design
- Select Anything > Advanced Analysis > Choice Modeling > Diagnostic > Experimental Design > Balances and Overlaps of Design
The output consists of the following:
- d-error: A measure of how good or bad a design is at extracting information from respondents in an experiment. A lower D-error indicates a better design. D-errors are usually compared across designs created using different algorithms, rather than considered by themselves.
- overlaps: A vector in which each element is the percentage of questions with some overlap (repetition of a level) for one attribute. The number of levels of each attribute is shown in brackets. For example, 70% of the questions have at least one repeated Color level.
- The following four values are each reported as a score between 0 and 1, where 1 represents perfect balance and 0 represents the worst possible design. Although they can be interpreted in isolation, it is generally useful to compare scores for designs created by different algorithms.
- All four values are based on the concept of balance. The balance of an attribute is defined as the sum across levels of the absolute differences between each level's count and the average count. This value is then scaled to give 1 if all absolute differences are zero (i.e., the count of every level equals the average count, which implies perfect balance). The worst design consists of repeating the same levels for all alternatives and has a balance of zero. Replacing the sum across levels with the sum across pairs of levels of different attributes gives an analogous pairwise balance. These values are not calculated for Alternative specific and Partial profile designs.
- mean.version.balance: The average balance of all versions, then averaged across all attributes.
- mean.version.pairwise.balance: The average pairwise balance of all versions, then averaged across all pairs of attributes.
- across.version.balance: The balance for the whole design across all versions, then averaged across all attributes.
- across.version.pairwise.balance: The pairwise balance for the whole design across all versions, then averaged across all pairs of attributes.
- singles: The counts of the occurrence of each level of each attribute across all versions of the design. For example, the level Blue occurs 12 times across the design whereas Yellow occurs only 10 times. For Partial profile designs, the number of times that the levels are constant is also shown.
- pairs: The counts of the co-occurrence of a pair of levels from different attributes within the same alternative. This is calculated across all versions of the design. For example, Red occurs 5 times in the same alternative as Medium but only once with Slow. In this example, the uneven distribution of the singles and pairs counts is a consequence of the Random design. It is generally better for the levels to be evenly balanced.
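To make the definitions above concrete, here is a minimal Python sketch of how singles, pairs, overlaps and balance could be computed by hand. It is not the code behind the menu item. The toy design, the function names and the data layout are all hypothetical, and the scaling used in `balance` is an assumption chosen only to satisfy the two properties stated above (1 when every count equals the average, 0 when a single level takes every count). The toy design has one version, so the per-version and across-version balances coincide.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy design (not from the article): one tuple per alternative,
# given as (version, question, alternative, {attribute: level}).
DESIGN = [
    (1, 1, 1, {"Color": "Red", "Speed": "Slow"}),
    (1, 1, 2, {"Color": "Red", "Speed": "Fast"}),
    (1, 2, 1, {"Color": "Blue", "Speed": "Slow"}),
    (1, 2, 2, {"Color": "Red", "Speed": "Fast"}),
]
# All levels of each attribute, so that unused levels are counted as zero.
LEVELS = {"Color": ["Red", "Blue"], "Speed": ["Slow", "Fast"]}


def singles(design):
    """Count how often each level of each attribute occurs across the design."""
    counts = defaultdict(Counter)
    for _, _, _, profile in design:
        for attribute, level in profile.items():
            counts[attribute][level] += 1
    return counts


def pairs(design):
    """Count co-occurrences of levels of different attributes in one alternative."""
    counts = defaultdict(Counter)
    for _, _, _, profile in design:
        for a, b in combinations(sorted(profile), 2):
            counts[(a, b)][(profile[a], profile[b])] += 1
    return counts


def overlaps(design, levels):
    """Share of questions in which some level of an attribute is repeated."""
    by_question = defaultdict(lambda: defaultdict(list))
    for version, question, _, profile in design:
        for attribute, level in profile.items():
            by_question[(version, question)][attribute].append(level)
    repeated = Counter()
    for levels_by_attribute in by_question.values():
        for attribute, shown in levels_by_attribute.items():
            if len(set(shown)) < len(shown):
                repeated[attribute] += 1
    return {a: repeated[a] / len(by_question) for a in levels}


def balance(counts):
    """Assumed scaling: 1 for equal counts, 0 when one level takes every count."""
    n, k = sum(counts), len(counts)
    if n == 0 or k < 2:
        return 1.0
    mean = n / k
    deviation = sum(abs(c - mean) for c in counts)
    worst = 2 * n * (k - 1) / k  # deviation when a single level has all n counts
    return 1 - deviation / worst


single_counts = singles(DESIGN)
pair_counts = pairs(DESIGN)
overlap = overlaps(DESIGN, LEVELS)
attribute_balance = {
    a: balance([single_counts[a][level] for level in LEVELS[a]]) for a in LEVELS
}
# Balance for the whole design, averaged across attributes.
across_version_balance = sum(attribute_balance.values()) / len(attribute_balance)
pairwise_balance = {
    (a, b): balance([pair_counts[(a, b)][(x, y)] for x in LEVELS[a] for y in LEVELS[b]])
    for a, b in combinations(sorted(LEVELS), 2)
}
```

In this toy design Red appears three times and Blue once, so Color has a balance of 0.5 under the assumed scaling, while Speed is perfectly balanced at 1. Color also repeats within one of the two questions, giving it an overlap of 50%.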