Often, the quality of a design is described in terms of its balance and overlap. Balance is a measure of consistency of the frequencies of the attribute levels. Overlap is a measure of repetition of attribute levels within the same question.
However, the drawback of these measures is that they produce many statistics that are difficult to interpret in isolation. In order to develop an understanding of how good your design is, you must look at these statistics as part of the bigger picture. This article will show you how to derive diagnostic metrics that provide a holistic measure of the quality of your design.
You can easily apply these metrics to compare designs created from different algorithms.
The function calculates the level balances and overlap between levels for a Choice Modeling - Experimental Design. The D-Error and various diagnostic statistics described below are also shown. This blog post describes the calculations underlying metrics such as mean.version.balance referred to below.
An example design
In Displayr, designs are created with Anything > Advanced Analysis > Choice Modeling > Experimental Design. The example here is a small design produced with the Random algorithm. There are two attributes (Brand and Price), each of which has three levels. Every respondent answers five questions, each of which contains three alternatives. There are two versions.
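To make the structure of such a design concrete, here is a minimal Python sketch that draws every level uniformly at random. It is only an illustration of the dimensions described above, not Displayr's Random algorithm, which may apply rules this sketch does not (for example, avoiding duplicate alternatives within a question). The function `random_design`, the row layout, and the level names "Ford" and "$30k" are assumptions made for the example.

```python
import random

# Illustrative level names. "Ford" and "$30k" are placeholders invented
# for this sketch; they are not taken from the article's design.
LEVELS = {
    "Brand": ["General Motors", "Ferrari", "Ford"],
    "Price": ["$20k", "$30k", "$40k"],
}


def random_design(levels, n_versions=2, n_questions=5, n_alternatives=3, seed=0):
    """Return one row (a dict) per alternative, with levels drawn uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for version in range(1, n_versions + 1):
        for question in range(1, n_questions + 1):
            for alternative in range(1, n_alternatives + 1):
                row = {
                    "version": version,
                    "question": question,
                    "alternative": alternative,
                }
                # Pick one level per attribute, independently of everything else.
                for attribute, options in levels.items():
                    row[attribute] = rng.choice(options)
                rows.append(row)
    return rows


design = random_design(LEVELS)
print(len(design))  # 2 versions x 5 questions x 3 alternatives = 30 rows
```

Because nothing constrains the draws, a design built this way will usually have uneven level counts and repeated levels within questions, which is what the diagnostics are meant to reveal.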
Below I show the output of Anything > Advanced Analysis > Choice Modeling > Diagnostic > Experimental Design > Balances and Overlaps. Don't worry if you're confused about what each output means. I'm about to explain them.
- Select your Experimental Design
- Select Anything > Advanced Analysis > Choice Modeling > Diagnostic > Experimental Design > Balances and Overlaps of Design.
The output consists of the following:
- d-error D-error quantifies how good or bad a design is at extracting information from respondents in an experiment. A lower D-error indicates a better design. It is usual to compare D-errors for designs created using different algorithms, rather than to consider the number by itself.
- overlaps A vector where each element is the percentage of questions with some overlap (repetition of a level) for one attribute. The number of levels of each attribute is shown in brackets. For example, 70% of the questions have at least one repeated Brand level.
- The following four values are each stated as a score between 0 and 1, where 1 represents perfect balance and 0 the worst possible balance. Although they can be interpreted in isolation, it is generally more useful to compare scores for designs created by different algorithms.
- All values use the concept of balance. The balance of an attribute is defined as the sum across levels of the absolute differences between the counts and the average count. This value is then scaled to give a value of 1 if all absolute differences are zero (i.e. the count of every level is the same as its mean, which implies perfect balance). The worst design consists of repeating the same levels for all alternatives and has a balance of zero. By replacing the sum across levels with the sum across pairs of levels of different attributes we arrive at an analogous pairwise balance. These values are not calculated for Alternative specific and Partial profiles designs.
- mean.version.balance The average balance of all versions, then averaged across all attributes.
- mean.version.pairwise.balance The average pairwise balance of all versions, then averaged across all pairs of attributes.
- across.version.balance The balance for the whole design across all versions, then averaged across all attributes.
- across.version.pairwise.balance The pairwise balance for the whole design across all versions, then averaged across all pairs of attributes.
- singles The counts of the occurrence of each level of each attribute across all versions of the design. For example, the level General Motors occurs 12 times across the design whereas Ferrari occurs only 7 times. For Partial profile designs, the number of times that the levels are constant is also shown.
- pairs The counts of the co-occurrence of a pair of levels from different attributes within the same alternative. This is calculated across all versions of the design. For example, General Motors occurs 5 times in the same alternative as $20k but only 3 times with $40k. In this example, the uneven distribution of the singles and pairs counts is a consequence of the Random design. It is generally better for the levels to be evenly balanced.
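The definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code Displayr runs: the toy design, the short level names, and all function names are invented for the example, and the scaling constant is an assumption. The text only says that the score is 1 when all counts are equal and 0 when a single level is repeated throughout, so the sketch divides the summed absolute deviations by the value they take when one cell holds every observation, 2n(k - 1)/k for n observations and k cells.

```python
from collections import Counter
from itertools import combinations

# Hypothetical levels and toy design (2 versions x 2 questions x 3 alternatives).
LEVELS = {"Brand": ["GM", "Ford", "Ferrari"], "Price": ["$20k", "$30k", "$40k"]}


def row(version, question, brand, price):
    return {"version": version, "question": question, "Brand": brand, "Price": price}


DESIGN = [
    row(1, 1, "GM", "$20k"), row(1, 1, "GM", "$30k"), row(1, 1, "Ferrari", "$40k"),
    row(1, 2, "GM", "$20k"), row(1, 2, "Ford", "$30k"), row(1, 2, "Ferrari", "$40k"),
    row(2, 1, "Ford", "$20k"), row(2, 1, "Ford", "$30k"), row(2, 1, "Ferrari", "$40k"),
    row(2, 2, "Ford", "$20k"), row(2, 2, "GM", "$30k"), row(2, 2, "Ferrari", "$40k"),
]


def mean(values):
    return sum(values) / len(values)


def singles(rows, attribute):
    """Count of each level of one attribute, in the order given by LEVELS."""
    seen = Counter(r[attribute] for r in rows)
    return [seen[level] for level in LEVELS[attribute]]


def pairs(rows, attr_a, attr_b):
    """Count of each pair of levels that occur in the same alternative."""
    seen = Counter((r[attr_a], r[attr_b]) for r in rows)
    return [seen[(a, b)] for a in LEVELS[attr_a] for b in LEVELS[attr_b]]


def scaled_balance(counts):
    """1 when all counts are equal, 0 when one cell holds everything (assumed scaling)."""
    n, k = sum(counts), len(counts)
    deviation = sum(abs(c - n / k) for c in counts)
    worst = 2 * n * (k - 1) / k  # deviation when a single cell has count n
    return 1 - deviation / worst


def overlap(rows, attribute):
    """Proportion of questions in which some level of the attribute repeats."""
    questions = {}
    for r in rows:
        questions.setdefault((r["version"], r["question"]), []).append(r[attribute])
    return mean([len(set(shown)) < len(shown) for shown in questions.values()])


def by_version(rows):
    versions = sorted({r["version"] for r in rows})
    return [[r for r in rows if r["version"] == v] for v in versions]


def mean_version_balance(rows):
    # Balance within each version, averaged over versions, then over attributes.
    return mean([mean([scaled_balance(singles(part, a)) for part in by_version(rows)])
                 for a in LEVELS])


def across_version_balance(rows):
    # Balance of the pooled counts, averaged over attributes.
    return mean([scaled_balance(singles(rows, a)) for a in LEVELS])


def mean_version_pairwise_balance(rows):
    return mean([mean([scaled_balance(pairs(part, a, b)) for part in by_version(rows)])
                 for a, b in combinations(LEVELS, 2)])


def across_version_pairwise_balance(rows):
    return mean([scaled_balance(pairs(rows, a, b)) for a, b in combinations(LEVELS, 2)])
```

In this toy design each version over-uses one brand, but the two versions compensate for each other, so `mean_version_balance(DESIGN)` is 0.875 while `across_version_balance(DESIGN)` is 1.0. `overlap(DESIGN, "Brand")` is 0.5 because two of the four questions repeat a brand. The pairwise scores are lower because Ferrari only ever appears with $40k.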
Hoare, J. (2018, July 20). How Good is your Choice Model Experimental Design? [Blog post]. Accessed from https://www.displayr.com/how-good-is-your-choice-model-experimental-design/.
See also Choice Modeling - Experimental Design.
For details of the D-error calculation, see:
- Yap, J. (2018, August 20). What is D-Error? [Blog post]. Accessed from https://www.displayr.com/what-is-d-error/.
- Yap, J. (2018, August 21). How to Compute D-error for a Choice Experiment [Blog post]. Accessed from https://www.displayr.com/how-to-compute-d-error-for-a-choice-experiment/.
- Huber, J., & Zwerina, K. (1996). The importance of utility balance in efficient choice designs. Journal of Marketing Research, 33(3), 307-317. Accessed from https://people.duke.edu/~jch8/bio/publications.htm.