How to Do the Statistical Analysis of Choice-Based Conjoint Data

This article describes how to create a choice-based conjoint model using discrete choice experiment data in Displayr.

[Image: chocolate conjoint results (HB)]

Requirements

A document containing responses from a choice-based modeling data set. The data set needs variables designating the design version, task number, and respondent's choice, similar to the format below:

[Image: conjoint respondent data format example]

The corresponding choice model design (see the list of supported formats below), which explains the choices shown. The design gives the value of each attribute for each alternative shown in each task of each version.

Note that some software formats conjoint data differently from the formats described above, and you may need to reformat your data to make it compatible. There is a built-in automation that will do this for Alchemer data, which combines the design with responses; see How to Convert Alchemer Conjoint Data for Analysis in Displayr. This automation adds the appropriate questions containing the choices and the design version to the respondent data set.
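
As a rough illustration of the two inputs described above, the following R sketch builds a toy respondent data set and a toy design. All column names and values are hypothetical and chosen only to show the shape of the data. They are not names that Displayr requires, and the toy design is not a statistically valid experimental design.

```r
# Hypothetical toy data: names and values are illustrative assumptions only.

# Respondent data: one row per respondent, holding the design version
# and the alternative chosen in each task.
respondents <- data.frame(
  id      = 1:3,
  version = c(1, 2, 1),
  choice1 = c(2, 1, 3),   # alternative chosen in task 1
  choice2 = c(1, 3, 3)    # alternative chosen in task 2
)

# Design: one row per version x task x alternative, with attribute levels.
design <- expand.grid(alternative = 1:3, task = 1:2, version = 1:2)
design <- design[, c("version", "task", "alternative")]
design$brand <- rep(c("A", "B", "C"), length.out = nrow(design))
design$price <- rep(c(1.99, 2.49, 2.99, 2.49), length.out = nrow(design))

# Basic consistency checks between the two inputs.
n_alternatives <- max(design$alternative)
choices <- as.matrix(respondents[, c("choice1", "choice2")])
stopifnot(all(respondents$version %in% design$version))
stopifnot(all(choices >= 1 & choices <= n_alternatives))

head(design)
```

Checks like the last two can catch a version number or choice index that does not exist in the design before the model is run.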

Method

1. From the toolbar (if on a Page), or by hovering over an output in the Report tree, click Plus (+) > Advanced Analysis > Choice Modeling and select one of the following models:

Hierarchical Bayes - this model is more flexible in modeling the characteristics of each respondent and tends to produce a model that better fits the data
Latent class analysis - to be used when you want a segmentation of respondents
Multinomial logit - equivalent to single-class latent class analysis
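
For readers unfamiliar with the multinomial logit choice rule, the sketch below shows how it turns the utilities of the alternatives in one task into choice probabilities. The function name and the utility values are hypothetical, and this is a generic illustration rather than the code Displayr runs.

```r
# Multinomial logit: probability of choosing each alternative in one task.
# 'utilities' holds the total utility of each alternative (hypothetical values).
choice_probabilities <- function(utilities) {
  # Subtract the maximum before exponentiating for numerical stability.
  exp_u <- exp(utilities - max(utilities))
  exp_u / sum(exp_u)
}

utilities <- c(brandA = 0.8, brandB = 0.2, none = -0.5)
probs <- choice_probabilities(utilities)
round(probs, 3)
```

The alternative with the highest utility always receives the highest probability, and the probabilities across the alternatives in a task sum to 1.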

A new R output called choice.model will appear on your page.

2. From Properties, select one of the following options for the Experimental Design > Design source:

Data set - select variables from a data set to specify the design. Variables need to be supplied corresponding to the version, task, and attribute columns of a design. See here for an example.
Experimental design R output - select an R output in the project to supply the choice model design (created using Plus (+) > Advanced Analysis > Choice Modeling > Experimental Design, from the toolbar or by hovering in the Report tree).
Sawtooth CHO format - supply the design using a Sawtooth CHO file. You'll need to upload the CHO file to the project as a data set (first rename it to end in .csv instead of .cho) so that Displayr can recognize it. The new data set will contain a text variable, which should be supplied to the CHO file text variable input.
Important: The .csv file needs to be uploaded to the cloud drive and then added to the project from there, because Displayr won't allow a direct upload of a text file.
Sawtooth dual file format - supply the design through a Sawtooth design file (from the Sawtooth dual file format). You'll need to upload this file to the project as a data set. The version, task, and attributes from the design should be supplied to the corresponding inputs (similar to the Data set option).
JMP format - supply the design through a JMP design file. You'll need to upload this file to the project as a data set. The version, task, and attributes from the design should be supplied to the corresponding inputs (similar to the Data set option).
Experiment variable set - supply the design through an Experiment variable set in the project.

3. When Data set, Sawtooth dual file format, or JMP format is selected, choose the variables from your design data set containing the Version, Task, and Attributes.

[Image: conjoint experimental design data selections]

Note: if you are working with an Alchemer (formerly SurveyGizmo) data set, the ResponseID from the conjoint data set is used as Version and the Set Number as Task.

Alternative-specific designs are supported in Attributes (attributes that do not apply to an alternative are coded as a 0). Any alternatives for which all of the values are missing are identified as 'None of these' alternatives and will have coefficients estimated as an alternative-specific constant with the label None of these.
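
The rule for identifying 'None of these' alternatives can be sketched in R as follows. The design and its column names are hypothetical, and the check only mirrors the rule as stated above. It is not Displayr's own code.

```r
# Hypothetical design for one task with three alternatives. The third has
# no attribute values, so it would be treated as 'None of these'.
design <- data.frame(
  version = 1, task = 1, alternative = 1:3,
  brand = c(1, 2, NA),
  price = c(2, 1, NA)
)
attribute_columns <- c("brand", "price")

# An alternative is flagged only when every attribute value is missing.
is_none <- rowSums(!is.na(design[, attribute_columns])) == 0
design$alternative[is_none]
```

Running a check like this on a design file before uploading it can confirm that only the intended alternatives have all attributes missing.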

4. For most of these options, you'll also need to provide the attribute levels through a spreadsheet-style data editor. To do so, select Enter attribute levels and supply the levels in columns, with the attribute name in the first row and the attribute levels in the subsequent rows.

Note that this is optional for the JMP format if the design file already contains attribute-level names.

5. Code some categorical attributes as numeric - tick this option to treat some categorical attributes as numeric. When it is checked, a text box appears below in which the attribute and its numeric coding are specified as a comma-separated list, e.g., Weight, 1, 2, 3, 4. Once one text box is filled, another appears so that a further attribute can be specified.
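
To show what this choice changes, the sketch below contrasts categorical (dummy) coding with numeric coding for a hypothetical Weight attribute. The level names are invented for illustration. The interpretation that numeric coding leads to a single linear coefficient is the usual one for numeric attributes, and is assumed here rather than taken from Displayr's implementation.

```r
# Hypothetical categorical attribute with four ordered levels.
weight_levels <- c("50g", "100g", "150g", "200g")
numeric_codes <- c(1, 2, 3, 4)   # as in the example "Weight, 1, 2, 3, 4"

# Levels observed in a few design rows.
weight <- factor(c("100g", "50g", "200g", "150g", "100g"),
                 levels = weight_levels)

# Categorical coding: one dummy column per non-reference level.
dummy_coded <- model.matrix(~ weight)[, -1]

# Numeric coding: a single column holding the numeric code of each level.
weight_numeric <- numeric_codes[as.integer(weight)]

ncol(dummy_coded)   # number of columns under categorical coding
weight_numeric
```

Numeric coding uses fewer parameters, but it assumes that utility changes linearly with the codes supplied.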

6. Next, select the Respondent Data. Whether respondent data needs to be provided explicitly depends on how you supplied the design in the previous steps. If an Experiment variable set or CHO file was provided, there is no need to provide the data separately, as both already contain the choices made by the respondents.

For the other methods of supplying the design, the respondent Choices and the Tasks or Version corresponding to these choices need to be provided from variables in the project. Each variable corresponds to a question in the choice experiment, and the variables need to be provided in the same order as the questions.

Note the following:

If you have a 'None of these' option, you will need to code it with the index at which the 'None of these' option appears in the design. For example, if 'None of these' is the fifth option shown, then 'None of these' should get a value of 5 for the relevant variables in your data set. See Variable Sets for how to confirm and modify your data.
If you have a dual-response 'none' design, the following requirements apply:
- The variables with the dual-response 'none' data must be a Binary - Multi variable set.
- In Properties > Data > Attributes > Categories, Count This Value must be selected for the category which indicates that the respondent would purchase their selected choice.
- The response categories that indicate the respondent would purchase their selected choice and the one that indicates the respondent would not purchase their selected choice should also be set to Include in analyses in the Missing Data column.
- You will additionally need to select the corresponding 'Yes/No' questions in the Dual-response 'none' choice field of the choice model output.
If your conjoint data comes from Alchemer, see How to Convert Alchemer Conjoint Data for Analysis in Displayr. Once the data is converted, the respondent data set will contain the appropriate questions containing the choices and the design version.
Instead of using respondent data, there is also an option to use simulated data by changing the Data source setting to Simulated choices from priors. See this blog post for more information on using simulated data.
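
If the 'None of these' recoding mentioned in the notes above is done in R before the data is imported, it might look like the following sketch. The raw code 99, the variable names, and the position of the option are all hypothetical.

```r
# Hypothetical raw choices for two tasks, where "None of these" was exported
# as 99 but appears as the fifth alternative in the design.
raw_choices <- data.frame(task1 = c(1, 99, 3), task2 = c(99, 2, 4))
none_raw_code     <- 99
none_design_index <- 5

# Replace the raw code with the index used in the design.
recoded <- raw_choices
recoded[recoded == none_raw_code] <- none_design_index
recoded
```

After recoding, every value in the choice variables should be a valid alternative index from the design.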

7. If Sawtooth CHO Format was selected as the Design source, select the Respondent IDs, which is a variable containing respondent IDs corresponding to those in the CHO file.

8. If Experimental design R output is selected as the Design source and Simulated choices from priors is selected as the Data source, choose the Prior source: either use the priors from the choice model design output or enter the priors manually. If the design output contains no priors, prior means and standard deviations of 0 are assumed.

9. If you are using simulated data, enter the Simulated sample size - the number of simulated respondents to generate.
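
Conceptually, simulating choices from priors means drawing each simulated respondent's coefficients from the prior distribution and then sampling a choice from the resulting logit probabilities. The sketch below illustrates that idea for a single task. The design matrix, the prior values, and the use of independent normal draws are assumptions made for illustration, so it should not be read as the procedure Displayr uses.

```r
set.seed(123)

# Hypothetical setup: one task with 3 alternatives and 2 coded attributes.
X <- matrix(c(1, 0,
              0, 1,
              0, 0), nrow = 3, byrow = TRUE)
prior_mean    <- c(0.5, -0.5)  # assumed prior means of the two coefficients
prior_sd      <- c(1, 1)       # assumed prior standard deviations
n_respondents <- 200           # plays the role of the simulated sample size

simulate_choice <- function() {
  # Draw this respondent's coefficients from the prior.
  beta <- rnorm(length(prior_mean), prior_mean, prior_sd)
  utilities <- as.vector(X %*% beta)
  probs <- exp(utilities - max(utilities))
  probs <- probs / sum(probs)
  # Sample one alternative according to the logit probabilities.
  sample(seq_len(nrow(X)), size = 1, prob = probs)
}

simulated <- replicate(n_respondents, simulate_choice())
table(simulated)
```

With prior means and standard deviations of 0, every alternative would be equally likely, which corresponds to the case described above where the design output contains no priors.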

10. Dual-response 'none' choice - (Optional) select the variables indicating dual-response 'None of these' choices. There should be the same number of variables as in Choices and Tasks. These variables should be combined as a Binary - Multi variable set, with the category that indicates the respondent would purchase their selected choice set to Count this value in the Value Attributes.

11. Select one of the following options from the Missing data input, which determines how Displayr will deal with missing data, if any:

Use partial data is the default setting, which ignores questions with missing data but keeps other questions for analysis
Exclude cases with missing data removes respondents from the analysis if any of the selected questions contain missing data
Error if missing data shows an error message if any respondents have missing data on any of the selected questions
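
The difference between the three options can be illustrated with a small R sketch. The function, its option labels, and the data are hypothetical, and the code only mimics the behavior described above.

```r
# Hypothetical choices: rows are respondents, columns are questions.
choices <- matrix(c(1,  2, 3,
                    2, NA, 1,
                    3,  1, 2), nrow = 3, byrow = TRUE)

handle_missing <- function(choices, method) {
  has_missing <- rowSums(is.na(choices)) > 0
  switch(method,
    # Keep every respondent; questions with NA are skipped later.
    "use partial data" = choices,
    # Drop respondents with any missing question.
    "exclude cases" = choices[!has_missing, , drop = FALSE],
    # Refuse to proceed if anything is missing.
    "error" = {
      if (any(has_missing)) stop("Missing data found")
      choices
    })
}

sum(!is.na(handle_missing(choices, "use partial data")))  # answered questions kept
nrow(handle_missing(choices, "exclude cases"))            # respondents kept
```

In this toy example, using partial data keeps the second respondent's two answered questions, while excluding cases drops that respondent entirely.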

12. In the Model section, if Latent Class Analysis or Hierarchical Bayes is selected as the model Type, enter the Number of classes you want the model to create.

13. OPTIONAL: Enter a value for Questions left out for cross-validation. If there are too many classes, the computation time will be long, and the model may overfit the data. To determine the amount of overfitting in the data, set Questions left out for cross-validation to be greater than the default of 0. This will allow you to compare the output's in-sample and out-of-sample prediction accuracies.
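
The idea behind leaving questions out can be sketched as follows: some questions are randomly held out, the model is fitted on the rest, and accuracy is compared between the two sets. In this sketch the per-respondent selection scheme is an assumption, and the correct/incorrect indicators are random placeholders rather than real model predictions.

```r
set.seed(123)
n_respondents <- 4
n_questions   <- 8
left_out      <- 2   # plays the role of "Questions left out for cross-validation"

# For each respondent, randomly pick which questions are held out.
holdout <- t(replicate(n_respondents, {
  flags <- rep(FALSE, n_questions)
  flags[sample(n_questions, left_out)] <- TRUE
  flags
}))

# Placeholder indicators of whether each question was predicted correctly.
correct <- matrix(runif(n_respondents * n_questions) < 0.7,
                  nrow = n_respondents)

in_sample_accuracy     <- mean(correct[!holdout])
out_of_sample_accuracy <- mean(correct[holdout])
c(in_sample = in_sample_accuracy, out_of_sample = out_of_sample_accuracy)
```

With a real model, an in-sample accuracy that is much higher than the out-of-sample accuracy suggests overfitting.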

14. Tick Alternative-specific constants to include alternative-specific constants in the model.

15. Indicate the Seed, which is the random seed used to determine the random initial parameters of the model and also used to determine the random questions to leave out for cross-validation. The default is 123.

16. Indicate the number of Iterations used in the Hierarchical Bayes analysis.

17. All other options are more advanced and are detailed below. These can be left at their default values. For more information, see Checking Convergence When Using Hierarchical Bayes for Conjoint Analysis and How to Improve Choice Model Accuracy Using Covariates.

Respondent-specific covariates - variables containing respondent-specific covariates to be included in the model.
Chains - the number of chains used in the Hierarchical Bayes analysis.
Maximum tree depth - the maximum tree depth parameter. Only increase this if warnings about "tree depth" are shown.
Adapt delta - the adapt delta parameter. Only increase this if warnings about "low adapt delta" are shown.

18. OPTIONAL: Apply a filter to the model by selecting a filter variable from the Filter(s) input at the top of the Properties.

19. OPTIONAL: Apply a weight to the model by selecting a weight variable from the Weight input at the top of the Properties.

20. Press the Calculate button to run the model.

The following options are also available once the model has run:

Diagnostics

Parameter Statistics Table

Utilities Plot

Save Variable(s)

Individual-Level Coefficients

Proportion of Correct Predictions

RLH (Root Likelihood)

Utilities
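
For reference, the two accuracy measures in the list above can be computed from predicted choice probabilities as in the sketch below. RLH is the geometric mean of the probabilities the model assigned to the alternatives actually chosen. The probabilities and choices here are hypothetical, and Displayr's saved variables may differ in detail, for example in which questions are included.

```r
# Hypothetical predicted probabilities for one respondent:
# rows are tasks, columns are alternatives.
probs <- matrix(c(0.6, 0.3, 0.1,
                  0.2, 0.5, 0.3,
                  0.1, 0.2, 0.7,
                  0.5, 0.3, 0.2), nrow = 4, byrow = TRUE)
chosen <- c(1, 2, 3, 3)   # alternative actually chosen in each task

# Probability the model assigned to each chosen alternative.
p_chosen <- probs[cbind(seq_along(chosen), chosen)]

# RLH: geometric mean of those probabilities.
rlh <- exp(mean(log(p_chosen)))

# Proportion of tasks where the most likely alternative was the chosen one.
predicted <- max.col(probs, ties.method = "first")
proportion_correct <- mean(predicted == chosen)

c(rlh = rlh, proportion_correct = proportion_correct)
```

With three alternatives per task, a model that predicts no better than chance would have an RLH of about 1/3, so values well above that indicate a better fit.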

Technical Details

An R package called flipChoice is used to run the Hierarchical Bayes analysis. flipChoice fits the underlying Bayesian statistical model using rstan, which is an R interface for Stan.

Adaptive Choice Models

Please note that the choice modeling analysis tools do not support adaptive choice-based conjoint experiments. Such experiments develop the design of future choice tasks based on previous respondent answers and can involve multiple styles of questioning. Thus, while it may be possible to manually reconstruct the design in a way that is compatible with the choice modeling tools, it is unclear whether such designs are consistent with the assumptions of the analysis methods used by these tools.

Additional Properties

When using this feature, you can obtain additional information that is stored by the R code that produces the output.

To do so, click Calculation > Custom Code in the toolbar and click on the page where you wish to place the calculation.
In the code panel, paste: item = YourReferenceName
Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to General > General > Name in Properties.
Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.
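
As an illustration of how to explore such an object, the sketch below uses a stand-in list in place of a real output. The element names are hypothetical and do not reflect the actual contents of a choice model output. In Displayr, the first line would instead be item = YourReferenceName.

```r
# Stand-in for the real output; the element names here are hypothetical
# and are NOT the actual structure of a choice model output.
item <- list(
  example_numbers = c(0.1, 0.2, 0.3),
  example_table   = data.frame(x = 1:2, y = c("a", "b")),
  example_text    = "some text"
)

names(item)                # names of the stored elements
str(item, max.level = 1)   # compact overview of each element
item$example_numbers       # extract one element by name
```

Starting with names() or str() with max.level = 1 keeps the overview short for large outputs before drilling into a single element.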