This article describes how to do a driver analysis in Displayr and create the outputs. Driver analysis, also known as key driver analysis, importance analysis, or relative importance analysis, uses your data to work out the relative importance of each predictor variable in predicting the outcome variable. Displayr offers several driver analysis methods. For more detail about which method to use and when, why not take a look at our driver analysis webinar and eBook.
You will need a data set containing the variables that you want to use as inputs to the driver analysis. Driver analysis is often performed using data for multiple brands at the same time. Traditionally, this is handled by creating a new data file that stacks the data for each brand on top of the others (see What is Data Stacking?). In Displayr, however, the data can be stacked automatically.
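To show what stacking does to the data, here is a minimal Python sketch that turns a wide table (one row per respondent, one column per brand and attribute) into a stacked table (one row per respondent per brand). The `attribute_brand` column naming, the function name, and the sample ratings are all hypothetical; in Displayr you get this result by checking the Stack data option rather than by writing code.

```python
def stack_by_brand(rows, brands, variables):
    """Stack wide respondent rows into one row per respondent per brand.

    Hypothetical layout: each wide row stores the value for a variable and
    brand under the key f"{variable}_{brand}".
    """
    stacked = []
    for respondent_id, row in enumerate(rows):
        for brand in brands:
            record = {"respondent": respondent_id, "brand": brand}
            for variable in variables:
                record[variable] = row[f"{variable}_{brand}"]
            stacked.append(record)
    return stacked


# Hypothetical wide data: two respondents rating two brands.
wide = [
    {"satisfaction_A": 8, "value_A": 7, "satisfaction_B": 5, "value_B": 4},
    {"satisfaction_A": 6, "value_A": 6, "satisfaction_B": 9, "value_B": 8},
]
stacked_rows = stack_by_brand(wide, brands=["A", "B"], variables=["satisfaction", "value"])
for record in stacked_rows:
    print(record)
```

Each stacked row is one respondent-brand observation, so a driver analysis on the stacked table produces a single set of importance scores across all brands.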
- Load a data set that contains the variables that you will use as inputs for the driver analysis.
- From the ribbon, select Anything > Advanced Analysis > Regression > Driver Analysis.
- Select the Outcome and Predictor(s) variables.
- Select the Algorithm you want to use from the drop-down.
- Select the Regression type from the drop-down (see: How to Select the Regression Type for Driver Analysis).
- All the widely used methods for driver analysis are available in Displayr. You can switch between them by changing the Output setting in the object inspector under Inputs > Linear Regression:
Correlation: This method is appropriate when you are unconcerned about correlations between predictor variables, because each predictor's correlation with the outcome is computed on its own.
Jaccard Coefficient: Note that Jaccard Coefficient is only available when Regression type is set to Linear. This is similar to correlation, except it is only appropriate when both the predictor and outcome variables are binary.
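For two binary variables, the Jaccard coefficient is the number of cases where both equal 1, divided by the number of cases where at least one equals 1. A minimal sketch with hypothetical 0/1 data; returning 0.0 when neither variable is ever 1 is a convention chosen for this sketch.

```python
def jaccard(x, y):
    """Jaccard coefficient of two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Sketch convention: no 1s in either variable gives a coefficient of 0.
    return both / either if either else 0.0


# Hypothetical data: did the respondent recommend the brand (outcome),
# and do they describe it as "reliable" or "cheap" (predictors)?
recommend = [1, 1, 0, 1, 0, 0]
reliable = [1, 1, 0, 1, 1, 0]
cheap = [0, 1, 1, 0, 1, 0]
print(jaccard(reliable, recommend))  # 3 / 4 = 0.75
print(jaccard(cheap, recommend))     # 1 / 5 = 0.2
```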
Generalized Linear Models (GLMs): These include Linear, Binary Logit, Ordered Logit, etc. They address correlations between the predictor variables, and each is designed for a different distribution of the outcome variable (e.g., Linear for a numeric outcome, Binary Logit for a two-category outcome, Ordered Logit for an ordinal outcome).
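To illustrate how regression differs from correlation, here is a minimal NumPy sketch of the Linear case only (Binary Logit and Ordered Logit need different fitting procedures, not shown). The data are simulated and hypothetical: the outcome depends only on `price`, while `service` is strongly correlated with `price`.

```python
import numpy as np

# Simulated (hypothetical) data: the outcome depends only on price,
# but service is strongly correlated with price.
rng = np.random.default_rng(seed=1)
n = 500
price = rng.normal(size=n)
service = 0.9 * price + 0.3 * rng.normal(size=n)
overall = 2.0 * price + rng.normal(scale=0.5, size=n)

# Linear regression: an intercept column plus both predictors, fitted by least squares.
X = np.column_stack([np.ones(n), price, service])
coef, *_ = np.linalg.lstsq(X, overall, rcond=None)

r_service = np.corrcoef(service, overall)[0, 1]
print(f"correlation of service with overall: {r_service:.2f}")
print(f"regression coefficients: price={coef[1]:.2f}, service={coef[2]:.2f}")
```

In this simulation, correlation gives `service` a high score, while the regression coefficient for `service` is close to zero because the regression holds `price` constant.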
Shapley Regression: Note that Shapley Regression is only available when Regression type is set to Linear. This is a regularized regression, designed for situations where linear regression results are unreliable due to high correlations between predictors.
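Shapley Regression is commonly computed by splitting the model's R-squared among the predictors: each predictor's score is its average gain in R-squared when it is added to every possible subset of the other predictors. The NumPy sketch below implements that textbook decomposition under simulated, hypothetical data; it is not Displayr's implementation. It fits one regression per subset, so the cost grows exponentially with the number of predictors.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R-squared of an OLS fit of y on the given columns of X plus an intercept."""
    if not cols:
        return 0.0
    design = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


def shapley_importance(X, y):
    """Shapley decomposition of R-squared across the columns of X."""
    p = X.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for adding predictor j to a subset of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                scores[j] += weight * gain
    return scores


# Simulated (hypothetical) data with two highly correlated predictors.
rng = np.random.default_rng(seed=2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = x1 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

scores = shapley_importance(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print("Shapley scores:", np.round(scores, 3))
print("sum of scores:", round(scores.sum(), 3), "full-model R-squared:", round(full_r2, 3))
```

The scores always add up to the full model's R-squared, which is why they can be read as shares of explained variance.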
Johnson's Relative Weights: Note that this appears when Output is set to Relative Importance Analysis. As with Shapley Regression, this is a regularized regression, but unlike Shapley Regression, it is applicable to all Regression type settings (e.g., Ordered Logit, Binary Logit).
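The sketch below implements Johnson's relative weights for a linear regression using NumPy, with simulated, hypothetical data; it does not cover the other Regression types that Displayr supports. The predictors are replaced by their closest uncorrelated counterparts (via the symmetric square root of their correlation matrix), the outcome is regressed on those, and the explained variance is mapped back to the original predictors.

```python
import numpy as np


def relative_weights(X, y):
    """Johnson's relative weights for a linear regression of y on the columns of X."""
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_xx = corr[:-1, :-1]  # correlations among the predictors
    r_xy = corr[:-1, -1]   # correlations of each predictor with the outcome
    eigvals, eigvecs = np.linalg.eigh(r_xx)
    # Symmetric square root of the predictor correlation matrix.
    lam = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T
    # Standardized coefficients of y on the uncorrelated counterparts.
    beta = np.linalg.solve(lam, r_xy)
    # Weight j sums lam[j, k]^2 * beta[k]^2 over the counterparts k.
    return (lam ** 2) @ (beta ** 2)


# Simulated (hypothetical) data with two highly correlated predictors.
rng = np.random.default_rng(seed=3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = x1 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
weights = relative_weights(X, y)

# Check: the weights add up to the R-squared of the full linear regression.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
r2 = 1 - np.sum((y - design @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
print("relative weights:", np.round(weights, 3))
print("sum of weights:", round(weights.sum(), 3), "model R-squared:", round(r2, 3))
```

Like the Shapley scores, the relative weights partition R-squared, but they need only one eigendecomposition instead of a regression for every subset of predictors.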
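- If you want Displayr to stack the data automatically (e.g., when analyzing multiple brands at once), check the Stack data option.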
- By default, all the driver analysis methods exclude cases with missing data from the analysis (this occurs after any stacking has been performed). However, depending on the method, other Missing data options can be relevant:
- If using Correlation, Jaccard Coefficient, or Linear Regression, you can select Use partial data (pairwise correlations). Each correlation is then computed from all the cases that have data for that pair of variables, so a case that is missing some predictors still contributes the data it does have.
- If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, you can use Multiple imputation. This is generally the best method for dealing with missing data, except in situations where Dummy variable adjustment is appropriate.
- If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, you can use Dummy variable adjustment. This method is appropriate when the data is missing because it cannot exist. For example, if the predictors are ratings of satisfaction with a bank's call centers, branches, and website, and data is missing for people who have not used one or more of these, then this setting is appropriate. By contrast, if the data is missing because the person didn't feel like providing an answer, Multiple imputation is preferable.
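Dummy variable adjustment can be sketched as two steps per predictor: replace the missing values with a constant, and add a 0/1 indicator that flags which cases were missing. Both columns then enter the regression, so the indicator absorbs the difference between people with and without data. Filling with the mean of the observed values is an assumption made for this sketch; the function name and ratings are hypothetical.

```python
def dummy_variable_adjust(values, fill=None):
    """Dummy variable adjustment for one predictor whose missing values are None.

    Returns (filled_values, missing_indicator). Missing entries are replaced
    with `fill`, which defaults (as an assumption for this sketch) to the
    mean of the observed values; the indicator is 1 where a value was missing.
    """
    observed = [v for v in values if v is not None]
    if fill is None:
        fill = sum(observed) / len(observed) if observed else 0.0
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Hypothetical call-center satisfaction ratings; None = never used the call center.
call_center = [8, None, 6, None, 10]
filled, missing = dummy_variable_adjust(call_center)
print(filled)   # [8, 8.0, 6, 8.0, 10]
print(missing)  # [0, 1, 0, 1, 0]
```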
OPTIONAL: Apply a filter if you want to run the driver analysis for a specific subgroup.
OPTIONAL: Select a weight if you want the analysis to be weighted.
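A weight applies to cases (respondents) rather than to individual variables: each case counts in proportion to its weight. As a minimal sketch, here is a weighted Pearson correlation, the weighted counterpart of the Correlation method; the ratings and weights are hypothetical.

```python
from math import sqrt


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y using case weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


price = [6, 8, 4, 9, 5]
overall = [7, 8, 5, 9, 6]
equal_weights = [1, 1, 1, 1, 1]
survey_weights = [0.5, 1.5, 1.0, 2.0, 0.8]  # hypothetical respondent weights
print(round(weighted_correlation(price, overall, equal_weights), 4))
print(round(weighted_correlation(price, overall, survey_weights), 4))
```

With equal weights, the result matches the ordinary unweighted correlation.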
All driver analysis methods have an option called Crosstab interaction. When you select a categorical variable here, the result is a crosstab showing the importance scores for each unique value of that variable, with bold indicating significant differences and color-coding showing relativities.