There are a number of different approaches to relative importance analysis; this article briefly describes one alternative method, Partial Least Squares.
Partial Least Squares (PLS) is a popular method for relative importance analysis in fields where the data typically includes more predictors than observations. It is a dimension reduction technique with some similarities to principal component analysis: the predictor variables are mapped to a smaller set of variables, and within that smaller space we perform a regression against the outcome variable. In contrast to principal component analysis, where the dimension reduction ignores the outcome variable, the PLS procedure aims to choose new mapped variables that maximally explain the outcome variable.
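To make the idea concrete before turning to the Displayr walkthrough, here is a minimal standalone sketch in R using the gasoline example data shipped with the pls package (60 observations, 401 spectral predictors); the choice of data set and number of components is an illustrative assumption, not part of the example that follows.
library(pls)
# gasoline: 60 observations of octane rating plus 401 NIR spectral bands,
# a classic case of many more predictors than observations
data(gasoline)
toy.model = plsr(octane ~ NIR, ncomp = 5, data = gasoline, validation = "CV")
summary(toy.model)  # reports cross-validated error for 1 to 5 components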
Requirements
- Open a document in Displayr.
- Load some data into the document. In this example, we are going to load the data using Anything > Data > Data Set > Add > URL and paste in this link: https://wiki.q-researchsoftware.com/images/6/69/Stacked_Cola_Brand_Associations.sav.
Method
- Drag the variable Q6 (Brand Preference) onto the Page from the Data Sets tree on the left to create a table.
- This produces a table showing the breakdown of the respondents by category. This includes a Don't Know category that doesn't fit in the ordered scale from Love to Hate. To remove Don't Know, click on Q6 in the Data Sets tree, then on the right-hand side of the screen, in the object inspector under Properties > DATA VALUES, click on Missing Values. Change Missing Values for the Don't Know category to Exclude from analyses, which produces the table below:
- Restructure the variable to be Numeric by selecting Q6 in the Data Sets tree, and from the object inspector, go to Properties > GENERAL > Structure > Numeric, which produces a table that looks like this:
- To create the PLS model, select Calculation > Custom Code, click on the page to place the custom calculation, and paste the following snippet into the object inspector under Properties > R CODE.
dat = data.frame(Q6_, Q5_0_, Q5_1_, Q5_2_, Q5_3_, Q5_4_, Q5_5_, Q5_6_, Q5_7_, Q5_8_,
Q5_9_, Q5_10_, Q5_11_, Q5_12_, Q5_13_, Q5_14_, Q5_15_, Q5_16_, Q5_17_,
Q5_18_, Q5_19_, Q5_20_, Q5_21_, Q5_22_, Q5_23_, Q5_24_, Q5_25_, Q5_26_,
Q5_27_, Q5_28_, Q5_29_, Q5_30_, Q5_31_, Q5_32_, Q5_33_)
library(pls)                  # provides plsr() for estimating the PLS model
library(flipFormat)           # labelling helpers (Labels, TidyLabels) used below
library(flipTransformations)  # data preparation helpers (AsNumeric) used below
# plsr() requires numeric inputs, so convert the categorical variables
dat = AsNumeric(ProcessQVariables(dat), binary = FALSE, remove.first = FALSE)
# Fit the PLS model, using cross-validation to assess its predictive ability
pls.model = plsr(Q6_ ~ ., data = dat, validation = "CV")
The first line selects Q6_ as the outcome variable (strength of preference for a brand) and then adds 34 predictor variables, each indicating whether the respondent perceives the brand to have a particular characteristic. In your project, these variables can be dragged across from the Data Sets tree on the left into the R CODE window rather than typing them in one by one.
Next, the three libraries containing useful functions are loaded. The package pls contains the function that estimates the PLS model, and Displayr's publicly available packages flipFormat and flipTransformations are included to help transform and tidy the data. Since the R pls package requires numeric inputs, I converted the variables from categorical. In the final line above, the plsr function does the work and creates pls.model.
- Adding the following lines recreates the model with the optimal number of dimensions:
# Find the number of dimensions with lowest cross-validation error
cv = RMSEP(pls.model)
# Subtract 1 because the first entry of cv$val is the intercept-only model
best.dims = which.min(cv$val[estimate = "adjCV", , ]) - 1
# Rerun the model with the chosen number of components
pls.model = plsr(Q6_ ~ ., data = dat, ncomp = best.dims)
Here Q6_ is the outcome variable; in your own analysis, replace it with your outcome variable.
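As an optional sanity check (not part of the original steps), the pls package can plot the cross-validated error against the number of components, which lets you confirm the choice of best.dims visually:
# Optional: plot cross-validated RMSEP by number of components
plot(cv, legendpos = "topright")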
- Finally, extract the useful information and format the output by adding the following lines of code:
coefficients = coef(pls.model)
The regression coefficients are normalized so that their absolute values sum to 100. The labels are then added and the result is sorted.
sum.coef = sum(abs(coefficients))
coefficients = coefficients * 100 / sum.coef
names(coefficients) = TidyLabels(Labels(dat)[-1])
coefficients = sort(coefficients, decreasing = TRUE)
The results below show Reliable and Fun are positive predictors of preference, Unconventional and Sleepy are negative predictors, and Tough has little relevance.
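If you are running this code outside Displayr, a quick way to see the same pattern is a bar chart of the sorted coefficients. The following is a minimal sketch using base R graphics; the margin and text-size settings are illustrative assumptions.
# Optional: bar chart of the normalized importance scores
par(mar = c(4, 10, 1, 1))  # widen the left margin to fit the labels
barplot(rev(coefficients), horiz = TRUE, las = 1, cex.names = 0.6,
        xlab = "Normalized importance")
# rev() puts the largest coefficient at the top of the chart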