## Introduction

There are several approaches to relative importance analysis. This article briefly describes one alternative method: Partial Least Squares.

Partial Least Squares (PLS) is a popular method for *relative importance analysis* in fields where the data typically includes more predictors than observations. It is a dimension reduction technique with some similarity to *principal component analysis*. The predictor variables are mapped to a smaller set of variables and within that smaller space, we perform a regression against the outcome variable. In contrast to principal component analysis where the dimension reduction ignores the outcome variable, the PLS procedure aims to choose new mapped variables that maximally explain the outcome variable.
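
To make the contrast with principal component analysis concrete, here is a minimal base-R sketch on simulated data. The variable names and the data-generating process are assumptions made for illustration and are not part of the cola data set used later. The sketch computes only the first PLS direction, which for a single outcome is proportional to the covariance between each standardized predictor and the outcome, and compares it with the first principal component.

```r
# Simulated data (an assumption for illustration): four predictors share a
# latent factor that is unrelated to the outcome; a fifth drives the outcome.
set.seed(1)
n <- 500
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.3),
           x2 = z + rnorm(n, sd = 0.3),
           x3 = z + rnorm(n, sd = 0.3),
           x4 = z + rnorm(n, sd = 0.3),
           x5 = rnorm(n))
y <- X[, "x5"] + rnorm(n, sd = 0.5)

Xs <- scale(X)        # center and standardize the predictors
yc <- y - mean(y)     # center the outcome

# First principal component: chosen from the predictors alone, ignoring y
pc1 <- prcomp(Xs)$x[, 1]

# First PLS direction: weights proportional to cov(predictor, outcome)
w <- drop(crossprod(Xs, yc))
w <- w / sqrt(sum(w^2))
t1 <- drop(Xs %*% w)

# Squared correlation of each one-dimensional summary with the outcome
r2.pca <- cor(pc1, y)^2
r2.pls <- cor(t1, y)^2
print(round(c(PCA = r2.pca, PLS = r2.pls), 3))
```

In this setup the first principal component follows the four correlated predictors because they carry the most variance, while the PLS direction puts most of its weight on the predictor that actually relates to the outcome.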

## Requirements

- Open a document in Displayr.
- Load some data into the document. In this example, we are going to load the data using **Anything > Data > Data Set > Add > URL** and paste in this link: https://wiki.q-researchsoftware.com/images/6/69/Stacked_Cola_Brand_Associations.sav.

## Method

- Drag the variable Q6 (Brand Preference) onto the **Page** from the **Data Sets** tree on the left to create a table.
- This produces a table showing the breakdown of the respondents by category. This includes a *Don't Know* category that doesn't fit in the ordered scale from Love to Hate. To remove *Don't Know*, click on Q6 in the **Data Sets** tree, then on the right-hand side of the screen in the **object inspector** under **Properties > DATA VALUES**, click on **Missing Values**. Change **Missing Values** for the *Don't Know* category to **Exclude from analyses**, which produces the table below:
- Restructure the variable to be **Numeric** by selecting Q6 in the **Data Sets** tree, and from the **object inspector**, go to **Properties > GENERAL > Structure > Numeric**, which produces a table that looks like this:
- To create the PLS model, select **Insert > Analysis > R Output** and paste the following snippet into the **R CODE** in the **object inspector** under **Properties > R CODE**.

      dat = data.frame(Q6_, Q5_0_, Q5_1_, Q5_2_, Q5_3_, Q5_4_, Q5_5_, Q5_6_, Q5_7_, Q5_8_,
                       Q5_9_, Q5_10_, Q5_11_, Q5_12_, Q5_13_, Q5_14_, Q5_15_, Q5_16_, Q5_17_,
                       Q5_18_, Q5_19_, Q5_20_, Q5_21_, Q5_22_, Q5_23_, Q5_24_, Q5_25_, Q5_26_,
                       Q5_27_, Q5_29_, Q5_28_, Q5_30_, Q5_31_, Q5_32_, Q5_33_)
      library(pls)
      library(flipFormat)
      library(flipTransformations)
      dat = AsNumeric(ProcessQVariables(dat), binary = FALSE, remove.first = FALSE)
      pls.model = plsr(Q6_ ~ ., data = dat, validation = "CV")

The first line selects Q6_ as the outcome variable (strength of preference for a brand) and then adds 34 predictor variables, each indicating whether the respondent perceives the brand to have a particular characteristic. In your project, these variables can be dragged across from the **Data Sets** tree on the left into the **R CODE** window rather than typing them in one by one.

Next, the three libraries containing useful functions are loaded. The *pls* package contains the function to estimate the PLS model, and Displayr's publicly available packages, *flipFormat* and *flipTransformations*, are included to help transform and tidy the data. Since the R *pls* package requires inputs to be numerical, I converted the variables from categorical.

In the final line above, the *plsr* function does the work and creates *pls.model*.

- Adding the following lines recreates the model with the optimal number of dimensions:

      # Find the number of dimensions with lowest cross validation error
      cv = RMSEP(pls.model)
      best.dims = which.min(cv$val[estimate = "adjCV", , ]) - 1

      # Rerun the model
      pls.model = plsr(pref ~ ., data = dat, ncomp = best.dims)

You will need to replace *pref* on the last line of code with your outcome variable. In this example, I used *Q6_* as the outcome variable.

- Finally, extract the useful information and format the output by adding the following lines of code:

      coefficients = coef(pls.model)

The regression coefficients are normalized so their absolute sum is 100. The labels are added and the result is sorted.

      sum.coef = sum(sapply(coefficients, abs))
      coefficients = coefficients * 100 / sum.coef
      names(coefficients) = TidyLabels(Labels(dat)[-1])
      coefficients = sort(coefficients, decreasing = TRUE)

The results below show Reliable and Fun are positive predictors of preference, Unconventional and Sleepy are negative predictors, and Tough has little relevance.
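
The same workflow can also be tried outside Displayr. The following is a sketch on simulated data that uses only the *pls* package. The data, the variable names (`x1` to `x6`, `y`, `cv.model`, `importance`), and the use of `drop()` in place of the *flipFormat* labeling helpers are assumptions made for illustration and are not the code used to produce the results in this article.

```r
library(pls)

# Simulated data (an assumption for illustration): x1 raises the outcome,
# x2 lowers it, and x3 to x6 are unrelated noise.
set.seed(123)
n <- 300
X <- matrix(rnorm(n * 6), nrow = n, dimnames = list(NULL, paste0("x", 1:6)))
dat <- data.frame(y = 2 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n), X)

# Fit with cross-validation, then pick the number of components with the
# lowest adjusted cross-validation error. The first entry is the
# intercept-only model, hence the "- 1".
cv.model <- plsr(y ~ ., data = dat, validation = "CV")
cv <- RMSEP(cv.model)
best.dims <- unname(which.min(cv$val["adjCV", 1, ])) - 1

# Refit with the selected number of components
pls.model <- plsr(y ~ ., data = dat, ncomp = best.dims)

# Normalize the coefficients so their absolute values sum to 100, then sort
coefficients <- drop(coef(pls.model))
importance <- sort(coefficients * 100 / sum(abs(coefficients)), decreasing = TRUE)
print(round(importance, 1))
```

Because the scores keep their sign, positive predictors appear at the top of the sorted output and negative predictors at the bottom, with values near zero indicating little relevance.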

## See Also

How To Stack Data for Driver Analysis

How to Run and Interpret Shapley Regression
