How to Create and Apply a Linear Discriminant Analysis (LDA) Typing Tool in Displayr – Displayr Help

This article shows you how to take discriminant functions produced in one data set and program them to predict segments in a different data set that contains the same variables, or re-use an LDA typing tool that has been given to you as an Excel spreadsheet.

In this example, I’ve run a Latent Class segmentation on a sample of 725 cell phone users. I’ve used the top 2 box scores from a 25-question attitudinal battery as inputs to the Latent Class Analysis and settled on a 4-class segmentation solution.

Requirements

A categorical variable containing the segments you want to predict. If you do not already have one, you can create it with an analysis such as Cluster Analysis or Latent Class Analysis; any segmentation method will do.

Method

Next, I ran a Linear Discriminant Analysis to identify the “golden questions”. LDA is perhaps one of the simpler techniques to use for this purpose, as it applies a formula that is easy to understand and program.

From either the toolbar > Anything or the '+' menu in the Report tree, select Advanced Analysis > Machine Learning > Linear Discriminant Analysis.
In the Data > Outcome box of the object inspector, select the variable that identifies your segments.
In the Predictor(s) box, select your predictor variables.
Click Calculate.

In this model, I’ve identified 8 of the 25 attitudinal questions as predictors, which together give me an 82% segment prediction accuracy.

If you’ve run the Linear Discriminant Analysis in Displayr as I’ve done above, you can then generate the Discriminant Functions by doing the following:

Select the LDA output.
From either the toolbar > Anything or the '+' menu in the Report tree, select Advanced Analysis > Machine Learning > Diagnostic > Table of Discriminant Function Coefficients.

This generates the following output:
LDA Table 2.png

These can be easily exported to Excel if needed by going to Share > Export Report > Excel.
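
If the coefficients reach you as a table (for example, from an Excel export), a small script can turn them into the scoring lines used by the JavaScript variable described later in this article. The following is a minimal sketch only: `coefficientTable` and `buildScoreLines` are hypothetical names, the table is shortened to two segments and two predictors from this article's example, and the layout of your own export may differ.

```javascript
// Hypothetical coefficient table: one entry per segment, holding an intercept
// and one coefficient per predictor variable name (shortened for illustration).
const coefficientTable = {
  segment1: { intercept: -3.5, q23b: 1.8, q23c: 3.6 },
  segment2: { intercept: -6.3, q23b: 0.9, q23c: 1.3 },
};

// Build one "var segmentK = intercept + (variable * coeff) + ...;" line per segment.
function buildScoreLines(table) {
  return Object.entries(table).map(([segment, row]) => {
    const terms = Object.entries(row)
      .filter(([name]) => name !== "intercept")
      .map(([name, coeff]) => `(${name} * ${coeff})`);
    return `var ${segment} = ${[row.intercept, ...terms].join(" + ")};`;
  });
}

const scoreLines = buildScoreLines(coefficientTable);
console.log(scoreLines.join("\n"));
```

The generated lines can be pasted into the code editor and checked against the coefficient table before use.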

A Word about the Classification Algorithm

With the discriminant functions in hand, we can now create the LDA typing tool, but first, a word about how to formulate the classification algorithm for each segment.

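The score for each segment takes the form:

segment_k = b_k + (var_1 * coeff_k1) + (var_2 * coeff_k2) + (var_3 * coeff_k3) + . . . + (var_n * coeff_kn)

where k identifies the segment, b_k is the intercept of that segment's discriminant function, var_i is the respondent's value for the i-th predictor variable, and coeff_ki is the coefficient of that variable in the discriminant function for segment k.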

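For example, a respondent who gave a top 2 box score for the first 4 questions but not for the last 4 questions would have the following segment1 value:

segment1 = -3.5 + (1 * 1.8) + (1 * 3.6) + (1 * 1.2) + (1 * 2.4) + (0 * 0.8) + (0 * 0.4) + (0 * 1.6) + (0 * 1.2) ≈ 5.48

The coefficients shown here are rounded to one decimal place, so adding them exactly as written gives 5.5; the value of 5.48 appears to come from the unrounded coefficients.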

If we do this for each of the 4 segments, we end up with a value for each segment. We then determine which value is the largest and assign the respondent to the corresponding segment. In this example, we find that segment 4 has the highest value, so this respondent is allocated to segment 4.

segment1 = 5.48
segment2 = 2.31
segment3 = -4.03
segment4 = 5.73
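
The allocation rule above can be sketched in a few lines of plain JavaScript. This is an illustration rather than Displayr code: the array layout and the names `functions`, `responses`, `scores` and `assigned` are assumptions, and the coefficients are the rounded values from the JavaScript example later in this article, so the scores come out near 5.5, 2.3, -4.1 and 5.8 rather than exactly the values listed above.

```javascript
// Each row: intercept, then coefficients for q23b, q23c, q23h, q23l, q23o, q23u, q23v, q23w
// (rounded values from this article's example).
const functions = [
  [-3.5, 1.8, 3.6, 1.2, 2.4, 0.8, 0.4, 1.6, 1.2],
  [-6.3, 0.9, 1.3, 1.9, 4.5, 5.7, 1.0, 0.1, 0.1],
  [-18.7, 4.1, 3.3, 2.6, 4.6, 1.5, 5.4, 9.1, 12.8],
  [-12.1, 4.1, 4.2, 4.1, 5.5, 5.6, 1.5, 2.2, 2.2],
];

// Respondent with a top 2 box score on the first 4 questions only.
const responses = [1, 1, 1, 1, 0, 0, 0, 0];

// Score for each segment: intercept plus the sum of response * coefficient.
const scores = functions.map(([intercept, ...coeffs]) =>
  coeffs.reduce((sum, coeff, i) => sum + coeff * responses[i], intercept)
);

// Position of the largest score, converted to a 1-based segment number.
const assigned = scores.indexOf(Math.max(...scores)) + 1;

console.log(scores.map((s) => s.toFixed(1)), assigned);
```

As in the worked example, the respondent is allocated to segment 4.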

Applying the algorithm to another survey

To be able to apply the classification algorithm in another survey, the exact same questions must be present in the new survey. We can then use a little bit of JavaScript to formulate the classification algorithm.

To create the JavaScript variable:

In the Data Sources tree, hover and click + > Custom Code > Javascript > Numeric.

For my example, I add the following JavaScript code into the code editor:
```
var segment1 = -3.5 + (q23b * 1.8) + (q23c * 3.6) + (q23h * 1.2) + (q23l * 2.4) + (q23o * 0.8) + (q23u * 0.4) + (q23v * 1.6) + (q23w * 1.2);
var segment2 = -6.3 + (q23b * 0.9) + (q23c * 1.3) + (q23h * 1.9) + (q23l * 4.5) + (q23o * 5.7) + (q23u * 1) + (q23v * 0.1) + (q23w * 0.1);
var segment3 = -18.7 + (q23b * 4.1) + (q23c * 3.3) + (q23h * 2.6) + (q23l * 4.6) + (q23o * 1.5) + (q23u * 5.4) + (q23v * 9.1) + (q23w * 12.8);
var segment4 = -12.1 + (q23b * 4.1) + (q23c * 4.2) + (q23h * 4.1) + (q23l * 5.5) + (q23o * 5.6) + (q23u * 1.5) + (q23v * 2.2) + (q23w * 2.2);

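// Find the largest of the four discriminant scores.
var maxSegment = Math.max(segment1, segment2, segment3, segment4);

// Return the number of the segment whose score equals the maximum.
if (maxSegment === segment1) 1;
else if (maxSegment === segment2) 2;
else if (maxSegment === segment3) 3;
else if (maxSegment === segment4) 4;
else NaN;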
```
The first 4 lines of the JavaScript code above calculate a score for each segment from the discriminant function coefficients and the respondent's answers to each question. The Math.max line then finds the largest score and stores it in a variable called maxSegment. The final if/else block checks which segment's score matches maxSegment and returns the corresponding segment number, or NaN if none matches.
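
Two edge cases of this logic are worth knowing: ties and missing inputs. The sketch below mirrors the if/else chain as a plain function so it can be tried outside Displayr. `pickSegment` is a hypothetical helper, and the example assumes that a missing response reaches the code as NaN.

```javascript
// Mirrors the if/else chain from the code above as a plain function (hypothetical helper).
function pickSegment(segment1, segment2, segment3, segment4) {
  const maxSegment = Math.max(segment1, segment2, segment3, segment4);
  if (maxSegment === segment1) return 1;
  if (maxSegment === segment2) return 2;
  if (maxSegment === segment3) return 3;
  if (maxSegment === segment4) return 4;
  return NaN; // reached when maxSegment is NaN, because NaN never equals anything
}

const clearWinner = pickSegment(5.48, 2.31, -4.03, 5.73); // 4
const tie = pickSegment(5.0, 5.0, 1.0, 2.0);              // 1: the first matching segment wins
const missing = pickSegment(NaN, 2.31, -4.03, 5.73);      // NaN: Math.max returns NaN if any input is NaN

console.log(clearWinner, tie, missing);
```

In other words, a tie goes to the lowest-numbered segment, and a respondent with any missing score is left unclassified rather than being assigned to a segment.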

In the Expression, the coefficients from each of the discriminant functions are multiplied by the values of the corresponding variables for each respondent. It is important to consider the values of the variables that are used in the discriminant formulas.

For Numeric and Ordinal Variable Sets, the value for these variables is multiplied by the coefficient.

For Nominal variable inputs, the model first dichotomizes the variable so that each category is a binary variable, where 0 represents not selected and 1 represents selected. This means you will first need to create a Binary - Multi variable set and use this variable set when referencing the variables in the discriminant formulas.
Click Calculate in the object inspector. This adds a new variable to your data set containing the segment that the classification algorithm assigns to each case.
It’s a good idea to change the Name and Label in the object inspector. You should also change the Structure to Nominal: Mutually exclusive categories and label the segments so that any tables will show the segments as categories.

For example:
LDA Table 1.png
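
The dichotomization of Nominal inputs described above can be illustrated with a short sketch. In Displayr you would do this by creating a Binary - Multi variable set rather than by writing code; the snippet only shows what the resulting 0/1 values look like. The category codes and the name `dichotomize` are hypothetical.

```javascript
// Hypothetical nominal variable with three categories coded 1, 2 and 3.
const categoryCodes = [1, 2, 3];

// Turn one category code into 0/1 indicators, one per category:
// 1 for the selected category, 0 for every other category.
function dichotomize(code, codes) {
  return codes.map((c) => (code === c ? 1 : 0));
}

// A respondent who selected category 2.
const indicators = dichotomize(2, categoryCodes);
console.log(indicators); // [0, 1, 0]
```

Each indicator is then multiplied by the coefficient for its category in the discriminant formula, in the same way as the top 2 box variables in the example above.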

How to Run Linear Discriminant Analysis

How to Do Latent Class Analysis
