This article shows you how to take discriminant functions produced in one data set and program them to predict segments in a different data set that contains the same variables, or re-use an LDA typing tool that has been given to you as an Excel spreadsheet.
A categorical variable containing the segments you want to predict. If you do not already have one, you can create one with an analysis such as Cluster Analysis or Latent Class Analysis; any segmentation method will do.
In the example, I’ve run a Latent Class segmentation on a sample of 725 cell phone users. I’ve used the top 2 box scores from a 25-question attitudinal battery as input to the Latent Class Analysis and settled on a 4-class segmentation solution.
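Because the inputs are top 2 box indicators (1 or 0), the same recode has to be applied to any data set you score later. A minimal sketch of that recode follows, assuming a 5-point agreement scale where 4 and 5 are the top two boxes. The scale length and the function name topTwoBox are assumptions for illustration, not details taken from the survey.

```javascript
// Hypothetical helper: recode a rating into a top 2 box indicator (1 or 0).
// Assumes a 5-point scale by default; pass scaleMax for other scale lengths.
function topTwoBox(rating, scaleMax = 5) {
  // Missing or non-numeric answers stay missing rather than becoming 0.
  if (typeof rating !== "number" || Number.isNaN(rating)) return NaN;
  return rating >= scaleMax - 1 ? 1 : 0;
}

// Made-up ratings for one question across five respondents.
const ratings = [5, 4, 3, 1, NaN];
const indicators = ratings.map((r) => topTwoBox(r));
console.log(indicators); // [1, 1, 0, 0, NaN]
```

Missing answers are kept as missing so that they are not silently scored as "not top 2 box".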
Next, I’ve run a Linear Discriminant Analysis to identify the “golden questions”. LDA is perhaps one of the simpler techniques to use for this purpose as it applies a formula that is easy to understand and program.
- Select Anything > Advanced Analysis > Machine Learning > Linear Discriminant Analysis
- In the Outcome box, select the variable that identifies your segments
- In the Predictor(s) box, select your predictor variables
- Click Calculate.
In this model, I’ve identified 8 of the 25 attitudinal questions that together predict segment membership with 82% accuracy.
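Prediction accuracy here means the share of respondents whose predicted segment matches their original segment. As a rough illustration, the calculation can be sketched as follows. The arrays hold made-up values and the function name accuracy is hypothetical.

```javascript
// Share of cases where the predicted segment equals the actual segment.
function accuracy(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("Inputs must be non-empty and the same length");
  }
  let hits = 0;
  for (let i = 0; i < actual.length; i++) {
    if (actual[i] === predicted[i]) hits++;
  }
  return hits / actual.length;
}

// Made-up segment memberships for five respondents.
const actualSegments = [1, 2, 4, 3, 4];
const predictedSegments = [1, 2, 4, 4, 4];
console.log(accuracy(actualSegments, predictedSegments)); // 0.8
```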
If you’ve run the Linear Discriminant Analysis in Displayr as I’ve done above, you can then generate the Discriminant Functions by doing the following:
- Select the LDA output.
- From the menus select Anything > Advanced Analysis > Machine Learning > Diagnostic > Table of Discriminant Function Coefficients.
This generates the following output:
These can be easily exported to Excel if needed by going to Publish > Export Pages > Excel.
A Word about the Classification Algorithm
With the discriminant functions in hand, we can now create the LDA typing tool, but first a word about how to formulate the classification algorithm for each segment.
The discriminant function for each segment takes the form:
segment = b + (var1 * coeff1) + (var2 * coeff2) + (var3 * coeff3) + ... + (varn * coeffn)
where b is the segment's intercept, var1 to varn are the respondent's answers to the n predictor variables, and coeff1 to coeffn are that segment's coefficients from the discriminant function.
For example, a respondent who gave a top 2 box score for the first four questions but not for the last four would get the following segment1 value:
segment1 = -3.5 + (1 * 1.8) + (1 * 3.6) + (1 * 1.2) + (1 * 2.4) + (0 * 0.8) + (0 * 0.4) + (0 * 1.6) + (0 * 1.2) = 5.48
The coefficients shown here are rounded, so the terms as printed add up to 5.5; the 5.48 comes from the unrounded coefficients.
If we do this for each of the 4 segments, we end up with a value for each segment. We then determine which value is the largest and assign the respondent to the corresponding segment. In this example, we find that segment 4 has the highest value, so this respondent is allocated to segment 4.
segment1 = 5.48
segment2 = 2.31
segment3 = -4.03
segment4 = 5.73
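The same arithmetic can be checked in a few lines of JavaScript. This sketch uses the rounded intercept and coefficients from the worked example, so segment 1 comes out at about 5.5 rather than 5.48. The function names score and pickSegment are illustrative only.

```javascript
// Discriminant score: intercept plus the sum of response * coefficient.
function score(intercept, coefficients, responses) {
  return coefficients.reduce(
    (total, coeff, i) => total + coeff * responses[i],
    intercept
  );
}

// Return the 1-based position of the largest score (first one wins on ties).
function pickSegment(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best + 1;
}

// Top 2 box on the first four questions, not on the last four.
const responses = [1, 1, 1, 1, 0, 0, 0, 0];
const segment1 = score(-3.5, [1.8, 3.6, 1.2, 2.4, 0.8, 0.4, 1.6, 1.2], responses);

// The four segment values reported in the example.
const assigned = pickSegment([5.48, 2.31, -4.03, 5.73]);
console.log(segment1.toFixed(1), assigned);
```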
Applying the algorithm to another survey
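// Discriminant score for each segment: intercept plus response * coefficient for each golden question.
// segment1 uses the intercept and coefficients from the worked example above.
var segment1 = -3.5 + (q23b * 1.8) + (q23c * 3.6) + (q23h * 1.2) + (q23l * 2.4) + (q23o * 0.8) + (q23u * 0.4) + (q23v * 1.6) + (q23w * 1.2);
var segment2 = -6.3 + (q23b * 0.9) + (q23c * 1.3) + (q23h * 1.9) + (q23l * 4.5) + (q23o * 5.7) + (q23u * 1) + (q23v * 0.1) + (q23w * 0.1);
var segment3 = -18.7 + (q23b * 4.1) + (q23c * 3.3) + (q23h * 2.6) + (q23l * 4.6) + (q23o * 1.5) + (q23u * 5.4) + (q23v * 9.1) + (q23w * 12.8);
var segment4 = -12.1 + (q23b * 4.1) + (q23c * 4.2) + (q23h * 4.1) + (q23l * 5.5) + (q23o * 5.6) + (q23u * 1.5) + (q23v * 2.2) + (q23w * 2.2);
// Assign each case to the segment with the largest score.
var maxSegment = Math.max(segment1, segment2, segment3, segment4);
if (maxSegment == segment1) 1;
else if (maxSegment == segment2) 2;
else if (maxSegment == segment3) 3;
else if (maxSegment == segment4) 4;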
- Click Calculate. This adds a new variable to your data set that assigns each case to the segment with the highest score.
- It’s a good idea to change the Name and Label in the Object Inspector. You should also change the structure to Nominal: Mutually exclusive categories and label the segments so that any tables will show the segments as categories.
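If you need to re-use the typing tool outside Displayr, or expect the model to change, it may be easier to keep the coefficients in a table than in hand-written formulas. The sketch below is one possible layout, not Displayr's own implementation. The names variables, model and classify are hypothetical. Segments 2 to 4 use the coefficients from the formulas above, and segment 1 uses the values from the worked example on the assumption that they follow the same variable order.

```javascript
// Predictor variables, in the order the coefficients are listed.
const variables = ["q23b", "q23c", "q23h", "q23l", "q23o", "q23u", "q23v", "q23w"];

// Intercept and coefficients per segment (rounded values from the article).
// Segment 1 is assumed to use the worked-example values in this variable order.
const model = {
  1: { intercept: -3.5, coeffs: [1.8, 3.6, 1.2, 2.4, 0.8, 0.4, 1.6, 1.2] },
  2: { intercept: -6.3, coeffs: [0.9, 1.3, 1.9, 4.5, 5.7, 1, 0.1, 0.1] },
  3: { intercept: -18.7, coeffs: [4.1, 3.3, 2.6, 4.6, 1.5, 5.4, 9.1, 12.8] },
  4: { intercept: -12.1, coeffs: [4.1, 4.2, 4.1, 5.5, 5.6, 1.5, 2.2, 2.2] },
};

// Score every segment and return the one with the largest score.
// Returns NaN if any response is missing, because every score is then NaN.
function classify(respondent) {
  let bestSegment = NaN;
  let bestScore = -Infinity;
  for (const [segment, { intercept, coeffs }] of Object.entries(model)) {
    let s = intercept;
    variables.forEach((name, i) => {
      s += coeffs[i] * respondent[name];
    });
    if (s > bestScore) {
      bestScore = s;
      bestSegment = Number(segment);
    }
  }
  return bestSegment;
}

// The respondent from the worked example: top 2 box on the first four questions only.
const example = { q23b: 1, q23c: 1, q23h: 1, q23l: 1, q23o: 0, q23u: 0, q23v: 0, q23w: 0 };
console.log(classify(example)); // 4
```

Keeping the coefficients in one object means a refreshed model only requires updating the table, and on ties the lowest-numbered segment wins, which matches the if/else chain above.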