How to Run Linear Discriminant Analysis – Displayr Help

Linear Discriminant Analysis (LDA) is a machine learning technique that can be used to predict categories.
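How LDA predicts a category can be written down compactly. In the standard textbook formulation, each category k has its own mean vector μ_k, all categories share one covariance matrix Σ, and π_k is the prior probability of category k. An observation x is assigned to the category with the largest linear discriminant score. This is the general method and is not a description of Displayr's internal settings.

```latex
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k,
\qquad \hat{y} = \arg\max_k \, \delta_k(x)
```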

Requirements

An outcome variable (the variable that you want to predict)
Variables that you want to use as predictors

Method

For this example, we'll use a data set that describes different types of glass by their physical attributes and chemical composition. The outcome variable is categorical (7 types of glass), and the predictor variables are numeric (the physical and chemical measurements).

Like other supervised machine learning algorithms, LDA is first trained on a labeled data set; the trained model can then predict categories for new data. We'll randomly split the data into a larger 70% training sample and a smaller 30% testing sample. The training sample is used to build the model, and we can then independently verify the model's accuracy on the unseen testing sample.
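The random 70/30 split described above can be sketched in plain Python. This illustrates random hold-out splitting in general and is not Displayr's implementation; the function name `train_test_split_indices`, the `seed` parameter, and the rounding rule for the training size are assumptions for illustration.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=None):
    """Randomly assign row indices to a training sample and a testing sample.

    Hypothetical helper: rows are shuffled once, the first
    round(n_rows * train_fraction) go to training, the rest to testing.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_fraction)
    train = sorted(indices[:n_train])
    test = sorted(indices[n_train:])
    return train, test


train, test = train_test_split_indices(100, seed=1)
print(len(train), len(test))  # 70 30
```

Every row lands in exactly one of the two samples, so the testing rows are never seen while the model is built.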

1. From the Report tree select + > Filter > Model Checking > Filters for Train-Test Split. By default, this splits the data into a 70% training set and a 30% testing set.

You can see this in a summary table if you drag the newly created question, "Train Test Split", from the Data Sources tree onto the page.
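The summary table for the split variable is essentially a count and a percentage for each of its categories. A minimal sketch of that tabulation follows, assuming the variable holds one label per row with the values "Training sample" and "Testing sample" (label strings assumed from the filter name used in step 3; the helper `summarize_split` is hypothetical).

```python
from collections import Counter


def summarize_split(labels):
    """Return {label: (count, percentage)}, like a one-variable summary table."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (count, 100.0 * count / total) for label, count in counts.items()}


# Hypothetical per-row values of the "Train Test Split" variable.
labels = ["Training sample"] * 7 + ["Testing sample"] * 3
for label, (count, pct) in summarize_split(labels).items():
    print(f"{label}: {count} ({pct:.0f}%)")
# Training sample: 7 (70%)
# Testing sample: 3 (30%)
```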

2. Next, we create the LDA model by selecting + > Advanced Analysis > Machine Learning > Linear Discriminant Analysis from the Report tree.

3. Click on the model and then select your inputs from the object inspector.

For Outcome, select the outcome variable from the drop-down list. For this example, I selected the Type variable.
For Predictor(s), choose the variable(s) that you want to use to predict the outcome variable. I selected Refractive Index and the 8 elements Na, Mg, Al, Si, K, Ca, Ba, and Fe.
From the Filter(s) drop-down box, select Training sample, which was created in Step 1 above.
Leave the other settings as their defaults.
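To show what the model does with these inputs, here is a from-scratch sketch of textbook LDA in plain Python. It estimates class means, priors from the training class proportions, and the pooled within-class covariance, then assigns each row to the class with the largest linear discriminant score. Only the training rows are passed to the fitting function, mirroring the Filter(s) = Training sample setting. The function names `fit_lda`, `predict_lda` and `accuracy` and the toy data are hypothetical, and Displayr's own implementation and defaults (for example, how priors are set) may differ.

```python
import math
from collections import defaultdict


def mat_inverse(a):
    """Invert a square matrix using Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a float copy of the matrix with the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_lda(X, y):
    """Estimate class priors, class means and the inverse pooled covariance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, p, k = len(X), len(X[0]), len(groups)
    means = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}
    priors = {c: len(rows) / n for c, rows in groups.items()}
    # Pooled within-class covariance: summed scatter divided by (n - k).
    S = [[0.0] * p for _ in range(p)]
    for c, rows in groups.items():
        for row in rows:
            d = [a - b for a, b in zip(row, means[c])]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    S = [[v / (n - k) for v in row] for row in S]
    return {"means": means, "priors": priors, "inv_cov": mat_inverse(S)}


def predict_lda(model, X):
    """Assign each row to the class with the largest discriminant score."""
    inv = model["inv_cov"]
    preds = []
    for x in X:
        best, best_score = None, -math.inf
        for c, mu in model["means"].items():
            w = [sum(inv[i][j] * mu[j] for j in range(len(mu))) for i in range(len(mu))]  # Sigma^-1 mu_k
            score = (sum(xi * wi for xi, wi in zip(x, w))
                     - 0.5 * sum(mi * wi for mi, wi in zip(mu, w))
                     + math.log(model["priors"][c]))
            if score > best_score:
                best, best_score = c, score
        preds.append(best)
    return preds


def accuracy(pred, actual):
    """Share of rows whose predicted category matches the actual one."""
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)


# Hypothetical toy data: two well-separated groups and one split label per row.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6], [0.2, 0.3], [5.8, 5.1]]
y = ["A"] * 4 + ["B"] * 4 + ["A", "B"]
split = ["Training sample"] * 8 + ["Testing sample"] * 2
train = [i for i, s in enumerate(split) if s == "Training sample"]
test = [i for i, s in enumerate(split) if s == "Testing sample"]
model = fit_lda([X[i] for i in train], [y[i] for i in train])
preds = predict_lda(model, [X[i] for i in test])
print(preds, accuracy(preds, [y[i] for i in test]))  # ['A', 'B'] 1.0
```

With the glass data, the rows of X would hold Refractive Index and the 8 element measurements, and y would hold Type.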