Linear Discriminant Analysis (LDA) is a supervised machine learning technique that predicts which category a case belongs to.
Requirements
- An outcome variable (the variable that you want to predict)
- Variables that you want to use as predictors
Please note these steps require a Displayr license.
Method
For this example, we'll use a data set that describes different types of glass in terms of their physical attributes and chemical composition. The outcome variable is categorical (7 types of glass) and the predictor variables are numeric (the refractive index and 8 chemical elements).
Like other supervised machine learning algorithms, LDA is first trained on a labeled data set. This in turn enables it to predict categories on a new data set. We'll randomly split the data into a larger 70% training sample and a smaller 30% testing sample. The training sample is used to build the model, and we can then independently verify its accuracy using the unseen testing sample.
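For readers who want to see the idea outside Displayr, the random 70/30 split can be sketched in a few lines of Python. This is only an illustration: the helper name `train_test_split_labels`, the label strings, the seed, and the 200-case sample size are all assumptions, not Displayr's implementation.

```python
import random

def train_test_split_labels(n_cases, train_fraction=0.7, seed=42):
    """Randomly label each case as training or testing (hypothetical helper)."""
    rng = random.Random(seed)           # fixed seed so the split is reproducible
    indices = list(range(n_cases))
    rng.shuffle(indices)                # put the case indices in random order
    n_train = round(n_cases * train_fraction)
    train_ids = set(indices[:n_train])  # the first 70% of the shuffled cases
    return ["Training sample" if i in train_ids else "Testing sample"
            for i in range(n_cases)]

# 200 cases is an assumed sample size, used only for illustration.
split = train_test_split_labels(200)
print(split.count("Training sample"), split.count("Testing sample"))  # prints: 140 60
```

Fixing the seed makes the split repeatable, so a model built on the training cases can be re-checked later against exactly the same testing cases.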
1. From the toolbar, go to Anything > Filter > Model Checking > Filters for Train-Test Split. By default, this splits the data into a 70% training set and a 30% testing set.
You can see this in a summary table if you drag the newly created question, "Train Test Split", from the Data Sources tree onto the page.
2. Next, we create the LDA model by selecting Anything > Advanced Analysis > Machine Learning > Linear Discriminant Analysis.
3. Click on the model and then select your inputs from the object inspector.
- For Outcome, select the variable that you want to predict from the drop-down list. For this example, we selected the Type variable.
- For Predictor(s), choose the variable(s) that you want to use to predict the outcome variable. We selected Refractive Index and the 8 elements Na, Mg, Al, Si, K, Ca, Ba, and Fe.
- From the Filter(s) drop-down box, select Training sample, which was created in step 1 above.
- Leave the other settings as their defaults.
4. Click the Calculate button to generate the LDA model.
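To give a feel for what an LDA model computes, here is a minimal Python sketch of the textbook method, restricted to two predictors so that the covariance matrix can be inverted by hand. It is not Displayr's implementation, and the functions `fit_lda` and `predict_lda` and all data values are made up for illustration. Each class k gets the score x'S⁻¹m_k − ½ m_k'S⁻¹m_k + log(p_k), where m_k is the class mean, p_k the class proportion, and S the pooled within-class covariance. The class with the highest score is predicted.

```python
import math
from collections import defaultdict

def fit_lda(rows, labels):
    """Fit a two-predictor LDA: class means, priors, inverse pooled covariance."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    n, k = len(rows), len(groups)
    means, priors = {}, {}
    sxx = sxy = syy = 0.0  # within-class sums of squares and cross-products
    for label, members in groups.items():
        mx = sum(r[0] for r in members) / len(members)
        my = sum(r[1] for r in members) / len(members)
        means[label] = (mx, my)
        priors[label] = len(members) / n  # prior = observed class proportion
        for x, y in members:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2

    # Pooled covariance (divide by n - k), then invert the 2x2 matrix directly.
    cxx, cxy, cyy = sxx / (n - k), sxy / (n - k), syy / (n - k)
    det = cxx * cyy - cxy ** 2
    inverse = ((cyy / det, -cxy / det), (-cxy / det, cxx / det))
    return means, priors, inverse

def predict_lda(model, row):
    """Return the class with the highest linear discriminant score."""
    means, priors, inverse = model
    best_label, best_score = None, -math.inf
    for label, (mx, my) in means.items():
        # Weights: inverse pooled covariance multiplied by the class mean.
        wx = inverse[0][0] * mx + inverse[0][1] * my
        wy = inverse[1][0] * mx + inverse[1][1] * my
        score = (row[0] * wx + row[1] * wy
                 - 0.5 * (mx * wx + my * wy)
                 + math.log(priors[label]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data: two predictors, two well-separated classes.
train_rows = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.9),
              (6.0, 7.0), (6.5, 6.6), (7.0, 7.9), (6.1, 8.0)]
train_labels = [1, 1, 1, 1, 2, 2, 2, 2]

model = fit_lda(train_rows, train_labels)
print(predict_lda(model, (1.3, 2.2)), predict_lda(model, (6.8, 7.2)))  # prints: 1 2
```

In practice a statistics library would handle any number of predictors and guard against a singular covariance matrix; the two-predictor case is used here only to keep the arithmetic visible.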
The LDA output for the glass example is shown below:
5. (Optional) In the object inspector, click Diagnostics > Prediction-Accuracy Table to generate the following output:
The model predicts Type, which is an integer code from 1 to 7, and is correct for 57.8% of the cases. Note that the data contains no cases of Type 4.
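The idea behind a prediction-accuracy table can be sketched in Python as a cross-tabulation of observed against predicted categories, with accuracy being the share of cases where the two agree. The helper `prediction_accuracy_table` and the observed and predicted values below are made up for illustration; they are not the glass results and not Displayr's code.

```python
from collections import Counter

def prediction_accuracy_table(observed, predicted):
    """Count (observed, predicted) pairs and compute overall accuracy."""
    counts = Counter(zip(observed, predicted))
    # Correct predictions are the pairs where observed equals predicted.
    correct = sum(n for (obs, pred), n in counts.items() if obs == pred)
    return counts, correct / len(observed)

# Made-up categories for eight cases.
observed = [1, 1, 2, 2, 3, 3, 3, 5]
predicted = [1, 2, 2, 2, 3, 1, 3, 5]

table, accuracy = prediction_accuracy_table(observed, predicted)
for (obs, pred), n in sorted(table.items()):
    print(f"observed={obs} predicted={pred} count={n}")
print(f"accuracy = {accuracy:.1%}")  # prints: accuracy = 75.0%
```

Pairs where the observed and predicted values differ show which categories the model confuses with each other, which a single accuracy figure does not reveal.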
Next
How to Validate a Linear Discriminant Analysis Model
How to Save Discriminant Variables From an LDA Output
How to Export LDA Functions From Displayr into Excel
How to Run Machine Learning Diagnostics - Prediction-Accuracy Table