This article describes how to do a Linear Discriminant Analysis (LDA) in Displayr. Linear Discriminant Analysis is a machine learning technique that can be used to predict categories.
You will need a data set that contains the outcome variable (the variable that you want to predict) and the variables that you want to use as predictors.
For this example, we'll be using a data set that describes different types of glass based upon physical attributes and chemical composition. The outcome variable is categorical (7 types of glass) and the predictor variables are numeric (the physical attributes and chemical composition).
Like other supervised machine learning algorithms, LDA is first trained on a labeled data set. This in turn enables it to predict categories on a new data set. We'll randomly split the data into a larger 70% training sample and a smaller 30% testing sample. The training sample is used to build the model, and then we can independently verify the accuracy using the unseen testing sample.
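To make the idea of a random 70/30 split concrete, here is a minimal Python sketch of it outside Displayr. The function name `train_test_split`, the seed and the row count of 200 are assumptions made for illustration; this is not how Displayr implements its own split, which may also not land on exactly 70%.

```python
import random

def train_test_split(n_rows, train_fraction=0.7, seed=42):
    """Randomly assign row indices to a training and a testing sample."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)               # random order of all rows
    n_train = round(n_rows * train_fraction)
    train = sorted(indices[:n_train])  # first 70% of the shuffled rows
    test = sorted(indices[n_train:])   # remaining 30%
    return train, test

# Hypothetical data set with 200 rows.
train_rows, test_rows = train_test_split(n_rows=200)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two samples, so the testing sample is never seen while the model is being built.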
1. Log into Displayr and load a document.
2. Load the data set that contains the variables that you want to use as inputs to the Linear Discriminant Analysis.
3. Go to Insert > Filter > Filters for Train-Test Split. By default this splits the data into a 70% training set and a 30% testing set.
You can see this in a summary table if you drag the newly created question, Train Test Split, from the Data Sets tree onto the page.
4. Next, we create the LDA model by selecting Insert > Machine Learning > Linear Discriminant Analysis.
5. Click on the model and then select your inputs from the object inspector on the right-hand side of the screen:
- For Outcome, select the variable that you want to predict from the drop-down list. For this example, I selected the Type variable.
- For Predictor(s), choose the variable(s) that you want to use to predict the outcome variable. I selected Refractive Index and the 8 elements Na, Mg, Al, Si, K, Ca, Ba and Fe.
- From the Filter(s) drop-down box, select Training sample, which was created in step 3 above.
- Leave the other settings as their defaults.
6. Click the Calculate button to generate the LDA model.
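For readers curious about what an LDA model does when it is fitted, the following is a rough Python sketch of the technique for a single numeric predictor. It estimates a mean and a prior for each category plus one variance shared by all categories, then predicts the category with the largest linear discriminant score. The function names and the data are made up for illustration, and this is a simplified one-predictor version, not Displayr's implementation, which handles several predictors at once.

```python
import math
from collections import defaultdict

def fit_lda_1d(xs, labels):
    """Estimate class means, class priors and one pooled variance."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    n = len(xs)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / n for k, v in groups.items()}
    # Pooled within-class variance: LDA assumes every class shares it.
    ss = sum((x - means[label]) ** 2 for x, label in zip(xs, labels))
    variance = ss / (n - len(groups))
    return means, priors, variance

def predict_lda_1d(x, model):
    """Return the class with the largest linear discriminant score."""
    means, priors, variance = model

    def score(k):
        return (x * means[k] / variance
                - means[k] ** 2 / (2 * variance)
                + math.log(priors[k]))

    return max(means, key=score)

# Made-up training data: one numeric predictor and two categories.
train_x = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B"]

model = fit_lda_1d(train_x, train_y)
print(predict_lda_1d(1.3, model), predict_lda_1d(2.8, model))
```

Because the score is linear in the predictor, the boundary between two categories is a single cut-off point here; with several predictors it becomes a line or a plane.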
The Displayr output for the glass example is shown below:
The model is predicting Type, which is an integer from 1 to 7, and is correct for 64% of the cases. Note that there is no data from Type 4.
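The accuracy figure is simply the share of cases where the predicted type matches the actual type. As a small illustration, the Python sketch below computes it for made-up labels (these are not values from the glass data, and the `accuracy` function is a hypothetical helper). The same calculation applied to the testing sample gives an independent check of the model.

```python
def accuracy(actual, predicted):
    """Share of cases where the predicted category equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Made-up glass types for ten cases; 7 of the 10 predictions match.
actual = [1, 1, 2, 2, 3, 5, 6, 7, 7, 2]
predicted = [1, 2, 2, 2, 3, 5, 7, 7, 7, 1]

print(f"{accuracy(actual, predicted):.0%}")
```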