How to Run Random Forest

The table below shows the variable importance as computed by a Random Forest. The column called MeanDecreaseAccuracy contains a measure of the extent to which a variable improves the accuracy of the forest in predicting the classification.
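
To make the idea behind MeanDecreaseAccuracy more concrete, here is a minimal sketch of a permutation-style importance calculation. It is a simplification, not the randomForest package's own code: the package works tree by tree on out-of-bag cases and averages the results, whereas this sketch shuffles one column once and measures the accuracy lost. The names accuracy, decrease_in_accuracy and toy_model, as well as the data, are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def decrease_in_accuracy(predict, rows, labels, column, seed=0):
    """Accuracy lost when the values in one column are shuffled across rows."""
    rng = random.Random(seed)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [
        row[:column] + [value] + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Hypothetical toy data: column 0 determines the label, column 1 is noise.
rows = [[0, 5], [1, 3], [0, 8], [1, 1], [0, 2], [1, 9], [0, 4], [1, 7]]
labels = [row[0] for row in rows]


def toy_model(row):
    # Stand-in for a fitted forest (an assumption): predicts from column 0 only.
    return row[0]


for column in (0, 1):
    print(column, decrease_in_accuracy(toy_model, rows, labels, column))
```

Because the stand-in model ignores column 1, shuffling that column cannot change its predictions, so its decrease in accuracy is zero. A variable the model relies on will generally show a positive decrease.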

Requirements

Familiarity with the Structure and Value Attributes of Variable Sets.
A numeric or categorical variable to be used as an Outcome variable to be predicted. When using a numeric variable, a forest of regression trees is estimated; when using a categorical variable, a forest of classification trees is estimated.
One or more Predictor variables to be used to predict the outcome variable.
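
The difference between a forest of classification trees and a forest of regression trees can be illustrated by how the individual tree predictions are usually combined: a majority vote for a categorical outcome and an average for a numeric outcome. The sketch below assumes the per-tree predictions are already available; the function names and the example values are hypothetical.

```python
from collections import Counter
from statistics import mean


def combine_classification(tree_votes):
    """Return the class predicted by the most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def combine_regression(tree_values):
    """Return the average of the values predicted by the trees."""
    return mean(tree_values)


# Hypothetical predictions from three trees for a single case.
print(combine_classification(["Yes", "No", "Yes"]))  # prints Yes
print(combine_regression([10.0, 12.0, 14.0]))        # prints 12.0
```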

Method

From the Report tree, hover and click + > Advanced Analysis > Machine Learning > Random Forest.
In Properties, go to the Data tab.
In the Outcome dropdown, select the variable to be predicted by the predictor variables.
Select the predictor variable(s) from the Predictor(s) list.
OPTIONAL: Select the desired Output type:
- Importance: Produces importance tables, as illustrated above.
- Detail: Returns the default output of the randomForest function in the R randomForest package. It includes a confusion matrix for classification trees and the percentage of variance explained for regression trees.
- Prediction-Accuracy Table: Produces a table relating the observed and predicted outcomes, also known as a confusion matrix.
OPTIONAL: Select the desired Missing Data treatment. (See Missing Data Options).
OPTIONAL: Select Sort by Importance to sort the rows by importance (the last column in the table).
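
For readers unfamiliar with the Prediction-Accuracy Table, the following minimal sketch shows how such a table relates observed and predicted outcomes. It is an illustration only, not the code used to build the output; the function name prediction_accuracy_table and the example data are hypothetical.

```python
from collections import Counter


def prediction_accuracy_table(observed, predicted):
    """Count cases for every (observed, predicted) pair of classes."""
    counts = Counter(zip(observed, predicted))
    classes = sorted(set(observed) | set(predicted))
    # Outer keys are observed classes, inner keys are predicted classes.
    return {o: {p: counts[(o, p)] for p in classes} for o in classes}


# Hypothetical observed and predicted outcomes for six cases.
observed = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "No", "Yes", "Yes"]

table = prediction_accuracy_table(observed, predicted)
# Correct predictions sit on the diagonal (observed class == predicted class).
correct = sum(table[c][c] for c in table)

print(table)
print(correct / len(observed))
```

In this example four of the six cases fall on the diagonal, so two thirds of the predictions are correct.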