This article describes how to create a Random Forest output as shown below.
The table below shows the variable importance as computed by a Random Forest. The column called MeanDecreaseAccuracy contains a measure of the extent to which a variable improves the accuracy of the forest in predicting the classification.
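The idea behind MeanDecreaseAccuracy can be illustrated with a minimal Python sketch: shuffle one predictor's values, re-measure accuracy, and record how far it drops. This is a simplified illustration, not the randomForest implementation, which permutes each variable within each tree's out-of-bag sample. The data, the column names `usage` and `noise`, and the `predict` function standing in for a fitted forest are all hypothetical.

```python
import random


def accuracy(rows, labels, predict):
    """Fraction of rows whose predicted label matches the observed label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def mean_decrease_accuracy(rows, labels, predict, column, repeats=5, seed=0):
    """Average drop in accuracy after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels, predict)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the permuted column.
        permuted = [dict(row, **{column: value})
                    for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(permuted, labels, predict))
    return sum(drops) / repeats


# Hypothetical data: "usage" determines the label, "noise" is unrelated.
data_rng = random.Random(1)
rows = [{"usage": i / 20, "noise": data_rng.random()} for i in range(20)]
labels = ["high" if row["usage"] >= 0.5 else "low" for row in rows]


def predict(row):
    """Hypothetical stand-in for a fitted forest; it only uses "usage"."""
    return "high" if row["usage"] >= 0.5 else "low"


importance = {col: mean_decrease_accuracy(rows, labels, predict, col)
              for col in ("usage", "noise")}
print(importance)
```

In this sketch, permuting `noise` cannot change any prediction, so its decrease is exactly zero, while permuting `usage` lowers accuracy. A larger decrease indicates a variable the model relies on more heavily.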
- Familiarity with the Structure and Value Attributes of Variable Sets.
- A numeric or categorical variable to be used as the Outcome variable to be predicted. With a numeric variable, a forest of regression trees is estimated; with a categorical variable, a forest of classification trees is estimated.
- One or more Predictor variables, which will be considered as predictors of the outcome variable.
- In the Anything menu select Advanced Analysis > Machine Learning > Random Forest.
- In the object inspector go to the Inputs tab.
- In the Outcome menu, select the variable to be predicted by the predictor variables.
- Select the predictor variable(s) from the Predictor(s) list.
- OPTIONAL: Select the desired Output type:
  - Importance: Produces importance tables, as illustrated above.
  - Detail: Returns the default output from randomForest in the randomForest package. It includes a confusion matrix for classification trees, and the percentage of variance explained for regression trees.
  - Prediction-Accuracy Table: Produces a table relating the observed and predicted outcomes, also known as a confusion matrix.
- OPTIONAL: Select the desired Missing Data treatment. (See Missing Data Options).
- OPTIONAL: Select Sort by Importance to sort the rows by importance (the last column in the table).
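To make the Prediction-Accuracy Table option more concrete, the following Python sketch cross-tabulates observed against predicted classes, which is what a confusion matrix does. It is an illustration under assumptions, not the application's own code: the function names and the `observed` and `predicted` values are hypothetical.

```python
from collections import Counter


def prediction_accuracy_table(observed, predicted):
    """Cross-tabulate observed classes (rows) against predicted classes (columns)."""
    counts = Counter(zip(observed, predicted))
    classes = sorted(set(observed) | set(predicted))
    return {obs: {pred: counts[(obs, pred)] for pred in classes}
            for obs in classes}


def overall_accuracy(table):
    """Share of cases on the diagonal, i.e. predicted class equals observed class."""
    correct = sum(table[c][c] for c in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


# Hypothetical observed and predicted outcomes for six cases.
observed = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

table = prediction_accuracy_table(observed, predicted)
for obs, row in table.items():
    print(obs, row)
print(overall_accuracy(table))
```

Cells on the diagonal count correct predictions; off-diagonal cells show which classes are confused with one another.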