How to Hook Up a Term Document Matrix or Sparse Matrix to Custom R Code

This article describes how to use a term document matrix or a sparse matrix to feed your text analysis into a statistical algorithm, such as a random forest model, for further analysis.

Requirements

A Term Document Matrix or a Sparse Matrix.

Method

From the toolbar, select Calculation > Custom Code.
Click on the page to place the custom object.
Enter the R code below in the Code panel:

# Our package containing the Random Forest routine
library(flipMultivariates)
# The package needed to convert the sparse matrix
library(tm)
# Convert the sparse matrix before use
tdm <- as.matrix(term.document.matrix)
# Ensure the column names are appropriate for use in an R model
colnames(tdm) <- make.names(colnames(tdm))
# Combine the outcome variable with the term document matrix
df <- data.frame(statusSource = statusSource, tdm)
# Create the R Formula which describes the relationship we are interrogating
f <- formula(paste0("statusSource ~ ", paste0(colnames(tdm), collapse = "+")))
# Run the random forest model
rf <- RandomForest(f, df)

The code above first converts the term document matrix (term.document.matrix) into a standard R matrix and cleans its column names so that they are valid in an R formula. It then combines the matrix with the dependent variable (statusSource) in a data frame and builds an R formula that relates the dependent variable to every column of the term document matrix. You will need to replace each mention of term.document.matrix with the name of your term document matrix or sparse matrix, and each mention of statusSource with the name of your dependent variable.
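To see what the name-cleaning and formula-building steps produce, here is a minimal sketch in base R on a made-up matrix. The names toy.tdm, toy.outcome, toy.df and toy.f, along with the terms and counts, are hypothetical and only for illustration. The sketch assumes, as the code above does, that rows are documents and columns are terms.

```r
# Toy stand-in for a converted term document matrix:
# 4 documents (rows) by 3 terms (columns). All values are made up.
toy.tdm <- matrix(c(1, 0, 2,
                    0, 1, 0,
                    3, 0, 1,
                    0, 2, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("great", "not good", "2nd")))

# Hypothetical outcome variable with one value per document
toy.outcome <- factor(c("Web", "Phone", "Web", "Phone"))

# The outcome must line up with the rows of the matrix
stopifnot(length(toy.outcome) == nrow(toy.tdm))

# make.names replaces invalid characters with "." and prefixes
# names that start with a digit with "X"
colnames(toy.tdm) <- make.names(colnames(toy.tdm))
print(colnames(toy.tdm))

# Combine the outcome with the term counts
toy.df <- data.frame(toy.outcome = toy.outcome, toy.tdm)

# Build a formula with the outcome on the left and every term on the right
toy.f <- formula(paste0("toy.outcome ~ ", paste0(colnames(toy.tdm), collapse = "+")))
print(toy.f)
```

Without the make.names step, a term such as "not good" could not be parsed as part of a formula, which is why the column names are cleaned before the formula is built.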

Once the random forest model has run using the R code above, you will see an output similar to the one below, based on your own text data: