## Introduction

This article describes how to feed a *term document matrix* or a *sparse matrix* from your text analysis into a statistical algorithm, such as a *random forest* model, for further analysis.
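
For readers who have not worked with one before, the following is a minimal sketch of what a term document matrix holds, built in base R from three made-up documents. The object names (`docs`, `tokens`, `vocab`, `toy.tdm`) and the example text are hypothetical, and the layout (one row per document, one column per term) is an assumption for illustration only; the matrix produced by your own text analysis may be stored in a sparse format and arranged differently.

```r
# Three made-up documents (hypothetical data, for illustration only)
docs <- c("great service great price",
          "poor service",
          "great price")

# Split each document into lower-case words
tokens <- strsplit(tolower(docs), " ", fixed = TRUE)

# The vocabulary is the sorted set of distinct words
vocab <- sort(unique(unlist(tokens)))

# Count how often each vocabulary word occurs in each document
counts <- vapply(tokens,
                 function(words) tabulate(match(words, vocab), nbins = length(vocab)),
                 integer(length(vocab)))

# Arrange the counts with one row per document and one column per term
toy.tdm <- t(counts)
dimnames(toy.tdm) <- list(paste0("doc", seq_along(docs)), vocab)
print(toy.tdm)

# Share of cells that are zero; real term document matrices are mostly
# zeros, which is why they are often stored in a sparse format
print(mean(toy.tdm == 0))
```

Each cell is the number of times a term occurs in a document, so each column can act as a predictor in a statistical model.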

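## Requirements

- A term document matrix or sparse matrix created from your text analysis.
- A dependent (outcome) variable to relate to the text.
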
## Method

- From the **toolbar**, select **Calculation > Custom Code**.
- Enter the below R code in **Properties > R CODE**:

```r
# Our package containing the Random Forest routine
library(flipMultivariates)

# The package needed to convert the sparse matrix
library(tm)

# Convert the sparse matrix before use
tdm <- as.matrix(term.document.matrix)

# Ensure the column names are appropriate for use in an R model
colnames(tdm) <- make.names(colnames(tdm))

# Combine the outcome variable with the term document matrix
df <- data.frame(statusSource = statusSource, tdm)

# Create the R formula which describes the relationship we are interrogating
f <- formula(paste0("statusSource ~ ", paste0(colnames(tdm), collapse = "+")))

# Run the random forest model
rf <- RandomForest(f, df)
```

The code above first converts the term document matrix (*term.document.matrix*) into a standard matrix, then combines it with the dependent variable (*statusSource*) and builds an R formula that relates the dependent variable to the columns of the term document matrix. You will need to replace mentions of *term.document.matrix* with the name of your term document or sparse matrix. You will also need to replace mentions of *statusSource* with the name of your dependent variable.
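
To make the preparation steps concrete, here is a small base R sketch that applies them to made-up data, without fitting a model. The names `toy.tdm`, `toy.outcome`, `toy.df` and `toy.formula`, and the values in them, are hypothetical stand-ins for your own term document matrix and dependent variable. The terms are chosen so that some are not valid R names, which is the situation `make.names()` is there to handle.

```r
# Hypothetical counts: one row per document, one column per term.
# "can't" and "2nd" are not syntactically valid R names.
toy.tdm <- matrix(c(2, 0, 1,
                    0, 1, 0,
                    1, 1, 0,
                    0, 0, 2),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("can't", "2nd", "price")))

# Hypothetical outcome with one value per document
toy.outcome <- factor(c("Positive", "Negative", "Positive", "Negative"))

# The outcome needs exactly one value for each row of the matrix
stopifnot(length(toy.outcome) == nrow(toy.tdm))

# make.names() turns each term into a syntactically valid column name
colnames(toy.tdm) <- make.names(colnames(toy.tdm))
print(colnames(toy.tdm))

# Combine the outcome with the term counts, then build the formula
toy.df <- data.frame(outcome = toy.outcome, toy.tdm)
toy.formula <- formula(paste0("outcome ~ ", paste0(colnames(toy.tdm), collapse = " + ")))
print(toy.formula)

# Every variable named in the formula should be a column of the data frame
stopifnot(all(all.vars(toy.formula) %in% names(toy.df)))
```

Note that two different terms can be converted to the same name. If the final check fails on your own data, one option is to call `make.names()` with `unique = TRUE` so that the formula and the data frame columns stay in step.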

Once the Random Forest model has run on the R code entered in **Properties > R CODE**, its output is displayed, calculated from your own text data.

## See Also

How to Create a Term Document Matrix
