This article describes how to go from a table of text to a term document matrix, which represents the words in the text as a table (or matrix) of numbers.
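A term document matrix is easiest to picture with a tiny example. The base R sketch below builds one by hand from three hypothetical responses. Each row is a response (document), each column is a word (term), and each cell counts how often that word occurs in that response. It only illustrates the data structure and is not how Displayr computes its output. It skips any cleaning options configured in a text analysis setup, and the row and column orientation of the real output may differ.

```r
# Hypothetical example responses standing in for a verbatim text variable
responses <- c("I like the taste",
               "The taste is too sweet",
               "I like the price")

# Lower-case each response, split it into words on non-letter characters,
# and drop any empty strings left over from the split
tokens <- lapply(strsplit(tolower(responses), "[^a-z]+"),
                 function(words) words[nzchar(words)])

# The vocabulary: every distinct word, sorted alphabetically
terms <- sort(unique(unlist(tokens)))

# Count each term in each response; vapply returns one column per response,
# so transpose to get one row per response and one column per term
tdm <- t(vapply(tokens,
                function(words) as.numeric(table(factor(words, levels = terms))),
                numeric(length(terms))))
dimnames(tdm) <- list(paste0("response", seq_along(responses)), terms)

print(tdm)
```

For the first response, "I like the taste", the cells for i, like, taste and the are 1, and every other cell in that row is 0.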
You will need a verbatim text variable that contains sentences or phrases. Text variables are marked with a small "a" next to the variable in the Data Sets tree. Then follow these steps:
- Create a table by dragging a text variable onto a Page.
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Advanced > Setup Text Analysis.
- From the object inspector, select the text variable in the Inputs > Text Analysis Options > Text variable drop-down or drag the text variable from your Data Sets tree into the Text variable field.
- Make any modifications to the options in your text analysis setup as described in How to Set Up Your Text Analysis.
- Go to Anything > Advanced Analysis > Text Analysis > Advanced > Term Document Matrix.
- From the object inspector > Inputs > Setup item, select the text analysis setup output you created above.
- OPTIONAL: Update Minimum document count as needed based on your text analysis.
- Go to Calculation > Custom code.
- Paste the following code into the R CODE box:
non.sparse.matrix <- as.matrix(term.document.matrix)
In the code above, replace term.document.matrix with the name of the Term Document Matrix output you created. To find this name, select that output and go to Properties > GENERAL > Name.
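To see what the conversion does outside Displayr, here is a minimal sketch that uses the Matrix package (a recommended package that ships with R) to build a small sparse matrix. The matrix is a hypothetical stand-in for the term document matrix output, and its class and orientation are assumptions. The sketch then runs the same as.matrix() line. It also shows a filter under an assumed meaning of Minimum document count: terms that appear in fewer responses than the threshold are dropped.

```r
library(Matrix)  # recommended package that ships with R

# Hypothetical stand-in for the Term Document Matrix output: a sparse matrix
# with one row per response and one column per term (assumed layout)
term.document.matrix <- sparseMatrix(
  i = c(1, 1, 2, 2, 3),
  j = c(1, 2, 2, 3, 1),
  x = c(1, 1, 2, 1, 1),
  dims = c(3, 3),
  dimnames = list(c("r1", "r2", "r3"), c("like", "taste", "sweet"))
)

# Same line as the Custom code step: convert to an ordinary dense R matrix,
# in which the zero cells are stored explicitly
non.sparse.matrix <- as.matrix(term.document.matrix)
print(non.sparse.matrix)

# Hypothetical illustration of a minimum document count of 2 (assumed
# semantics): keep only the terms that occur in at least 2 responses
doc.counts <- colSums(non.sparse.matrix > 0)
filtered <- non.sparse.matrix[, doc.counts >= 2, drop = FALSE]
print(filtered)
```

Here "sweet" occurs in only one response, so the filter keeps "like" and "taste".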