This article describes how to extract categories from text data in a list-like format (i.e., nouns separated by delimiters). The algorithm also attempts to analyse text that is not in this format, such as sentences, and to correct misspellings.
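To make the idea of list-like text concrete, here is a minimal Python sketch of splitting a response on delimiters and mapping likely misspellings onto known categories. It is an illustration only, not the algorithm used by the List of Items feature: the delimiter set, the `known` category list, the similarity cutoff, and the function names `split_items` and `normalise` are all assumptions made for this example.

```python
import re
from difflib import get_close_matches

# Assumed delimiter set for this sketch: commas, semicolons, slashes, newlines.
DELIMITERS = re.compile(r"[,;/\n]+")


def split_items(response):
    """Split one raw response into trimmed, lower-cased items."""
    parts = DELIMITERS.split(response)
    return [p.strip().lower() for p in parts if p.strip()]


def normalise(item, known_categories, cutoff=0.75):
    """Return the closest known category, or the item itself if none is close."""
    matches = get_close_matches(item, known_categories, n=1, cutoff=cutoff)
    return matches[0] if matches else item


# Hypothetical category list and response, invented for illustration.
known = ["excel", "nvivo", "python"]
raw = "Excel, Pyhton; NVivo"
items = [normalise(i, known) for i in split_items(raw)]
print(items)
```

Here the misspelt "Pyhton" is close enough to "python" to be replaced, while an item with no close match (for example "spss") would be kept unchanged as its own category.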
Requirements
Multiple Text Variables
Method
To create a List of Items output:
- Select Anything > Advanced Analysis > Text Analysis > Automatic Categorization > List of Items.
- Under Inputs > Text variable select one or more Text variables.
- Make any other selections or changes to the settings that you require.
- Ensure the Automatic box is checked, or click Calculate.
The example below shows a list categorization output for a survey question about which software respondents use for coding text data. The expanded Categories section shows a table of the categories on the left and the raw and transformed text on the right. Each category has its own shading, and replaced text is shaded in bright yellow. The Diagnostics section at the bottom is collapsed by default; expanding it shows diagnostic information for each processing step, and each step can in turn be expanded.
Extracting a Table of Frequencies
To extract the table of frequencies from this output, save the results as a variable in your Data Set and then create a table from that variable. Take these steps:
- Select the List of Items output.
- Go to Anything > Advanced Analysis > Text Analysis > Save Variable(s) > Categories.
One new Question for each of the input variables will be saved in your Data Set. These can then be used like any other to create tables and further outputs.
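As a rough illustration of what a table of frequencies built from the saved categories contains, the following Python sketch counts how many respondents mention each category. The data, the `saved_categories` name, and the `frequency_table` function are hypothetical and are not produced by the software; the sketch also assumes each respondent is counted at most once per category.

```python
from collections import Counter

# Hypothetical saved categories: one list of categories per respondent.
saved_categories = [
    ["excel", "python"],
    ["excel"],
    ["nvivo", "excel"],
    [],  # a respondent with no categorized items
]


def frequency_table(rows):
    """Return (category, count, percent of respondents) rows, most common first."""
    counts = Counter()
    for row in rows:
        counts.update(set(row))  # count each category once per respondent
    total = len(rows)
    if total == 0:
        return []
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


for category, count, percent in frequency_table(saved_categories):
    print(f"{category:10s} {count:3d} {percent:5.1f}%")
```

Percentages here are based on all respondents, including those with no categories; a different base would change the percentages but not the counts.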
Next
How To Automatically Code Unstructured Text Data