This article guides you through automatically classifying unstructured text data. It takes you from unstructured verbatim (raw) text responses to a state where the verbatims are automatically categorized.
Requirements
You will need a Text variable to perform automatic coding. Text variables are represented by an A next to the variable in the Data Sources tree.
See Finding the Best Text Analysis for Your Data to determine if this is the best method for your data.
Method
- In the Data Sources tree, select the variable that contains the unstructured text.
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text.
- In the object inspector, under Data > Categories > Category Creation, select Create New Categorization.
- Under Data > Categories > Number of categories, enter the number of categories you want the text sorted into. The default is 10.
- Click Calculate if Calculate automatically is not ticked.
If you instead want to use an existing categorization from a different variable in your data set to train the automatic categorization algorithm, see How to Automatically Classify New Text Data Using an Existing Categorization.
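To give a feel for what "sorting text into a fixed number of categories" means, here is a minimal sketch. This article does not say which algorithm the tool uses, so the sketch swaps in plain k-means over word counts purely as an illustration. The function names, the sample responses, and the choice of two categories are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vectorize(texts):
    """Turn each text into a word-count vector over a shared vocabulary."""
    vocab = sorted({word for t in texts for word in tokenize(t)})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for t in texts:
        vec = [0.0] * len(vocab)
        for word, count in Counter(tokenize(t)).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Illustrative k-means; NOT the product's algorithm (assumption)."""
    # Deterministic start: use the first k responses as initial centres.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each response to its nearest centre.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Move each centre to the mean of its assigned responses.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical verbatims, grouped into 2 categories.
responses = [
    "The price is too high",
    "Great friendly service",
    "Price too expensive",
    "Service staff were friendly",
]
labels = kmeans(vectorize(responses), k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy example the two price comments end up together and the two service comments end up together. The number of categories plays the same role as the Number of categories setting above: you choose it up front, and every response is placed in one of that many groups.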
Save categories from automatic categorization
To save the categorizations* for use in tables and other outputs:
- Select the automatic categorization output on your page.
- From the object inspector, click Data > Save Variable(s), then choose Categories or First Category:
  - Categories: Saves variables to the data set containing the categories. Where there are multiple input variables, a separate set of variables is added for each.
  - First Category: Saves a variable to the data set containing the first category mentioned. Where there are multiple input variables, the first category of each is saved as a separate variable.
- [Optional]: To see the proportions of people for each category you can drag the new variable set to the page to create a summary table.
- [Optional]: To see the raw unstructured text verbatims alongside their categories, you can create a raw data table for the variable sets. See How to Create a Raw Data Table From Variable(s).
*NOTE: Variables created using Save Variable(s) > Categories or First Category may become invalid if the output changes, either because the input text variable was modified or because the input settings were modified. If this happens, delete the saved variables and create them again.
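The difference between the two save options, and the proportions shown in a summary table, can be sketched in a few lines. This is only an illustration of the idea: the data layout, function names, and category labels below are hypothetical and do not reflect how the saved variables are stored internally.

```python
def save_categories(assignments, categories):
    """One 0/1 indicator list per category (akin to the Categories option)."""
    return {
        cat: [1 if cat in assigned else 0 for assigned in assignments]
        for cat in categories
    }


def save_first_category(assignments):
    """The first category mentioned per response, or None if there is none."""
    return [assigned[0] if assigned else None for assigned in assignments]


def proportions(indicators):
    """Share of respondents in each category, as a summary table would show."""
    return {cat: sum(flags) / len(flags) for cat, flags in indicators.items()}


# Hypothetical output: the categories assigned to four responses,
# listed in the order they were mentioned in each verbatim.
assignments = [["Price", "Service"], ["Service"], [], ["Price"]]
categories = ["Price", "Service"]

indicators = save_categories(assignments, categories)
first = save_first_category(assignments)

print(indicators)               # {'Price': [1, 0, 0, 1], 'Service': [1, 1, 0, 0]}
print(first)                    # ['Price', 'Service', None, 'Price']
print(proportions(indicators))  # {'Price': 0.5, 'Service': 0.5}
```

Because a response can mention several categories, the Categories-style output needs one indicator per category, while the First Category-style output fits in a single variable.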
Next
How to Automatically Classify New Text Data Using an Existing Categorization
Finding the Best Text Analysis for Your Data
How To Automatically Classify Unstructured Text Data Into an Entity List