This article describes how to automatically code unstructured text data using an existing categorization in your data set. It takes you from unstructured verbatim (raw) text responses:
To a state where a trained machine learning model uses existing categorization to predict the categories for the remaining text:
Requirements
You will need a Text variable in order to perform automatic coding. Text variables are represented by a small "a" next to the variable in the Data Sets tree:
You will also need an existing categorized variable, created using semi-automatic or manual categorization, whose categories will be used to predict the categories of the remaining unstructured text. These variables contain "Categorized" in the name and have a Nominal or Binary - Multi structure:
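As a rough illustration of the difference between the two structures, the sketch below represents the same kind of coded responses both ways. The category names, the sample data, and the helper `nominal_to_binary` are hypothetical and exist only for this example; they are not part of the product.

```python
# Hypothetical category names, made up for illustration.
categories = ["Price", "Service", "Quality"]

# Nominal (mutually exclusive): each response has exactly one category.
nominal = ["Price", "Service", "Quality", "Price"]

# Binary - Multi (multiple response): each response has a 0/1 flag per
# category, so one response can belong to several categories at once.
binary_multi = [
    {"Price": 1, "Service": 0, "Quality": 1},
    {"Price": 0, "Service": 1, "Quality": 0},
]

def nominal_to_binary(labels, categories):
    """Express single-response labels as one 0/1 flag per category."""
    return [{c: int(c == label) for c in categories} for label in labels]

as_flags = nominal_to_binary(nominal, categories)
```

A nominal categorization is the special case where exactly one flag is set per response, which is why both structures can serve as training input.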
More information about semi-automatic and manual categorization can be found here:
How to Semi-Automatically Code Text Data
How to Manually Code Mutually Exclusive (Single Response) Text Data
How to Manually Code Multiple Response Text Data
Method
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text.
- In the object inspector, under Inputs > DATA SOURCE > Text variable, select the text variable you want to automatically code.
- In the object inspector, under Inputs > CATEGORIES > Existing categorization, select the existing categorized variable.
This method will work on both mutually exclusive and multiple overlapping categorizations.
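To give a sense of what "using existing categorization to predict categories" means, here is a minimal conceptual sketch in Python. It is not the algorithm the software uses, which this article does not describe; it assumes a simple word-count (naive Bayes) classifier, and the sample responses, category names, and the functions `tokenize`, `train`, and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(texts, labels):
    """Count how often each word appears under each existing category."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the category with the highest naive Bayes log score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior from category frequency
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids log(0) for unseen pairs.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Responses that already have a category (the existing categorization).
coded_texts = [
    "too expensive for me",
    "the price is too high",
    "staff were friendly",
    "helpful and friendly service",
]
coded_labels = ["Price", "Price", "Service", "Service"]

# Remaining responses that have not been categorized yet.
uncoded_texts = ["price is high", "very friendly staff"]

model = train(coded_texts, coded_labels)
predictions = [predict(model, t) for t in uncoded_texts]
```

In this sketch the first uncoded response is assigned to "Price" and the second to "Service". For a multiple response categorization, one plausible approach (again an assumption, not a description of the product) is to train a separate yes/no model for each category so that a response can receive more than one.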
Next
How to Refine and Edit Text Categories After Categorization
How to Set Up Text for Analysis
How to Semi-Automatically Code Text Data