This article describes how to automatically code unstructured text data using an existing categorization in your data set. It takes you from unstructured verbatim (raw) text responses:
To a state where a trained machine learning model uses the existing categorization to predict the categories for the remaining text:
You will need a Text variable in order to perform automatic coding. Text variables are represented by a small "a" icon next to the variable in the Data Sets tree:
You will also need an existing categorized variable, created using semi-automatic or manual categorization, from which the model learns to predict the categories of the remaining unstructured text. These variables contain "Categorized" in the name and have a nominal or binary - multi structure:
More information about semi-automatic and manual categorization can be found here:
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text.
- In the object inspector, under Inputs > DATA SOURCE > Text variable, select the text variable you want to automatically code.
- In the object inspector, under Inputs > CATEGORIES > Existing categorization, select the existing categorized variable.
This method works with both mutually exclusive (nominal) and multiple overlapping (binary - multi) categorizations.
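To make the underlying idea concrete, the sketch below trains a small classifier on responses that already have a category and uses it to predict a category for responses that do not. It is only an illustration of the general technique for the mutually exclusive case: it uses a simple naive Bayes model written in plain Python, which is an assumption made for this example and not necessarily the model the software uses. All names in it (`tokenize`, `train`, `predict`, `coded_text`, `coded_categories`, `uncoded_text`) and the sample responses are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(responses, categories):
    """Count words per category from the already-categorized responses."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in zip(responses, categories):
        category_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, category_counts, vocabulary


def predict(text, model):
    """Return the most likely category under naive Bayes with add-one smoothing."""
    word_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # prior: share of coded responses in this category
        denom = sum(word_counts[category].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical data: responses that were coded manually or semi-automatically...
coded_text = [
    "The price is too high",
    "Far too expensive for what you get",
    "Staff were friendly and helpful",
    "Great service from the support staff",
]
coded_categories = ["Price", "Price", "Service", "Service"]
# ...and the remaining responses that still need a category.
uncoded_text = ["Too expensive", "Very helpful staff"]

model = train(coded_text, coded_categories)
predictions = [predict(text, model) for text in uncoded_text]
print(predictions)  # ['Price', 'Service']
```

For overlapping categorizations, one common approach (again an assumption, not a description of the software) is to train one yes/no model of this kind per category, so that a single response can be assigned to several categories.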