This article tells you how to automatically classify unstructured text data into an entity list. It will take you from unstructured verbatim (raw) text responses:
To an output where responses are automatically grouped into entities (people, places, things):
Once the output is created, you can also save variables containing the variants within each entity, which can be used in further analyses:
Requirements
You will need a Text variable to perform an automatic categorization. Text variables are represented by a capital A next to the variable and can be found in the Data Sources tree:
See Finding the Best Text Analysis for Your Data to determine if this is the best automatic analysis based on your data.
Method
- Select the text variable that you would like to analyze in the Data Sources tree.
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Entity Extraction.
- An entity extraction object will appear on the page, showing the automatic categorization.
- OPTIONAL: To add a new entity category, click the Data > Add named entities to extraction button and enter the category name in the first row of a new column. In the cells beneath it, list the words to include in that entity. For example, the new category Animal would go in the first row, and in the cells beneath you'd list cat, dog, fish, etc.
- OPTIONAL: To exclude named entities from the extraction, use Data > Remove named entities from extraction, which opens a similar data entry form. In the first row, enter the entity type to remove. In the cells beneath each entity type, list the words or entities that you wish to remove.
A list of the entities currently supported by default can be found in the technical documentation here.
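To make the layout of the two data entry forms concrete, here is a minimal Python sketch of the idea. It is not how the software is implemented: the names ADD_FORM, REMOVE_FORM, form_to_dict, build_lookup and extract are hypothetical, the matching is a plain word lookup, and treating removal as "drop these words from that entity type" is an assumption made for illustration.

```python
import re

# Hypothetical forms: one column per entity type, with the type name in the
# first row and the words for that type in the cells beneath it.
ADD_FORM = [
    ["Animal", "Colour"],
    ["cat", "red"],
    ["dog", "blue"],
    ["fish", ""],
]
REMOVE_FORM = [
    ["Animal"],
    ["fish"],
]


def form_to_dict(form):
    """Turn a column-oriented form into {entity type: set of lowercase words}."""
    headers, *rows = form
    result = {header: set() for header in headers if header}
    for row in rows:
        for header, cell in zip(headers, row):
            if header and cell.strip():
                result[header].add(cell.strip().lower())
    return result


def build_lookup(add_form, remove_form):
    """Combine the forms: start from the added words, then drop removed ones."""
    entities = form_to_dict(add_form)
    for entity_type, words in form_to_dict(remove_form).items():
        if entity_type in entities:
            entities[entity_type] -= words
    return entities


def extract(text, entities):
    """Return {entity type: [matched words in order of first mention]}."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found = {}
    for token in tokens:
        for entity_type, words in entities.items():
            if token in words:
                matches = found.setdefault(entity_type, [])
                if token not in matches:
                    matches.append(token)
    return found


entities = build_lookup(ADD_FORM, REMOVE_FORM)
result = extract("My dog chased a red fish and another dog", entities)
```

In this sketch, fish is listed under Animal in the add form but also appears in the remove form, so it is not picked up in the example response.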
Saving entities into a variable
Once the entity list is created, you can save the entities into variable sets to use in further analyses.
- Select the entity extraction object on the page.
- From the object inspector, go to Data > Save Variable(s).
- OPTIONAL: Adjust the Maximum number of unique entity levels to save.
- OPTIONAL: Adjust the Maximum number of entities per case to save.
- Click one of the following options:
  - Categories - Save variables to the data set containing the categories from each entity.
  - First category - Save variables to the data set containing the first category mentioned from each entity.
A variable will be created for each entity group.
Those variables can be used in tables and other outputs.
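The difference between the two save options, and the effect of the two maximum settings, can be illustrated with a small Python sketch. This is only an interpretation under stated assumptions, not the software's actual behavior: the function save_entity_variables and its parameters are hypothetical, "unique entity levels" is assumed to mean the most frequent distinct variants within an entity, and "entities per case" is assumed to cap how many variants are kept for each respondent.

```python
from collections import Counter


def save_entity_variables(cases, entity_type, max_levels=10,
                          max_per_case=3, first_only=False):
    """Build saved values for one entity group (illustrative only).

    cases: list of dicts mapping entity type to the variants found in that
    case, in the order they were mentioned.
    """
    # Assumption: keep only the most frequent variants as levels.
    counts = Counter(v for case in cases for v in case.get(entity_type, []))
    levels = [variant for variant, _ in counts.most_common(max_levels)]
    allowed = set(levels)

    rows = []
    for case in cases:
        # Assumption: cap the number of variants kept per case.
        kept = [v for v in case.get(entity_type, []) if v in allowed]
        kept = kept[:max_per_case]
        if first_only:
            # "First category": a single value, the first variant mentioned.
            rows.append(kept[0] if kept else None)
        else:
            # "Categories": one 0/1 indicator per level.
            rows.append({level: int(level in kept) for level in levels})
    return levels, rows


cases = [
    {"Animal": ["dog", "cat"]},
    {"Animal": ["cat"]},
    {"Animal": ["cat", "dog", "fish"]},
    {},
]
levels, first = save_entity_variables(cases, "Animal", max_levels=2,
                                      first_only=True)
_, multi = save_entity_variables(cases, "Animal", max_levels=2,
                                 max_per_case=1)
```

Here fish is dropped because only the two most frequent variants are kept as levels, and a case with no Animal mentions gets a missing value under the first-category option.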
Next
How to Save Existing Text Categorization to Use in Other Analyses
How to Automatically Classify Lists of Items
How To Automatically Classify Unstructured Text Data
How to Automatically Classify New Text Data Using an Existing Categorization