This article describes how to extract categories from text data in a list-like format (e.g., spontaneous/unaided awareness responses, or nouns separated by delimiters). Instead of manually classifying these items, you can run our List of Items function to classify them automatically.
It will take you from uncategorized verbatim (raw) text responses:
To a variable that you can use in your analyses:
This article is broken into the following sections:
- Create List of Items output
- Saving categories into a variable
- Combining saved categories variables
- Technical Details
Requirements
- A Displayr document
- A text variable. Text variables are represented by an "A" next to the variable name.
- See Finding the Best Text Analysis for your Data to confirm this is the best solution for you given your data and desired outputs.
Create List of Items output
- Select the text variable that you wish to classify in the Data Sources tree.
- Go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > List of Items.
- OPTIONAL: Select any other text variables that you'd like to use in the same list of items categorization in the Data Sources > Text variable(s) window.
- OPTIONAL: Update the Minimum category size. Any category smaller than this threshold will be classified as "Unclassified".
- OPTIONAL: Click Required Categories > Add required phrases or variants if you have text responses that fit into the same category. For example, AT&T and Att:
- OPTIONAL: Enter the required variants in the sheet that opens and click OK.
- OPTIONAL: Specify any delimiters that are present in your raw text data in the Delimiters / Split Text options. For example, semicolons and commas. Edit other options here as needed.
- OPTIONAL: Adjust settings in the Spelling Correction options as required.
- OPTIONAL: Click Add categories to discard in the Categories to Discard section, if needed.
- Click Calculate if Calculate automatically is not ticked.
The results in the output will automatically update based on the changes above.
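To make the effect of these options concrete, here is a minimal Python sketch of the general idea: split each response on delimiters, replace variants with a required category, drop discarded items, and relabel small categories as "Unclassified". It is an illustration only, not Displayr's implementation; the function name `categorize` and all of its parameters are hypothetical, and spelling correction is left out.

```python
import re
from collections import Counter


def categorize(responses, delimiters=",;", variants=None, discard=(), min_size=2):
    """Split each response into items and assign each item to a category.

    Hypothetical sketch: names and behavior are assumptions, not Displayr's API.
    """
    variants = {k.lower(): v for k, v in (variants or {}).items()}
    discard = {d.lower() for d in discard}
    pattern = "[" + re.escape(delimiters) + "]"

    # Split on delimiters, trim whitespace, and apply variant replacements.
    split = []
    for text in responses:
        items = []
        for raw in re.split(pattern, text):
            item = raw.strip()
            if not item:
                continue
            item = variants.get(item.lower(), item)
            if item.lower() not in discard:
                items.append(item)
        split.append(items)

    # Count how many responses mention each category.
    counts = Counter(item for items in split for item in set(items))

    # Categories below the minimum size become "Unclassified".
    return [
        [item if counts[item] >= min_size else "Unclassified" for item in items]
        for items in split
    ]


responses = ["AT&T, Verizon", "att; verizon", "Verizon", "Sprint", "AT&T, none"]
result = categorize(
    responses,
    variants={"att": "AT&T", "verizon": "Verizon"},
    discard=["none"],
    min_size=2,
)
print(result)
```

In this example "att" is mapped to AT&T, "none" is discarded, and Sprint, mentioned only once, falls below the minimum category size of 2.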
The example below shows a list of items categorization output for a survey question asking which mobile phone company is used. The Categories section, expanded below, shows a table of the categories on the left and the raw and transformed text on the right. Each category is distinguished by a unique shading, whereas the replaced text is shaded in bright yellow on the right side.
The Diagnostics section at the bottom, which is collapsed by default but can be expanded, shows diagnostic information for each processing step. For example, the Variant suggestions section contains items that you may want to include in the Required Categories section (the Add required phrases or variants step above). See How to Tidy Categories When Automatically Classifying Into an Item List for tips to apply the variant suggestions quickly.
Review the remaining diagnostic sections and determine if any other updates need to be made. Click Calculate once all changes are made to update the list of items categorization.
Saving categories into a variable
Once your list of items categorization is completed, you can save the categories into a variable set to use in further analyses.
- Select the List of Items output on the page.
- From the object inspector, go to Data > Save Variable(s).
- OPTIONAL: Adjust the Maximum number of unique categories to save.
- OPTIONAL: Adjust the Maximum number of categories per case to save.
- Click one of the following options:
  - Categories - Save variable(s) to the data set containing the categories. Where there are multiple input variables, a separate set of variables is added for each.
  - First category - Save a variable to the data set containing the first category mentioned. Where there are multiple input variables, the first category of each will be saved as a separate variable.
A new variable set for each of the input variables will be saved in your Data Sources tree. These can then be used like any other variable set to create tables and further outputs. For example, clicking Categories creates a new variable set that you can use to create a summary table.
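The difference between the two save options can be sketched in Python as follows. This is a rough illustration under stated assumptions, not how Displayr stores the data: the function names are hypothetical, each case is assumed to be a list of categories in the order mentioned, and `None` stands in for a missing value.

```python
def save_categories(cases, max_per_case=3):
    """One column per mention position; short cases are padded with None."""
    return [
        (items[:max_per_case] + [None] * max_per_case)[:max_per_case]
        for items in cases
    ]


def save_first_category(cases):
    """A single column holding the first category mentioned, if any."""
    return [items[0] if items else None for items in cases]


cases = [["AT&T", "Verizon"], ["Verizon"], []]
print(save_categories(cases, max_per_case=2))
print(save_first_category(cases))
```

Here `max_per_case` plays the role of the Maximum number of categories per case to save setting: mentions beyond that position are dropped.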
Combining saved categories variables
If you used multiple text variables, such as multiple spontaneous awareness variables, as inputs when creating your list of items categorization:
When you Save Variables from the output, a variable for each input variable is created:
If you'd like to show these variable sets as a single variable set in a table:
Follow these steps:
- Select the first "Categories from..." variable in the Data Sources tree.
- From the object inspector, update the Structure to Nominal - Multi.
- Repeat the two steps above for each remaining "Categories from..." variable.
- Select the "Categories from..." variables in the Data Sources tree.
- Hover and click + > Ready Made New Variable(s) > Binary Variable(s).
- Click Yes or No, depending on whether you'd like to handle missing data in the variables.
- A new Binary - Multi variable set containing the combined data will appear just below the original variables.
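Conceptually, the steps above turn several category variables into one 0/1 indicator per category. The Python sketch below shows that idea only; it is not Displayr's implementation. The function name and the `treat_missing_as_zero` flag are hypothetical, and the flag is merely a stand-in for the missing-data choice, whose exact behavior in Displayr is not described here.

```python
def combine_to_binary(variables, treat_missing_as_zero=True):
    """Combine category variables into one 0/1 indicator per category.

    variables: list of columns, each a list with one category (or None) per case.
    """
    n_cases = len(variables[0])
    categories = sorted({v for var in variables for v in var if v is not None})
    rows = []
    for i in range(n_cases):
        mentioned = {var[i] for var in variables if var[i] is not None}
        if not mentioned and not treat_missing_as_zero:
            # Case has no data in any input variable: keep it missing.
            rows.append({c: None for c in categories})
        else:
            rows.append({c: int(c in mentioned) for c in categories})
    return categories, rows


mention_1 = ["AT&T", "Verizon", None]
mention_2 = ["Verizon", None, None]
categories, rows = combine_to_binary([mention_1, mention_2])
print(categories)
print(rows)
```

A case is coded 1 for a category if it appears in any of the input variables, which is why the order of mentions no longer matters in the combined set.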
If you'd instead like to see all of the variable sets broken out by the variable number (e.g., Mention 1, Mention 2):
You will need to create binary variables for each variable set, and then follow the steps in How to Combine Separate Questions into a Grid in Displayr, paying close attention to the labels and order of the variables.
Technical Details
The variables created by Save Variable(s) > Categories or First category become invalid if the output changes, whether because the input text variable(s) were modified or updated or because the input settings were changed. In that case, they need to be deleted and recreated.
If the list of final categories has not changed, you can instead keep the variables by modifying the underlying R code for each one: comment out lines 3-7 (the if() and stop() calls). If the categories have changed, do not use this workaround, as it can cause problems and errors with the structure of the variable.
Next
Finding the Best Text Analysis for Your Data
How to Tidy Categories When Automatically Classifying Into an Item List
How To Automatically Classify Unstructured Text Data
How To Automatically Classify Unstructured Text Data Into an Entity List