Classifying text data into themes is a cornerstone of text analysis. This article explains how to use the Text Categorization module to automatically create themes and then automatically classify responses into those themes. The tool uses Displayr AI, which leverages Large Language Models to classify text data accurately even when the text is messy, so there's no need to clean up the text beforehand.
This article is broken into the following sections:
- Create Your Classification
- Save Your Classifications
- Use Similarity Sorting to Assist with Classification
- Use Filtering to Assist with Classification
- Reuse an Existing Categorization
- Create Additional Themes
In this article, we describe how to go from raw text data:
To a state where the text responses have been classified and can be used for further analysis:
Requirements
- You will need a Text variable. Text variables are represented by an A next to the variable label in the Data Sources tree:
- We recommend enabling Displayr AI for this feature. See Displayr AI for proven examples and instructions on how to accept the terms and enable it. If you prefer not to use Displayr AI, Displayr's proprietary algorithm maps language to find similar responses in a "smarter" way than merely searching for exact keywords. This helps you find what you are looking for faster while you manually control how responses are classified.
Create Your Classification
- From the Data Sources tree, select the text variable that you would like to classify.
- Click + > Text Categorization.
- Select whether a text response should be categorized into one or multiple themes:
- Multiple themes - use this when you have open-ended responses or miscellaneous responses
- Only one theme - use this for things like top of mind awareness, spontaneous awareness mentions, or where the response is just one thing
- Click Start.
If you're not sure which option is best for the text data you're working with, see Finding the Best Text Analysis for Your Data for examples.
The Text Categorization module will open. You have the original text responses on the right:
Create Themes and Classify Text Responses
The Create button in Displayr's text categorization tool creates a specified number of themes, and the Classify button classifies responses into those themes.
- Click Create in the upper left.
- Determine the number of New themes you would like to initially create. The default is to create 10 themes. Use the arrows or enter a number if you want to adjust.
- OPTIONAL: Tick Custom prompt and add a prompt to assist AI in creating your initial themes. See Optimizing Text Categorization with Custom Prompts for details.
- Click Create.
- Once the themes are created, you will see them in the Themes pane on the left side of the screen.
- Click Classify on the right to automatically assign the raw text responses to the created Themes.
- OPTIONAL: Tick Custom prompt and add a prompt to assist AI in classifying your text responses. See Optimizing Text Categorization with Custom Prompts for details.
- Click Classify.
When the classification is finished, you will see colored letters next to each text response. These correspond to the theme or themes the response was assigned to.
No automatic classification solution will be perfect, and there are ways that you can tidy up and improve your classifications, so read on.
Resolve Unclassified Data
Once you've created themes and clicked Classify, you may have some unclassified data that couldn't be assigned to a specific theme. You will know that data is left unclassified if there is a non-zero number in the "Unclassified" theme:
You will want to review any unclassified responses and classify them into an existing theme or create a new one. You can create a new theme, such as "All other responses", to capture the responses that don't fit into existing themes.
- Set the Theme filter dropdown in the Responses section to Unclassified to show all unclassified responses.
- Select all of the unclassified responses. You can hold down the shift key while selecting the first and last text responses to select multiple items.
- Add the selected responses to a new theme by:
- Clicking on + Add New Theme at the bottom of the Themes list.
- Dragging a text response to the left and into the purple box at the bottom of the Themes list.
- Hovering over a response, clicking Manually classify, and typing in a new theme label.
- Give the new theme a label, e.g., "All other responses" and hit Enter.
Check Quality of Classified Responses
It's important to give any automatic classification a quality check before reporting on your analysis. A quick way to do that is to select a theme from the Theme filter and scan through the responses assigned to it.
If there are more than a few items that you need to reclassify into an entirely different theme, drag them all to the "Unclassified" theme. Then click Classify again to re-run the AI theme assignments. This will only classify the unclassified items; it won't reclassify things that are already assigned to a theme.
Below, I've selected the "Brand loyalty" theme and can see that there are some items, such as "drink for the enjoyment and the fun it brings", that are better placed in another theme.
You can manually reclassify responses:
- Select the text response.
- OPTIONAL: If you selected Multiple themes when you started the categorization, you can add the response to one or more additional themes. Drag and drop it into the new theme, or start typing the new theme's label and select it.
- To remove it from the current assigned theme and add it to another, click on the triple letters in the Classification column, remove the assigned theme by clicking on the X to the right of the label, start typing the correct theme, and hit Enter.
Below, "drink for the enjoyment and the fun it brings" is added to "Social perception" and removed from "Brand loyalty".
By default, when you manually classify a text response, its status is set to "locked". See the "drink for the enjoyment and the fun it brings" and "no comment" responses below. If a response is unclassified or AI has automatically classified it, its status is instead set to "unlocked", like "Pay packet" below. The status is visible on the left of the response when you either hover over or select it.
Locked responses can still be reclassified manually, but only unlocked responses can be automatically classified. This guarantees that a manual classification won't be overwritten when you run automatic AI classification afterwards. If you do want a locked response to be eligible for automatic classification, you can unlock it by clicking the adjacent lock icon.
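The lock rule described above can be pictured with a small conceptual sketch. This is not Displayr's implementation: the `Response` class, the helper functions, and the `fake_ai` stand-in are all hypothetical names used only to illustrate that manual assignments lock a response and automatic classification skips anything locked.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    text: str
    themes: list = field(default_factory=list)
    locked: bool = False  # unclassified and AI-classified responses are unlocked


def manually_classify(response, theme):
    """A manual assignment sets the theme and locks the response."""
    response.themes = [theme]
    response.locked = True


def auto_classify(responses, pick_theme):
    """Assign a theme to each unlocked response; locked responses are skipped."""
    for response in responses:
        if response.locked:
            continue
        response.themes = [pick_theme(response.text)]


def fake_ai(text):
    # Hypothetical stand-in for the AI classifier: always returns one theme.
    return "Brand loyalty"


responses = [
    Response("drink for the enjoyment and the fun it brings"),
    Response("Pay packet"),
]
manually_classify(responses[0], "Social perception")
auto_classify(responses, fake_ai)

for r in responses:
    print(r.text, r.themes, "locked" if r.locked else "unlocked")
```

The sketch models only the lock rule. How the tool treats unlocked responses that already have a theme is not modelled here.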
Editing Themes
To rename a theme, right-click it and select Rename (F2). To delete a theme and move its responses into another existing theme, right-click it, select Delete and Reclassify as, and then select the theme to classify into. To delete a theme altogether, right-click it and select Delete (Del).
Alternatively, you can add more themes manually by clicking + Add New Theme or run the Create function again to create additional themes using AI.
Save Your Classifications
Once you're happy with the classification, click Save in the bottom right corner. This will take you back to Displayr's main Edit mode interface. A new classified variable set will appear in the Data Sources tree next to your original text variable and have "Categorized" in the name.
It will have an icon with two radio buttons (Nominal structure) if you selected Only one theme, or an icon with two boxes (Binary - Multi structure) if you selected Multiple themes when you started the process. You can read more about Displayr's variable set structures here.
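To make the two output structures concrete, here is a minimal sketch of the shape of the saved data, assuming a plain Python representation. The function names are hypothetical and the theme labels are taken from the examples in this article. It is an illustration of the idea, not of how Displayr stores variable sets.

```python
THEMES = ["Brand loyalty", "Social perception", "All other responses"]


def nominal_value(assigned):
    """Only one theme: each response holds a single category (or none)."""
    if len(assigned) > 1:
        raise ValueError("a Nominal variable holds at most one theme per response")
    return assigned[0] if assigned else None


def binary_multi_row(assigned, themes):
    """Multiple themes: one 0/1 indicator per theme for each response."""
    return {theme: int(theme in assigned) for theme in themes}


single = nominal_value(["Social perception"])
multi = binary_multi_row(["Brand loyalty", "Social perception"], THEMES)

print(single)
print(multi)
```

In the Nominal case each response carries one label, while in the Binary - Multi case each theme behaves like its own yes/no variable, which is why a response can count towards several themes.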
Use Similarity Sorting to Assist with Classification
You can use our Similarity algorithms to help you manually classify remaining text data into existing themes. This can be accessed via the Show response similarity to theme or text button:
Similarity to Text
You can enter text in the similarity filter to find unclassified responses that are similar to that text.
- Click the Show response similarity to theme or text button.
- Enter a keyword or phrase that you want to find similar matches for in the search field.
- Click the Text that appears in the Similarity to text area:
The responses will update to show an orange bar to the left of the text. The length of the bar indicates the match level, so the longer the bar, the better the match.
In the example below, I have some text data that contains responses to what people miss about pre-pandemic life. I used the similarity algorithm to find responses that are similar to "travel":
To assign the results to an existing theme:
- Select the response(s) from the pane on the right that you want to classify.
- Either drag and drop into the appropriate theme, or hover and click Manually classify, start typing the theme's label, and then select the theme from the list.
- OPTIONAL: You can assign to multiple themes by selecting them from the list of themes using Manually classify, or dragging the text into an additional theme.
Similarity to a Theme
You can also use the similarity filter to find responses that are similar to a theme. The algorithm looks at the responses that have been classified into a specific theme and uses them to find similar responses in the unclassified data. The matches improve as more responses are classified into the theme.
- Click the Show response similarity to theme or text button.
- Enter a keyword or phrase that you want to find similar matches for in the search field.
- Click the theme that appears in the Similarity to theme area:
The responses will update to show an orange bar to the left of the text, and include any responses that have already been classified. The length of the bar indicates the match level, so the longer the bar, the better the match.
To assign the results to an existing theme:
- Select the response(s) from the pane on the right that you want to classify.
- Either drag and drop into the appropriate theme, or hover and click Manually classify, start typing the theme's label, and then select the theme from the list.
- OPTIONAL: You can assign to multiple themes by selecting them from the list of themes using Manually classify, or dragging the text into an additional theme.
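The two similarity modes in this section can be sketched as a ranking problem. Displayr's similarity algorithm is proprietary and is described as smarter than keyword matching, so the word-overlap (Jaccard) score below is only a stand-in assumption chosen to keep the example self-contained. The function names and most of the sample responses are made up for illustration.

```python
import re


def tokens(text):
    """Lower-case word set for a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap score between 0 and 1 (stand-in for the real algorithm)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity_to_text(query, responses):
    """Rank responses by similarity to a typed keyword or phrase."""
    return sorted(responses, key=lambda r: jaccard(query, r), reverse=True)


def similarity_to_theme(theme_examples, responses):
    """Rank responses by their best match against responses already in a theme."""
    def score(response):
        return max((jaccard(e, response) for e in theme_examples), default=0.0)
    return sorted(responses, key=score, reverse=True)


unclassified = [
    "going on holiday overseas",
    "hugging my friends",
    "travel to see family",
]

by_text = similarity_to_text("travel", unclassified)
by_theme = similarity_to_theme(["travel overseas", "holiday trips"], unclassified)

print(by_text)
print(by_theme)
```

The theme-based ranking scores each response against every example already in the theme, which is one way to see why adding more classified responses to a theme gives the ranking more to work with.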
Use Filtering to Assist with Classification
There are a number of ways that you can filter your responses within the text categorization interface.
Filter by Locked Responses
To view only locked or unlocked responses, click the Lock icon. Clicking the icon once shows only the locked responses. Clicking it again shows only the unlocked responses. Clicking it a third time clears the filter.
Filter Responses by Text
You can search for specific text or text strings by clicking on the magnifying glass:
Then you can enter text to search for in the responses:
Filter by Theme
When you have a long theme list, you can search within or filter this list by clicking on the magnifying glass within the Themes section.
You can also filter the responses list by the assigned theme by clicking on the Theme filter in the Responses section:
Filter by Variable
An alternative to filtering by a text search term or theme is to filter by a variable in your document. Any variable that has Usable as a filter ticked under Data > Properties in the object inspector can be selected in the Var filter.
This option displays only the responses given by respondents who fall within the selected filter. However, when you classify a response, every respondent who gave that response is classified, regardless of whether they fall within the filter.
You can tell that a response was also given by respondents outside the filter when you see an "of" count next to it, such as [1 of 2]. This indicates that only 1 of the 2 respondents who gave this response falls within the filter. If you now classify the response into a theme, all instances of it (2) are classified regardless of the filter. So in this case, classifying "hugging" applies to both respondents and not just the one who falls within the selected filter.
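The [1 of 2] behavior can be illustrated with a small sketch. The data layout, the `in_filter` flag, and the function names are hypothetical, and the theme used for "hugging" is just an example. The point is that the count is computed per distinct response text, and a classification applies to every respondent who gave that text.

```python
from collections import defaultdict

respondents = [
    {"id": 1, "text": "hugging", "in_filter": True},
    {"id": 2, "text": "hugging", "in_filter": False},
    {"id": 3, "text": "travel", "in_filter": True},
]


def filtered_counts(rows):
    """Return {text: (count inside filter, total count)} for visible responses."""
    totals = defaultdict(int)
    inside = defaultdict(int)
    for row in rows:
        totals[row["text"]] += 1
        if row["in_filter"]:
            inside[row["text"]] += 1
    # A response is shown only if at least one respondent is inside the filter.
    return {text: (inside[text], totals[text]) for text in totals if inside[text] > 0}


def classify_text(rows, text, theme):
    """Assign a theme to every respondent who gave this text, ignoring the filter."""
    return {row["id"]: theme for row in rows if row["text"] == text}


counts = filtered_counts(respondents)
assignments = classify_text(respondents, "hugging", "Social perception")

print(counts)
print(assignments)
```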
Reuse an Existing Categorization
If you want to apply the themes from an already classified variable to an unclassified text variable and still be able to use the features of Displayr's categorization tool, you can reuse an existing categorization.
For example, I have classified a text variable that asks how social life has changed due to the pandemic. Next, I want to apply the same themes to another text variable that asks which parts of pre-pandemic social life are missed. I can reuse the existing categorization on this other text variable.
- Select the text variable that you want to classify using existing themes.
- Click + > Text Categorization.
- Click the Reuse Existing Categorization tab.
- Select the variable that contains the existing themes that you want to use.
- Click one of the following options:
- Reuse by duplicating - copies an existing code frame and rules as is from a categorized variable set.
- Reuse by sharing - shares an existing code frame and rules from a categorized variable set, so that any additions and changes will be reflected in both.
- The text responses will appear on the right side, and the existing themes appear in the Themes pane to the left. To assign the text responses to an existing theme, click Classify.
- Repeat the steps in the Resolve Unclassified Data and Check Quality of Classified Responses sections above as needed.
- Click Save when you are finished with classification.
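The difference between Reuse by duplicating and Reuse by sharing in the steps above is analogous to copying an object versus holding a second reference to it. The sketch below uses a plain Python dictionary as a hypothetical stand-in for a code frame. It shows only the analogy, not how Displayr stores categorizations.

```python
import copy

# Hypothetical code frame from the variable that was classified first.
original = {"themes": ["Brand loyalty", "Social perception"]}

duplicated = copy.deepcopy(original)  # Reuse by duplicating: an independent copy
shared = original                     # Reuse by sharing: the same code frame

# Adding a theme through the shared code frame changes the original too.
shared["themes"].append("All other responses")

print(original["themes"])
print(duplicated["themes"])
```

After the change, the original and shared code frames both contain the new theme, while the duplicated one keeps the themes it had when it was copied.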
You can also reuse an existing categorization in a completely different document by exporting the code frame from the module and importing it into the other document. See How to Reuse an Existing Categorization (Code Frame) in a Different Document for more details.
Create Additional Themes
There may be instances when you have existing themes but want to create additional themes based on unclassified data. The Create function helps with this. It can be useful if you've updated your data with new responses or if you have a lot of leftover unclassified data.
- With the text categorization tool open, adjust the number field next to Create to indicate how many new themes you'd like to add.
- New themes will be added to the bottom of the existing list.
- Click Classify to assign unclassified data to any of the existing themes.
- Repeat the steps in the Resolve Unclassified Data and Check Quality of Classified Responses sections above as needed.
- Click Save when you are finished with classification.
Alternatively, you can manually add new themes by:
- Clicking on + Add New Theme at the bottom of the Themes list.
- Dragging a text response to the left and into the purple box at the bottom of the Themes list.
- Hovering over a response, clicking Manually classify, and typing in a new theme label.
Next
Optimizing Text Categorization with Custom Prompts
How to Refine and Edit Text Themes After Classification
Frequently Asked Questions about Text Analysis
How to Reuse a Categorization (Code Frame) on a Different Variable