Displayr has an awesome array of cutting-edge text analysis features that can make working with text data so much faster. Text data comes in various shapes and sizes, and you may want to perform different analyses on the same data, such as seeing the percentage of people mentioning topics as well as their sentiment. There are also pros and cons to certain features based on whether you want to use previous results to analyze updated data.
This article discusses the various uses for the text analysis features available in Displayr.
Requirements
- All features accept a variable with Structure > Text (though different analyses may have additional requirements).
- If you want to analyze multiple text variables alongside each other, you should combine them into a Text-Multi structure first.
- If you plan to use any AI-enhanced categorization tools (such as the automated features in our text categorization tool), you must opt in to Displayr AI. If you don't enable Displayr AI, Displayr will use its proprietary algorithm to create themes and classify text items.
- Examples shown below are in the data file here.
Method 1 - Identifying items in text responses
Sometimes, instead of grouping responses into broad themes based on their content, you may just want to pull out the keywords or items (e.g., brands) to analyze. Text like this often comes from spontaneous awareness questions, brands-purchased questions, or similar list-style survey questions. Several features and methods can help with this:
| Data Format | Method | Result |
| --- | --- | --- |
| Delimited strings of items | How to Automatically Classify Lists of Items. The feature attempts to correct misspellings and inconsistent formatting, and you can specify the delimiter if needed. If you plan to update the data later, fill in the Required Variants table with all the themes and their variants before updating so results stay consistent, and edit the saved variables per the article rather than recreating them from scratch. | An interactive output of the categorization results. You can also create a variable for the First Category mentioned in the list and a variable set for all Categories mentioned in each observation. |
| Items saved in separate text variables (max-multi format) | Same method as above: How to Automatically Classify Lists of Items. | When you save Categories and First Category, you get a Binary-Multi (Compact) variable set for each variable. To combine these into one variable set like the one above, change each to Structure > Nominal-Multi, select all the sets, and use the Binary Variables transformation. To see all items from each variable in the same table, broken out by each variable (e.g., 1, 2, and 3), create Binary Variables for each variable set and follow How to Combine Separate Questions into a Grid in Displayr. |
| Open-end text responses | How to Set Up Text for Analysis can be used to identify keywords and clean the data (see the article for the various settings). Many people use the tidied text variables saved from this to create Word Clouds or, more rarely, use the output to create a Term Document Matrix for advanced custom R code. | You can save a Tidied Text variable, or save Categories into a new variable set to see the proportions mentioning each keyword. |
| Various formats, including: delimited strings of items; items saved in separate text variables (max-multi format); open-end text responses; Text-Multi variable sets; a table with a single column of text | How to Create a Word Cloud can be used to create a word cloud visualization. The word cloud makes some attempt at removing connector words, but the Set Up Text for Analysis output is more robust. | |
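To make the "delimited strings of items" workflow concrete, here is a minimal Python sketch of the underlying idea, not Displayr's actual algorithm: split each response on a delimiter, then map raw spellings to canonical themes via a small variants table (the `VARIANTS` mapping and the example brands are hypothetical, standing in for the Required Variants table described above).

```python
# Illustrative sketch only -- not Displayr's implementation.
# Hypothetical "required variants" table: raw spellings -> canonical theme.
VARIANTS = {
    "coke": "Coca-Cola",
    "coca cola": "Coca-Cola",
    "coca-cola": "Coca-Cola",
    "pepsi": "Pepsi",
}

def classify_items(response, delimiter=","):
    """Split a delimited response and map each item to its canonical theme."""
    items = [item.strip().lower() for item in response.split(delimiter)]
    # Unknown items fall through with title-case formatting applied.
    return [VARIANTS.get(item, item.title()) for item in items if item]

def first_category(response, delimiter=","):
    """Return only the first item mentioned (like a 'First Category' variable)."""
    categories = classify_items(response, delimiter)
    return categories[0] if categories else None

print(classify_items("coke, pepsi , Fanta"))            # ['Coca-Cola', 'Pepsi', 'Fanta']
print(first_category("coca-cola; pepsi", delimiter=";"))  # Coca-Cola
```

Keeping the variants table filled in before each data update is what makes rerunning the classification reproducible: new spellings of an existing brand resolve to the same canonical label.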
Method 2 - Classifying text responses
Classifying open-ended text responses, also known as coding or categorization in Market Research, has historically been a very long and manual process. Displayr has features with more advanced modeling and AI built in that can streamline this process, drastically reducing the time spent manually classifying responses. There are a few tools to do this:
| Feature | Method | Result |
| --- | --- | --- |
| Categorization Tool | Our categorization tool is the most robust text analysis tool. | If you plan on updating your text data later, use this tool because it retains previously classified responses as-is. |
| Unstructured Text categorization output | How To Automatically Classify Unstructured Text Data. Think of this as a quick-and-dirty categorization tool, created via an output on the page. It uses a traditional large language model (LLM) to classify responses (not as intelligent as Displayr AI), and manual categorization and revisions are not supported. However, you can use an existing variable set of themes to train the algorithm to classify the remaining text responses into those themes (there is no guarantee that previously classified responses will be classified into the same theme). Note that responses that don't fit into a theme will not be coded at all. QCodes files are also not supported. | If the text variable data is updated, the categorization is recreated from scratch, so the themes and the responses classified into them may change. The variable set that was initially saved becomes invalid and must be resaved as well. If you plan on updating your text data later, use the Categorization Tool described above because it keeps previously classified responses as-is. |
| Sentiment analysis | How to Calculate Sentiment Scores for Open-Ended Responses. A simple technique for calculating a sentiment score for each response: negative (below 0), neutral (0), and positive (above 0). These scores can be used in stat testing and to create sentiment themes for each response. | A variable "Sentiment scores from ____" is created in the Data Sources tree with the score for each response. This can be crossed by other variables to test sentiment between groups. |
| Entity extraction | How To Automatically Classify Unstructured Text Data Into an Entity List. The algorithm identifies entities within text responses. Entities are more generic (e.g., Location) than the specific items in a list (e.g., New York, Paris, the beach). A list of the entities supported by default can be found in the technical documentation, and you can add more entities manually. | If the text variable is updated, the same algorithm is used to identify items that go into each entity, so previously classified responses remain and new responses are automatically classified. |
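The sentiment sign convention described above (below 0 is negative, exactly 0 is neutral, above 0 is positive) can be sketched with a toy example. This is not Displayr's scoring model; the mini-lexicon and function names here are hypothetical, purely to illustrate how a numeric score maps to the three sentiment themes.

```python
# Illustrative sketch only -- not Displayr's sentiment model.
# Hypothetical mini-lexicon: word -> contribution to the response's score.
LEXICON = {"great": 2, "good": 1, "ok": 0, "bad": -1, "terrible": -2}

def sentiment_score(response):
    """Sum word-level scores; words not in the lexicon contribute 0."""
    words = response.lower().split()
    return sum(LEXICON.get(word, 0) for word in words)

def sentiment_theme(score):
    """Bucket a numeric score using the convention the feature reports."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"

score = sentiment_score("great service but terrible wait")  # 2 + (-2) = 0
print(score, sentiment_theme(score))  # 0 Neutral
```

Because the score is numeric, it can be averaged and stat-tested across groups (the "crossed by other variables" use case above), while the bucketed theme is what you would show in a simple sentiment table.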
Types of Categorizations
If using the Categorization Tool described above, you will need to specify what type of categorization you want to create beforehand:
- Only one theme - each text response can only be assigned one theme.
- Multiple themes - each text response can be assigned more than one theme.
You can use the table below to decide how to set up the categorization tool for your data based on the type of table you want to create.
| Data format | Number of themes for each response | Structure of final variable set and table |
| --- | --- | --- |
| Single Text Variable | 1 | Mutually Exclusive, which makes a Nominal variable |
| Single Text Variable or Text-Multi Variable Set | More than 1 | Multiple Overlapping, which makes a Binary-Multi variable set |
| More than 1 Text Variable selected (not in a set) | 1 | Mutually Exclusive, which makes a Nominal-Multi variable set |
| More than 1 Text Variable selected (not in a set) | More than 1 | Multiple Overlapping, which makes a Binary-Grid variable set |
If you have more than one text variable you want to classify using the same themes, but you don't want to show the data in the same table, you should share the categorization code frame. See the sharing section in How to Reuse a Categorization (Code Frame) on a Different Variable.