Displayr has an awesome array of cutting-edge text analysis features that can make working with text data so much faster. Text data comes in various shapes and sizes, and you may want to perform different analyses on the same data, such as seeing the percentage of people mentioning topics as well as their sentiment. There are also pros and cons to certain features based on whether you want to use previous results to analyze updated data.
This article discusses the various uses for the text analysis features available in Displayr.
Requirements
- All features accept a variable with Structure > Text (though different analyses may have additional requirements).
- If you want to analyze multiple text variables alongside each other, you should combine them into a Text-Multi structure first.
- If you plan to use any AI-enhanced categorization tools (such as the automated features in our text categorization tool), you must opt in to Displayr AI. If you don't enable Displayr AI, Displayr will use its proprietary algorithm to create themes and classify text items.
- Examples shown below are in the data file here.
Method 1 - Identifying items in text responses
Sometimes, instead of grouping responses into broad themes based on their content, you may just want to pull out the keywords or items (e.g., brands) to analyze. Text like this often comes from spontaneous awareness questions, brands-purchased questions, or similar list-style survey questions. Several features and methods can help with this:
| Data Format | Method | Result |
| --- | --- | --- |
| Delimited strings of items | How to Automatically Classify Lists of Items. The feature attempts to correct misspellings and inconsistent formatting, and you can specify the delimiter if needed. If you plan to update the data later, fill in the Required Variants table with all the themes and their variants before updating so results stay consistent, and edit the saved variables per the article rather than recreating them from scratch. | An interactive output of the categorization results. You can also create a variable for the First Category mentioned in the list and a variable set for all Categories mentioned in each observation. |
| Items saved in separate text variables (max-multi format) | Same method as above: How to Automatically Classify Lists of Items. | When you save Categories and First Category, you get a Binary-Multi (Compact) variable set for each variable. To combine these into one variable set like the one above, change each to Structure > Nominal-Multi, select all the sets, and use the Binary Variables transformation. To see all items from each variable in the same table, broken out by each variable (e.g., 1, 2, and 3), create Binary Variables for each variable set and follow How to Combine Separate Questions into a Grid in Displayr. |
| Open-end text responses | How to Set Up Text for Analysis can be used to identify keywords and clean the data (see the article for the various settings). Many people use the tidied text variables saved from this to create Word Clouds or, more rarely, use the output to create a Term Document Matrix for advanced custom R code. | You can save a Tidied Text variable, or save Categories into a new variable set to see the proportions mentioning each keyword. |
| Various formats, including: delimited strings of items; items saved in separate text variables (max-multi format); open-end text responses; Text-Multi variable sets; a table with a single column of text | How to Create a Word Cloud can be used to create a word cloud visualization. The word cloud makes some attempt at removing connector words, but the Set Up Text for Analysis output is more robust. | |
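To make the "delimited strings of items" workflow concrete, here is a minimal Python sketch of the underlying idea, not Displayr's actual algorithm: split each response on a delimiter, then map raw spellings to canonical themes via a small variants table (the `VARIANTS` mapping and the example brands are hypothetical, standing in for the Required Variants table described above).

```python
# Illustrative sketch only -- not Displayr's implementation.
# Hypothetical "required variants" table: raw spellings -> canonical theme.
VARIANTS = {
    "coke": "Coca-Cola",
    "coca cola": "Coca-Cola",
    "coca-cola": "Coca-Cola",
    "pepsi": "Pepsi",
}

def classify_items(response, delimiter=","):
    """Split a delimited response and map each item to its canonical theme."""
    items = [item.strip().lower() for item in response.split(delimiter)]
    # Unknown items fall through with title-case formatting applied.
    return [VARIANTS.get(item, item.title()) for item in items if item]

def first_category(response, delimiter=","):
    """Return only the first item mentioned (like a 'First Category' variable)."""
    categories = classify_items(response, delimiter)
    return categories[0] if categories else None

print(classify_items("coke, pepsi , Fanta"))            # ['Coca-Cola', 'Pepsi', 'Fanta']
print(first_category("coca-cola; pepsi", delimiter=";"))  # Coca-Cola
```

Keeping the variants table filled in before each data update is what makes rerunning the classification reproducible: new spellings of an existing brand resolve to the same canonical label.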
Method 2 - Classifying text responses
Classifying open-ended text responses, also known as coding or categorization in Market Research, has historically been a very long and manual process. Displayr has features with more advanced modeling and AI built in that can streamline this process, drastically reducing the time spent manually classifying responses. There are a few tools to do this:
| Feature | Method | Result |
| --- | --- | --- |
| Categorization Tool | Our categorization tool is the most robust text analysis tool. | If you plan on updating your text data later, use this tool because it retains previously classified responses as-is. |
| Unstructured Text categorization output | How To Automatically Classify Unstructured Text Data. Think of this as a quick-and-dirty categorization tool, created via an output on the page. It uses a traditional large language model (LLM) to classify responses (not as intelligent as Displayr AI), and manual categorization and revisions are not supported. However, you can use an existing variable set of themes to train the algorithm to classify the remaining text responses into those themes (there is no guarantee that previously classified responses will be classified into the same theme). Note that responses that don't fit into a theme will not be coded at all. QCodes files are also not supported. | If the text variable data is updated, the categorization is recreated from scratch, so the themes and the responses classified into them may change. The variable set that was initially saved becomes invalid and must be resaved as well. If you plan on updating your text data later, use the Categorization Tool described above because it keeps previously classified responses as-is. |
| Sentiment analysis | How to Calculate Sentiment Scores for Open-Ended Responses. A simple technique for calculating a sentiment score for each response: negative (below 0), neutral (0), and positive (above 0). These scores can be used in stat testing and to create sentiment themes for each response. | A variable "Sentiment scores from ____" is created in the Data Sources tree with the score for each response. This can be crossed by other variables to test sentiment between groups. |
| Entity extraction | How To Automatically Classify Unstructured Text Data Into an Entity List. The algorithm identifies entities within text responses. Entities are more generic (e.g., Location) than the specific items in a list (e.g., New York, Paris, the beach). A list of the entities supported by default can be found in the technical documentation, and you can add more entities manually. | If the text variable is updated, the same algorithm is used to identify items that go into each entity, so previously classified responses remain and new responses are automatically classified. |
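The sentiment sign convention described above (below 0 is negative, exactly 0 is neutral, above 0 is positive) can be sketched with a toy example. This is not Displayr's scoring model; the mini-lexicon and function names here are hypothetical, purely to illustrate how a numeric score maps to the three sentiment themes.

```python
# Illustrative sketch only -- not Displayr's sentiment model.
# Hypothetical mini-lexicon: word -> contribution to the response's score.
LEXICON = {"great": 2, "good": 1, "ok": 0, "bad": -1, "terrible": -2}

def sentiment_score(response):
    """Sum word-level scores; words not in the lexicon contribute 0."""
    words = response.lower().split()
    return sum(LEXICON.get(word, 0) for word in words)

def sentiment_theme(score):
    """Bucket a numeric score using the convention the feature reports."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"

score = sentiment_score("great service but terrible wait")  # 2 + (-2) = 0
print(score, sentiment_theme(score))  # 0 Neutral
```

Because the score is numeric, it can be averaged and stat-tested across groups (the "crossed by other variables" use case above), while the bucketed theme is what you would show in a simple sentiment table.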
Types of Categorizations
If using the Categorization Tool described above, you will need to specify what type of categorization you want to create beforehand:
- Only one theme - each text response can only be assigned one theme.
- Multiple themes - each text response can be assigned more than one theme.
You can use the table below to decide how to set up the categorization tool for your data based on the type of table you want to create.
| Data format | Number of themes for each response | Structure of final variable set and table |
| --- | --- | --- |
| Single Text Variable | 1 | Mutually Exclusive, which makes a Nominal variable |
| Single Text Variable or Text-Multi Variable Set | More than 1 | Multiple Overlapping, which makes a Binary-Multi variable set |
| More than 1 Text Variable selected (not in a set) | 1 | Mutually Exclusive, which makes a Nominal-Multi variable set |
| More than 1 Text Variable selected (not in a set) | More than 1 | Multiple Overlapping, which makes a Binary-Grid variable set |
If you have more than one text variable you want to classify using the same themes, but you don't want to show the data in the same table, you should share the categorization code frame. See the sharing section in How to Reuse a Categorization (Code Frame) on a Different Variable.