How to Convert Simple Text Variables into Categories

When a data file has no metadata, categorical data is often imported as a Text variable instead of a Nominal (categorical) variable. Displayr can automatically code simple text variables into categories and keep the responses alphabetically ordered. "Simple" here means each response can be converted directly to a category with little manipulation. This is especially useful for CSV files, where categorical data is often stored as labels rather than as numeric values (e.g., a question such as “What is your favorite animal?” would have data values of “Ants”, “Dogs”, “Cats”), but the same problem can also occur with poorly formatted data exports.

Notably, this feature is distinct from Displayr's text categorization tool and automatic text categorization outputs. See Finding the Best Text Analysis for your Data for an overview of the other methods for coding text data.

Requirements

A variable set with a Structure of Text or Text-Multi where the text data is simple (each response either exactly matches or closely resembles the category to be assigned).

Method

How to convert a single simple text variable to a nominal variable

Select the variable in the Data Sources tree.
In Properties, change Data > Attributes > Structure from Text to Nominal.

How to convert multiple single text variables (where the item list is consistent)

Select the text variables that should have a shared code frame in the Data Sources tree.
Right-click and select Combine.
The text variables have now been combined into a single Text-Multi variable set.
In Properties, change the Structure for the new variable set under Data > Attributes from Text – Multi to Nominal – Multi.

Technical Details

How converting works

Key points

Converting a Text variable to a Nominal variable via Structure will automatically code the text into categories.
Displayr stores categorical data as numbers, with each category assigned a number. This mapping is known as the code frame and is found under Properties > Data > Attributes > Values & Labels of a variable (set).
When automatically coding multiple related text variables, first use Combine to combine them into a Text – Multi variable set, and then change the Structure to Nominal – Multi (or any other categorical type). This ensures responses from all variables are auto-coded at once and alphabetically ordered.
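
To illustrate why combining first matters, here is a minimal Python sketch. It is not Displayr's implementation: the function `build_code_frame` and the variables `q1` and `q2` are hypothetical, and it assumes values are assigned in alphabetical order starting at 1. Under that assumption, coding each variable separately can give the same response different values, while coding the combined responses yields one shared code frame.

```python
def build_code_frame(responses):
    """Map each distinct response (trimmed, case-insensitive) to a value.

    Assumption for illustration: values follow alphabetical order from 1.
    """
    keys = sorted({r.strip().casefold() for r in responses if r.strip()})
    return {key: value for value, key in enumerate(keys, start=1)}


# Two hypothetical text variables with an overlapping item list.
q1 = ["Dogs", "Cats"]
q2 = ["Dogs", "Ants", "Birds"]

# Coded separately, "dogs" ends up with a different value in each variable.
separate_q1 = build_code_frame(q1)
separate_q2 = build_code_frame(q2)

# Coded together (the analogue of Combine), every variable shares one frame.
shared = build_code_frame(q1 + q2)

print(separate_q1)
print(separate_q2)
print(shared)
```

In this sketch, "dogs" is 2 in the first separate frame and 3 in the second, but has a single value in the shared frame, which is what makes the variables comparable on one table.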

The auto-coding rules behind the scenes

Leading/trailing spaces and capitalization are ignored - For example, “ dogs” and “Dogs ” will both be coded as the same category.
The label that occurs most often will become the category label - For example, if the responses were “coke”, “COKE”, “Coke”, and “Coke”, the auto-coded question would use “Coke” as the label for the category for those responses, as it occurs twice.
Categories are automatically alphabetized by default when created - This is both in the Value Attributes dialog and on tables.
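
The three rules above can be sketched in Python as follows. This is an illustration only, not Displayr's actual code: the function `auto_code` is hypothetical, and it makes assumptions the article does not state, namely that blank responses are left uncoded, that ties between equally common spellings go to the spelling seen first, and that values are numbered alphabetically from 1.

```python
from collections import Counter, defaultdict


def auto_code(responses):
    """Return (code_frame, coded_values) for a list of raw text responses."""
    # Group spellings by a normalized key: trimmed and case-insensitive.
    variants = defaultdict(Counter)
    for raw in responses:
        text = raw.strip()
        if text:  # assumption: blank responses stay uncoded
            variants[text.casefold()][text] += 1

    # The most frequent spelling becomes the label
    # (assumption: ties go to the spelling seen first).
    labels = {key: counts.most_common(1)[0][0] for key, counts in variants.items()}

    # Categories are ordered alphabetically and numbered from 1 (assumption).
    ordered_keys = sorted(labels)
    value_of = {key: value for value, key in enumerate(ordered_keys, start=1)}
    code_frame = {value_of[key]: labels[key] for key in ordered_keys}

    # Blank responses map to None.
    coded = [value_of.get(raw.strip().casefold()) for raw in responses]
    return code_frame, coded


code_frame, coded = auto_code(
    [" dogs", "Dogs ", "Dogs", "coke", "COKE", "Coke", "Coke", "Ants"]
)
print(code_frame)
print(coded)
```

Here the three spellings of "dogs" collapse into one category labeled “Dogs”, and the four cola responses collapse into one category labeled “Coke”.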

How Displayr deals with changes in the data file

Whenever the source text variables are updated (from either an updated data file or an edit within Displayr), the text is automatically re-coded and the code frame is updated.
Whenever converted variables are combined into a multi-variable set, their code frames change to include unique responses from all other input text variables.
Whenever converted variables are moved out of a multi-variable set into their own single-variable sets, their code frames stop including responses from the other text variables and only include their own responses. Importantly, their category values stay the same.
Existing text responses always keep their same category value (e.g., if “Ants” was originally the first alphabetical response with an auto-coded value of 1, and “Aardvarks” appeared in the new data, “Ants” would remain with a value of 1, and “Aardvarks” would get a new unique value).
The category labels may change if a different spelling of a response becomes the most frequent one (e.g., if the new responses were “coke”, “COKE”, “Coke”, “Coke”, “coke” and “coke”, the new label would be “coke”).
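
The update behavior described above can be sketched in Python. This is a hedged illustration rather than Displayr's implementation: the function `update_code_frame` is hypothetical, and the article only says a new response gets "a new unique value", so using the next unused integer is an assumption.

```python
from collections import Counter, defaultdict


def update_code_frame(existing_values, responses):
    """Re-code responses while keeping previously assigned category values.

    existing_values maps normalized text (trimmed, case-folded) to the
    category value assigned by an earlier coding pass.
    """
    variants = defaultdict(Counter)
    for raw in responses:
        text = raw.strip()
        if text:
            variants[text.casefold()][text] += 1

    # Existing categories keep their values; new ones get the next
    # unused integer (assumption: the article only promises uniqueness).
    values = dict(existing_values)
    next_value = max(values.values(), default=0) + 1
    for key in sorted(variants):
        if key not in values:
            values[key] = next_value
            next_value += 1

    # Labels are recomputed, so the most frequent spelling may change.
    labels = {key: counts.most_common(1)[0][0] for key, counts in variants.items()}
    return values, labels


old_values = {"ants": 1, "coke": 2}
new_data = ["Ants", "Aardvarks", "coke", "COKE", "Coke", "Coke", "coke", "coke"]
values, labels = update_code_frame(old_values, new_data)
print(values)
print(labels)
```

In this sketch “Ants” keeps the value 1 even though “Aardvarks” now sorts before it, and the cola category keeps its value while its label switches from “Coke” to “coke”.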

Next

Frequently Asked Questions about Text Analysis

Finding the Best Text Analysis for Your Data

How to Combine Variables into a Variable Set

Variable Sets
