Variable Sets

A variable set is a group of one or more variables. This article describes how:

Variable sets can make data analysis tricky
Displayr automatically detects and presents variable sets
Tables are created from variable sets
Displayr has 13 variable set structures
Structure is based on measurement scale and set type
A variable set's value attributes also determine how it is analyzed

Variable sets can make data analysis tricky

Consider a survey question like How old are you? This is stored in a data file as a single column of data, commonly referred to as a variable. Below, you can see the age data for 10 people. How old are you? has been stored with a variable name of d1.

However, sometimes data can only be represented by multiple columns of data. For example, the data from the question Which of these brands are you familiar with? Coca-Cola, Diet Coke, Coke Zero, Pepsi, Diet Pepsi, and Pepsi Max? may be stored in a data file containing six columns, as shown below (Q1b_1 is the name of the variable showing data for Coca-Cola, Q1b_2 for Diet Coke, etc.).

A group of variables that need to be analyzed together is known as a variable set.
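As a rough illustration of the difference between a single variable and a variable set, the sketch below stores both as plain columns. This is not Displayr's internal representation. All data values are invented, and the mapping of Q1b_3 to Coke Zero is assumed from the order of the brands in the question.

```python
# Hypothetical data for illustration only; every value below is invented.

# A single variable: one column, one value per respondent.
d1 = [34, 27, 45, 61, 19]  # "How old are you?"

# A variable set: several columns that only make sense when read together.
# 1 = familiar with the brand, 0 = not familiar.
brand_familiarity = {
    "Q1b_1": [1, 0, 1, 1, 0],  # Coca-Cola
    "Q1b_2": [0, 0, 1, 1, 0],  # Diet Coke
    "Q1b_3": [1, 0, 0, 1, 0],  # Coke Zero (assumed from the brand order)
}

# Analyzing the set as a whole: share of respondents familiar with each brand.
familiarity_share = {
    name: sum(column) / len(column)
    for name, column in brand_familiarity.items()
}
print(familiarity_share)
```

Each column on its own answers only "is this person familiar with this brand?"; the question as asked in the survey is answered only by the group of columns.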

Displayr automatically detects and presents variable sets

When data is imported into Displayr, Displayr automatically groups variables into variable sets. For example, when analyzing survey data, Displayr attempts to group variables so that each variable set represents a single question in the survey.

If Displayr AI is enabled, a sensible set name is created automatically. Otherwise, the matching text becomes the set name (see here for more details).
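To make the idea of automatic grouping concrete, here is a minimal sketch that groups variable names sharing a prefix (such as Q1b_1 and Q1b_2). It is an illustrative heuristic only: the function name group_by_name_prefix and the name-based rule are assumptions, and Displayr's actual detection logic is not described in this article.

```python
import re

def group_by_name_prefix(variable_names):
    """Group names like Q1b_1, Q1b_2 under the shared prefix Q1b.

    Illustrative heuristic only; this is NOT Displayr's detection logic.
    """
    groups = {}
    for name in variable_names:
        # Names ending in "_<number>" are grouped by the text before it.
        match = re.fullmatch(r"(.+)_(\d+)", name)
        prefix = match.group(1) if match else name
        groups.setdefault(prefix, []).append(name)
    return groups

# Hypothetical variable names; d2 is invented for the example.
names = ["d1", "Q1b_1", "Q1b_2", "Q1b_3", "d2"]
variable_sets = group_by_name_prefix(names)
print(variable_sets)
```

Variables that match no group stay as single-variable sets, which mirrors how questions like Age appear as one column.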

As an example, the data set below shows data from a survey asking people about cola consumption. Questions like Age, Income, and Gender are each represented in the data file by a single column of data (variable) as we would expect. However, where the variable sets contain multiple variables, they have automatically been grouped together, and a triangle to the left of the icon shows this has occurred (e.g., Awareness).

By clicking on the triangle, we can expand out the variable set and see the variables (columns) within it:

Tables are created from variable sets

Variable sets are the building blocks of tables. Any summary table is a table that summarizes the data from a single variable set. A crosstab is a table that contains two or more variable sets.

Consequently, the key to creating tables is to create and modify variable sets (see How to Combine and Split Variable Sets).
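The distinction between a summary table and a crosstab can be sketched with two single-variable sets. The data is invented, the function names are hypothetical, and column percentages are used for the crosstab as one common choice rather than as a statement of what Displayr computes.

```python
from collections import Counter

# Hypothetical responses, invented for illustration.
gender = ["Male", "Female", "Female", "Male", "Female", "Female"]
age_group = ["18-34", "18-34", "35+", "18-34", "35+", "35+"]

def summary_table(values):
    """Summarize ONE variable set: proportion in each category."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}

def crosstab(rows, columns):
    """Combine TWO variable sets: distribution of `rows` within each column."""
    cell_counts = Counter(zip(rows, columns))
    column_totals = Counter(columns)
    return {
        (row, col): n / column_totals[col]
        for (row, col), n in cell_counts.items()
    }

gender_summary = summary_table(gender)
age_by_gender = crosstab(age_group, gender)
print(gender_summary)
print(age_by_gender)
```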

Displayr has 13 variable set structures

A variable set has a structure, which determines how the variable set is used when creating tables (e.g., whether to show averages or percentages and how statistical tests are performed). When a variable set is selected, its structure is shown in the Structure field in the object inspector under Data > Properties. Sometimes a definition of the structure appears immediately to the right, as in the case below.

Displayr has 13 different types of variable set structures, shown in the table below. Note that a different icon is used to represent each structure in Data Sources.

Structure is based on measurement scale and set type

The structure of a variable set is made up of its measurement scale and its set type.

Measurement scale

An individual variable has a measurement scale. This determines how the values are treated when used in tables and other analyses. Displayr recognizes the following measurement scales:

Nominal: Two or more categories that are not in any natural order (e.g., Red, Green, Blue).
Ordinal: Two or more categories with an ordering (e.g., Dislike, Ambivalent, Like).
Numeric: Data where a number is stored, and the number has no associated label (e.g., 1.23, 1, 0).
Text: Typically this is used to store unstructured text data.
Date/Time: Dates stored on a continuous scale that can be grouped into time periods for easy analysis.
Binary: Only two values are possible, 0 and 1, plus missing data. These can represent categories such as Yes/No, or you can select which values to use in the counts (see How to Set Value Attributes for a Binary-Multi and Binary-Grid).
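The way a measurement scale changes what gets computed can be sketched as follows. The function summarize and the lowercase scale names are hypothetical, and the sketch covers only the scales whose summaries are simple to state (proportions for categories, an average for numbers, proportion selected for binary data).

```python
from collections import Counter
from statistics import mean

def summarize(values, scale):
    """Return a simple summary that depends on the measurement scale.

    Missing data is represented as None and dropped before summarizing.
    """
    observed = [v for v in values if v is not None]
    if scale in ("nominal", "ordinal"):
        # Categories: proportion of respondents in each category.
        counts = Counter(observed)
        return {category: n / len(observed) for category, n in counts.items()}
    if scale == "numeric":
        # Numbers: the average.
        return mean(observed)
    if scale == "binary":
        # 0/1 data: proportion of 1s among non-missing values.
        return sum(observed) / len(observed)
    raise ValueError(f"Scale not covered in this sketch: {scale}")

color_summary = summarize(["Red", "Green", "Red", None], "nominal")
numeric_summary = summarize([1.23, 1, 0], "numeric")
binary_summary = summarize([1, 0, 1, None], "binary")
print(color_summary, numeric_summary, binary_summary)
```

The same stored values can therefore yield different tables depending on the scale assigned to them, which is why checking the structure of a variable set matters before interpreting a table.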

Set type

Set type refers to whether the variable set contains a single variable or multiple variables. Sets with multiple variables can take several structures, such as Multi and Grid (see the previous section).

A variable set's value attributes also determine how it is analyzed

In addition to the structure of a variable set, the value attributes govern which values are included in and excluded from all tables and analyses that use the variable set. Knowing how to set the value attributes properly is a core part of using Displayr. To access the Value Attributes of a variable set, click the set in Data Sources and then, in the object inspector, click Data > Properties > Missing values. Value attributes may be set differently for different structures, but the concepts are the same. They are most commonly set for Nominal and Binary variables.

Nominal variables have an underlying code frame where each category (Label) is assigned a Value, and has a Missing Values setting applied:

The Label column lists the default labels shown for these categories in tables, where you can edit them further. For any table statistics that require a value (such as the Average, which you can show below the proportions in a table), the numbers shown in the Value column are used. The Missing Values setting determines how each category is handled in tables and analyses. The options are:

Include in analyses - will be included in tables and analyses.
Exclude from analyses - will be excluded from the table and other analysis calculations.
Include in percentages (but not averages) - the category's proportion is shown in the table, but the category is not included in any Averages or other mathematical statistics. Useful if you want to show a Don't Care or N/A category on a rating scale, but not include it in the average rating.
Hide but include in NET calculations - the category is hidden from the table, but the respondents who selected that category are still included in the NET and base/calculation of other statistics.
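The effect of the four settings can be sketched on a small rating scale. The code frame, the codes 98 and 99, and the responses are all invented for illustration, and the function tabulate is a hypothetical simplification of the behavior described above rather than Displayr's implementation.

```python
INCLUDE = "Include in analyses"
EXCLUDE = "Exclude from analyses"
PERCENT_ONLY = "Include in percentages (but not averages)"
HIDE = "Hide but include in NET calculations"

# Hypothetical code frame: Value -> (Label, Missing Values setting).
code_frame = {
    1: ("Dislike", INCLUDE),
    2: ("Ambivalent", INCLUDE),
    3: ("Like", INCLUDE),
    98: ("N/A", PERCENT_ONLY),
    99: ("Refused", EXCLUDE),
}

# Invented responses, stored as values from the code frame.
responses = [1, 3, 3, 2, 98, 99, 3, 98]

def tabulate(responses, code_frame):
    # Base: every response except those in excluded categories.
    in_base = [r for r in responses if code_frame[r][1] != EXCLUDE]
    proportions = {}
    for value, (label, setting) in code_frame.items():
        if setting in (EXCLUDE, HIDE):
            continue  # no row is shown for excluded or hidden categories
        proportions[label] = in_base.count(value) / len(in_base)
    # Average: uses the Value column, skipping percent-only categories.
    averaged = [r for r in in_base if code_frame[r][1] in (INCLUDE, HIDE)]
    average = sum(averaged) / len(averaged)
    return proportions, average

proportions, average = tabulate(responses, code_frame)
print(proportions, average)
```

Here N/A appears as a row and counts toward the base of the percentages, but its value of 98 does not distort the average rating, while Refused is dropped from both.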

Binary variables have more limited Value Attributes since they can only take on two values (1/0, Selected/Not Selected, etc.) plus missing data. With this structure, you select which categories to include in the Count statistic using the Count this Value checkboxes.

Count this Value	Missing Values	How it is Handled
Checked	Include in analyses	Used for "Selected" categories - included in Count and Sample Size statistics (the numerator and denominator of proportions)
Unchecked	Include in analyses	Used for "Seen but not selected" - included in only Sample Size statistics (the denominator of proportions)
Unchecked	Exclude from analyses	Used for "Option not seen" or "Ignore from calculations" - excluded from all statistics and calculations
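The three combinations in the table above can be sketched as a small calculation of Count, Sample Size, and the resulting proportion. The stored value -1 for "Option not seen" and the responses are invented, and binary_proportion is a hypothetical helper, not a Displayr function.

```python
INCLUDE = "Include in analyses"
EXCLUDE = "Exclude from analyses"

# Stored value -> (Count this Value checked?, Missing Values setting).
value_attributes = {
    1: (True, INCLUDE),    # "Selected"
    0: (False, INCLUDE),   # "Seen but not selected"
    -1: (False, EXCLUDE),  # "Option not seen" (hypothetical code)
}

# Invented responses for one variable in a Binary-Multi set.
responses = [1, 0, 1, -1, 0, 1, -1, 0]

def binary_proportion(responses, value_attributes):
    # Sample Size (denominator): all values included in analyses.
    included = [r for r in responses if value_attributes[r][1] == INCLUDE]
    sample_size = len(included)
    # Count (numerator): included values with Count this Value checked.
    count = sum(1 for r in included if value_attributes[r][0])
    return count, sample_size, count / sample_size

count, sample_size, proportion = binary_proportion(responses, value_attributes)
print(count, sample_size, proportion)
```

Respondents who never saw the option are removed from the denominator, so the proportion is 3 out of 6 rather than 3 out of 8.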

For more detail on setting attributes for Binary variable sets, see How to Set Value Attributes for a Binary-Multi and Binary-Grid. The above Value Attributes example creates a Top 2 Box version of a Brand Attitude variable set shown below:


Variable set structures and what each shows in a table:

Structure	What is shown in a table
Nominal	Category proportions
Ordinal	Ordered category proportions
Numeric	Average
Text	Raw text
Date/Time (stored in a YYYY/MM/DD or similar format)	Proportion in each aggregated date period
Binary-Multi (commonly used for multi-select questions and Top 2 boxes)	Proportion who selected a particular response for each variable (such as Aware)
Binary-Multi (Compact) (multi-select data in max-multi format, where each variable is a selection number)	Proportion who selected each response
Nominal-Multi (commonly used to group brands to show in the same table)	Proportion of each category selected for each variable
Ordinal-Multi (commonly used for ratings across brands)	Proportion of each category selected for each variable
Numeric-Multi (commonly used for numeric answers across brands)	Average of each variable
Binary-Grid (commonly used to group multi-selects across brands)	Proportion who selected each pair of attributes
Numeric-Grid (commonly used for numeric answers across brands and another attribute)	Average of each pair of attributes
Ranking (each variable contains the rank that a respondent has assigned to an object; the set consists of multiple numeric variables that represent a ranking, where the highest number is most preferred and ties are permitted; used in legacy MaxDiff analysis)	Probability (%) of each item being chosen first (based on the coefficient from a logit model)
Experiment (represents the various types of experiments, from fully randomized experiments through to choice modeling; see Experiments for more information; used in legacy conjoint analysis)	Coefficient from Experiment
