A variable set is a group of one or more variables. This article describes how:
- Variable sets can make data analysis tricky
- Displayr automatically detects and presents variable sets
- Tables are created from variable sets
- Displayr has 13 variable set structures
- Structure is based on measurement scale and set type
- A variable set's value attributes also determine how it is analyzed
Variable sets can make data analysis tricky
Consider a survey question like How old are you? This is stored in a data file as a single column of data, commonly referred to as a variable. Below, you can see the age data for 10 people. How old are you? has been stored with a variable name of d1.
However, sometimes data can only be represented by multiple columns of data. For example, the data from the question Which of these brands are you familiar with? Coca-Cola, Diet Coke, Coke Zero, Pepsi, Diet Pepsi, and Pepsi Max? may be stored in a data file containing six columns, as shown below (Q1b_1 is the name of the variable showing data for Coca-Cola, Q1b_2 for Diet Coke, etc.).
A group of variables that need to be analyzed together is known as a variable set.
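As a rough illustration of the difference between a single variable and a variable set, the two can be pictured as columns keyed by variable name. This is only a sketch: the responses, the 1/0 coding, the set names, and the `columns_for` helper are all invented for illustration and do not reflect how Displayr stores data internally.

```python
# Hypothetical data for three respondents (values invented for illustration).
data_file = {
    "d1":    [23, 41, 35],  # How old are you?
    "Q1b_1": [1, 1, 0],     # Coca-Cola (assumed coding: 1 = familiar, 0 = not)
    "Q1b_2": [0, 1, 0],     # Diet Coke
    "Q1b_3": [0, 0, 0],     # Coke Zero
    "Q1b_4": [1, 1, 1],     # Pepsi
    "Q1b_5": [0, 0, 1],     # Diet Pepsi
    "Q1b_6": [0, 1, 1],     # Pepsi Max
}

# A variable set lists the columns that need to be analyzed together.
# The set names used here are hypothetical labels.
variable_sets = {
    "Age": ["d1"],
    "Brand familiarity": ["Q1b_1", "Q1b_2", "Q1b_3", "Q1b_4", "Q1b_5", "Q1b_6"],
}


def columns_for(set_name):
    """Return the columns of data that belong to one variable set."""
    return {name: data_file[name] for name in variable_sets[set_name]}


print(len(columns_for("Age")))                # one column
print(len(columns_for("Brand familiarity")))  # six columns
```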
Displayr automatically detects and presents variable sets
When data is imported into Displayr, Displayr automatically groups variables into variable sets. For example, when analyzing data from a survey, Displayr attempts to group variables so that each variable set represents a single question in the survey.
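This article does not describe how Displayr decides which variables belong together. Purely to make the idea of grouping concrete, the sketch below groups variables by a shared name prefix (so `Q1b_1` and `Q1b_2` fall under `Q1b`). The naming rule and the `group_by_prefix` function are assumptions for illustration, not Displayr's detection logic.

```python
import re


def group_by_prefix(variable_names):
    """Group names such as Q1b_1, Q1b_2 under their shared prefix Q1b.

    Names without a trailing _<number> form a group of their own.
    """
    groups = {}
    for name in variable_names:
        match = re.fullmatch(r"(.+)_(\d+)", name)
        prefix = match.group(1) if match else name
        groups.setdefault(prefix, []).append(name)
    return groups


names = ["d1", "Q1b_1", "Q1b_2", "Q1b_3", "Q1b_4", "Q1b_5", "Q1b_6"]
grouped = group_by_prefix(names)
for set_name, members in grouped.items():
    print(set_name, members)
```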
As an example, the data set below shows data from a survey asking people about cola consumption. Questions like Age, Income, and Gender are each represented in the data file by a single column of data (variable) as we would expect. However, where the variable sets contain multiple variables, they have automatically been grouped together, and a triangle to the left of the icon shows this has occurred (e.g., Awareness).
By clicking on the triangle, we can expand out the variable set and see the variables (columns) within it:
Tables are created from variable sets
Variable sets are the building blocks of tables. A summary table summarizes the data from a single variable set. A crosstab is a table that contains two or more variable sets.
Consequently, the key to creating tables is to create and modify variable sets (see How to Combine and Split Variable Sets).
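The distinction between a summary table and a crosstab can be sketched in a few lines of Python. The responses below are invented, and showing column percentages in the crosstab is an assumption made for this example rather than a statement about what Displayr displays by default.

```python
from collections import Counter

# Hypothetical responses for five people (invented for illustration).
gender = ["Male", "Female", "Female", "Male", "Female"]
age = ["Under 35", "Under 35", "35 or over", "35 or over", "35 or over"]


def summary_table(values):
    """Percentage of respondents in each category of one variable set."""
    counts = Counter(values)
    return {category: 100 * n / len(values) for category, n in counts.items()}


def crosstab(row_values, column_values):
    """Column percentages for each (row, column) pair of two variable sets."""
    cell_counts = Counter(zip(row_values, column_values))
    column_totals = Counter(column_values)
    return {
        (row, col): 100 * n / column_totals[col]
        for (row, col), n in cell_counts.items()
    }


gender_summary = summary_table(gender)
age_by_gender = crosstab(age, gender)
print(gender_summary)
print(age_by_gender)
```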
Displayr has 13 variable set structures
A variable set has a structure, which determines how the variable set is used when creating tables (e.g., whether to show averages or percentages and how statistical tests are performed). When a variable set is selected, its structure is shown in the Structure field in the object inspector, on the right side of the screen. Sometimes a definition of the structure appears immediately to the right, as in the case below.
Displayr has 13 different types of variable set structures, shown in the table below. Note that a different icon is used to represent each structure in the data sets tree.
|Structure||What is shown in a Table|
|(stored in a YYYY/MM/DD or similar format)||Proportion in each aggregated date|
|(commonly used for multi-select questions and Top 2 boxes)||Proportion selected a particular response(s) for a variable (such as Aware)|
|(multi-select data in max-multi format where each variable is a selection number)||Proportion selected a response|
|(commonly used to group brands to show in the same table)||Proportion of category selected for each variable|
|(commonly used for ratings across brands)||Proportion of category selected for each variable|
|Numeric - Multi (commonly used for numeric answers across brands)||Average of each variable|
|Binary - Grid (commonly used to group multi-selects across brands)||Proportion selected each pair of attributes|
|Numeric - Grid||Average of each pair of attributes|
| ||Probability % of item being chosen as first (based on coefficient from logit model)|
| ||Coefficient from Experiment|
Structure is based on measurement scale and set type
The structure of a variable set is made up of its measurement scale and its set type.
An individual variable has a measurement scale. This determines how the values are treated when used in tables and other analyses. Displayr recognizes the following measurement scales:
- Nominal: Two or more categories that are not in any natural order (e.g., Red, Green, Blue).
- Ordinal: Two or more categories with an ordering (e.g., Dislike, Ambivalent, Like).
- Numeric: Data where a number is stored, and the number has no associated label (e.g., 1.23, 1, 0).
- Text: Typically this is used to store unstructured text data.
- Date/Time: Dates stored on a continuous scale that can be grouped into time periods for easy analysis.
- Binary: There can only be two values, 0 and 1, plus missing data. These could represent categories like Yes/No, or you can select which values to use in the counts (see How to Set Value Attributes for a Binary-Multi and Binary-Grid).
Set type refers to whether the variable set contains a single variable or multiple variables. Sets with multiple variables can appear in more than one structure, such as Multi and Grid (see the previous section).
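One way to picture how the two parts combine is the small sketch below. The "Scale - Set type" naming pattern follows structure names that appear in this article, such as Numeric - Multi and Binary - Grid. Using the bare scale name for a single-variable set is an assumption, the `structure_name` function is hypothetical, and not every combination it can produce is necessarily one of Displayr's 13 structures.

```python
# Measurement scales listed in this article.
MEASUREMENT_SCALES = {"Nominal", "Ordinal", "Numeric", "Text", "Date/Time", "Binary"}


def structure_name(measurement_scale, set_type="Single"):
    """Combine a measurement scale and a set type into a structure label.

    Assumption: a single-variable set is labelled with the scale alone.
    """
    if measurement_scale not in MEASUREMENT_SCALES:
        raise ValueError(f"Unknown measurement scale: {measurement_scale}")
    if set_type == "Single":
        return measurement_scale
    return f"{measurement_scale} - {set_type}"


print(structure_name("Numeric"))
print(structure_name("Numeric", "Multi"))
print(structure_name("Binary", "Grid"))
```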
A variable set's value attributes also determine how it is analyzed
In addition to the structure of a variable set, the value attributes govern which values are included and excluded in all tables and analyses that use the variable set. Knowing how to properly set the value attributes is a core part of using Displayr. You can access the Value Attributes of a variable set by clicking on the set in the Data Sets tree and then, in the object inspector, clicking DATA VALUES > Missing values. Value attributes may be set differently for different structures, but the concepts are the same. They are most commonly set for Nominal and Binary variables.
Nominal variables have an underlying code frame where each category (Label) is assigned a Value, and has a Missing Values setting applied:
The Label column lists the default labels for these categories in tables, where you can manipulate them further. For any table statistics that require a value (such as the Average, which you can show below the proportions in a table), the numbers shown in the Value column are used. The Missing Values setting determines how each category is handled in tables and analyses. The options are:
- Include in analyses - will be included in tables and analyses.
- Exclude from analyses - will be excluded from table and other analysis calculations.
- Include in percentages (but not averages) - the category's proportion is shown on the table, but not included in any Averages or other mathematical statistics. Useful if you want to show a Don't Care or N/A category on a rating scale, but not include it in the average rating.
- Hide but include in NET calculations - the category is hidden from the table, but the respondents who selected that category are still included in the NET and base/calculation of other statistics.
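The effect of these options on percentages and averages can be sketched as follows. The code frame, the responses, and the two functions are hypothetical, and the arithmetic is a simplified reading of the option descriptions above rather than Displayr's implementation.

```python
EXCLUDE = "Exclude from analyses"
PERCENT_ONLY = "Include in percentages (but not averages)"
HIDE = "Hide but include in NET calculations"
INCLUDE = "Include in analyses"

# Hypothetical code frame: each Label has a Value and a Missing Values setting.
code_frame = {
    "Dislike":    {"value": 1, "missing": INCLUDE},
    "Ambivalent": {"value": 2, "missing": INCLUDE},
    "Like":       {"value": 3, "missing": INCLUDE},
    "N/A":        {"value": 4, "missing": PERCENT_ONLY},
}
responses = ["Like", "Like", "Dislike", "N/A", "Ambivalent"]


def percentages(responses, code_frame):
    """Percentage per visible category; excluded categories leave the base."""
    base = [r for r in responses if code_frame[r]["missing"] != EXCLUDE]
    return {
        label: 100 * base.count(label) / len(base)
        for label, attrs in code_frame.items()
        if attrs["missing"] not in (EXCLUDE, HIDE)  # hidden rows stay in the base
    }


def average(responses, code_frame):
    """Average of the Values, skipping excluded and percentage-only categories."""
    values = [
        code_frame[r]["value"]
        for r in responses
        if code_frame[r]["missing"] not in (EXCLUDE, PERCENT_ONLY)
    ]
    return sum(values) / len(values)


table = percentages(responses, code_frame)
mean_rating = average(responses, code_frame)
print(table)
print(mean_rating)
```

In this sketch the N/A category still appears in the percentages, but its Value of 4 does not enter the average.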
Binary variables have more limited Value Attributes since they can only take on 2 values (1/0, Selected/Not Selected, etc.) plus missing data. Using this structure, you will select which categories to include in the Count statistic using the Count this Value checkboxes.
|Count this Value||Missing Values||How it is Handled|
|Checked||Include in analyses||Used for "Selected" categories - included in Count and Sample Size statistics (the numerator and denominator of proportions)|
|Unchecked||Include in analyses||Used for "Seen but not selected" - included in only Sample Size statistics (the denominator of proportions)|
|Unchecked||Exclude from analyses||Used for "Option not seen" or "Ignore from calculations" - excluded from all statistics and calculations|
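The three combinations in the table can be read as a simple proportion: the Count is the numerator and the Sample Size is the denominator. The sketch below applies that reading to a Top 2 Box style setup. The category labels, the responses, and the `proportion_selected` function are invented for illustration.

```python
INCLUDE = "Include in analyses"
EXCLUDE = "Exclude from analyses"

# Hypothetical value attributes for a Top 2 Box setup.
value_attributes = {
    "Love":      {"count": True,  "missing": INCLUDE},  # selected
    "Like":      {"count": True,  "missing": INCLUDE},  # selected
    "Neither":   {"count": False, "missing": INCLUDE},  # seen but not selected
    "Dislike":   {"count": False, "missing": INCLUDE},  # seen but not selected
    "Hate":      {"count": False, "missing": INCLUDE},  # seen but not selected
    "Not shown": {"count": False, "missing": EXCLUDE},  # option not seen
}
responses = ["Love", "Like", "Neither", "Hate", "Not shown", "Like"]


def proportion_selected(responses, value_attributes):
    """Return (Count, Sample Size, percentage) for one binary variable."""
    sample = [r for r in responses if value_attributes[r]["missing"] == INCLUDE]
    count = sum(1 for r in sample if value_attributes[r]["count"])
    return count, len(sample), 100 * count / len(sample)


count, sample_size, percent = proportion_selected(responses, value_attributes)
print(count, sample_size, percent)
```

The respondent who was not shown the option drops out of both the Count and the Sample Size, so the proportion is three out of five rather than three out of six.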
For more detail on setting attributes for Binary variable sets, see How to Set Value Attributes for a Binary-Multi and Binary-Grid. The above Value Attributes example creates a Top 2 Box version of a Brand Attitude variable set shown below:
Watch our Understanding Variable Sets video