This article describes how to use common mathematical functions in Displayr via R code. However, there are also user-friendly versions of several common calculations within the Calculation menu in the toolbar. These are automatically set up to handle different data structures and ignore missing data. See How to Perform Mathematical Calculations on Tables and How to Perform Mathematical Calculations on Variables for more detail. You can use these user-friendly calculations through the Calculation menu or by typing them into the R Code directly as described below.
Requirements
Note that some of the methods below require a Displayr license, while others can be performed using the Data Stories module.
A key concept to keep in mind with all this is R's built-in ability to use vectorization and recycling, this is discussed further with examples in R's Vectorized Math and Custom Variable Creation. Mathematical formulas can be used within an R variable, calculation, or data set. Variable sets, tables, and data sets are all effectively tabular data. The main difference is whether the underlying data is raw data or aggregated data. Raw data will be made up of rows that represent each record and columns that represent each variable. Aggregated data, on the other hand, will be made up of rows and/or columns that represent different intersecting categories.
Tables
In this example, we will use a table of counts called table.Brand.Image:
To make it more straightforward, we have removed all NETs and will use all the remaining table cells in our calculations. If you wish to, for example, ignore the None of these row, you can first define the table and then filter it when running the function:
table = table.Brand.Image
function_name(table[rownames(table) != "None of these",])
Remember, by default NET (or SUM, if numeric) categories will be added to your table. If you wish to exclude these from your calculation, you will need to either remove them from your table via right-click > Hide, or else add this exclusion in your code.
Variable sets
In this case, we could easily substitute our table with the Brand Image variable set itself which is made up of 63 binary variables (9 statements by 7 brand categories):
When dealing with variable sets, you can either reference each variable by Name, click on the entire variable set, or reference by Label (found under General > General):
function_name(cbind(q5a1,q5a2,q5a3,q5a4,q5a5,q5a6,q5a7,q5b1,q5b2,q5b3,q5b4,q5b5,q5b6,q5b7,q5c1,q5c2,q5c3,q5c4,q5c5,q5c6,q5c7,q5d1,q5d2,q5d3,q5d4,q5d5,q5d6,q5d7,q5e1,q5e2,q5e3,q5e4,q5e5,q5e6,q5e7,q5f1,q5f2,q5f3,q5f4,q5f5,q5f6,q5f7,q5g1,q5g2,q5g3,q5g4,q5g5,q5g6,q5g7,q5h1,q5h2,q5h3,q5h4,q5h5,q5h6,q5h7,q5i1,q5i2,q5i3,q5i4,q5i5,q5i6,q5i7))
function_name(`Q5 - Brand Image`)
In this example, using the Label method is much more efficient than referencing all 63 variables by name!
By default, NET (or SUM, if numeric) categories will likewise be added to your variable set when referencing by Label. However, if you hide these categories in the corresponding table of this variable set, they will also be removed from the variable set. Otherwise, you will need to remove them in your code.
Method - Mathematical operators and formulas
All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:
q2a_1 / (q2a_1 + q2b_1)
All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:
rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))
Note, instead of grouping the variables using cbind, we can also reference the variable set Label (enclosed in backticks):
rowMeans(`Q2 - No. of colas consumed`)
Vector arithmetic
One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1)
. This tells R to divide the value of q2_a1
by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2_a1
by itself. We can similarly standardize q2a_1
to have a mean of 0 and a standard deviation of 1 using (q2a_1 - mean(q2a_1))
/ sd(q2a_1)
.
In these two examples, there are also pre-constructed specialist functions we can use: q2a_1
/ sum(q2a_1)
is equivalent to writing prop.table(q2a_1)
, and (q2a_1 - mean(q2a_1))
/ sd(q2a_1)
is equivalent to scale(q2a_1)
.
Note, most in-built R functions, such as sd
, mean
, sum
, rowMeans
, and rowSums
, will return missing values if any of the values in the vector (variable in this case) passed to them contains a missing value. In most cases, the trick is to use na.rm = TRUE
. For example:
(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)
Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table
cannot deal with missing values, and scale
automatically removes them.
Variable sets as tables
If you hover over a variable set R reference (e.g. `Q2 - No. of colas consumed`) in a Calculation with your mouse, you can see that it previews the raw data in tabular format.
This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select Hide, or else add this exclusion in your code.
If we were, for example, to look at the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home', this would be the last column in the preview: [,"SUM","SUM"]
.
These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:
(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]
This is perhaps more obvious when we review the data as an aggregated table to see the interlocked SUM - SUM cell:
Method - Ready-Made Displayr functions
In order to help users along, Displayr created improved versions of common mathematical functions in R that incorporate a great deal of data handling. These are the same functions available in the Calculation dropdown in the toolbar, but can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the function library:
library(verbs)
Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:
For more information on what the function does, you can click on the read more link that pops up when you hover over the function in the Calculation drop-down menu:
For example, the documentation for StandardDeviation states you can pass along one or more tables and one or more variables.
Method - Base R functions
The following are built-in R functions that will apply to all rows or columns. We have used the na.rm=T argument here to ignore missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the ready-made Displayr functions such as Sum Each Row and Average as described above automatically ignores missing data - so you don't have to remember to do so!
1. Sum each row over the columns:
rowSums(table.Brand.Image, na.rm=T)
2. Sum each column over the rows:
colSums(table.Brand.Image, na.rm=T)
3. Average each row over the columns:
rowMeans(table.Brand.Image, na.rm=T)
4. Average each column over the rows:
colMeans(table.Brand.Image, na.rm=T)
You can also use the functions above to do the same operation over the raw data across variables. For example, you can use the SumRows function across multiple variables to create a filter as in the Method - Manual section of How to Create a Combo Box Filter and How to Combine Multiple R Filter Control Variables into a Single Variable.
Method - apply() functions
Some functions can only be run on all rows or columns within an apply or similar function. The format is:
apply(data, rows or columns, function)
The rows or columns argument requires as 1 for rows and 2 for columns.
1. Minimum of each row over the columns:
apply(table.Brand.Image, 1, min)
2. Minimum of each column over the rows:
apply(table.Brand.Image, 2, min)
3. Maximum of each row over the columns:
apply(table.Brand.Image, 1, max)
4. Maximum of each column over the rows:
apply(table.Brand.Image, 2, max)
Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Binary-Multi variables/variable set as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected Any Of the variables.
Method - Custom apply() functions
The final parameter of the apply function allows for custom functions. The format is:
function(x) {x do something}
The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.
1. Any rows over 50:
apply(table.Brand.Image, 1, function(x) any(x>50))
2. All rows over 20:
apply(table.Brand.Image, 1, function(x) all(x>20))
Next
How to Work with Conditional R Formulas