How to Perform Mathematical Calculations Using R

This article describes how to use common mathematical functions in Displayr via R code. However, there are also user-friendly versions of several common calculations within the Calculation menu in the toolbar. These are automatically configured to handle different data structures and to ignore missing data. See How to Perform Mathematical Calculations on Tables and How to Perform Mathematical Calculations on Variables for more details. You can use these user-friendly calculations via the Calculation menu or by typing them directly into the R Code, as described below.

Requirements

Some knowledge of working with R code. See How to Use R in Displayr.

A key concept to keep in mind with all this is R's built-in vectorization and recycling. This is discussed further with examples in R's Vectorized Math and Custom Variable Creation. Mathematical formulas can be used within an R variable, calculation, or data set. Variable sets, tables, and data sets are all effectively tabular data. The main difference is whether the underlying data is raw or aggregated. Raw data will consist of rows representing each record and columns representing each variable. Aggregated data, on the other hand, will be made up of rows and/or columns that represent different intersecting categories.

Tables

In this example, we will use a table of counts called table.Brand.Image:

To make it simpler, we have removed all NETs and will use all remaining table cells in our calculations. If you wish to, for example, ignore the None of these row, you can first define the table and then filter it when running the function:

table = table.Brand.Image
function_name(table[rownames(table) != "None of these",])

Remember that, by default, NET (or SUM, if numeric) categories are added to your table. If you wish to exclude these from your calculation, you will need to either remove them from your table via right-click > Hide or add this exclusion in your code.

Variable sets

In this case, we could easily substitute our table with the Brand Image variable set itself, which is made up of 63 binary variables (9 statements by 7 brand categories):

When dealing with variable sets, you can either reference each variable by Name, click on the entire variable set, or reference by Label (found under Object Inspector > General > General):

function_name(cbind(q5a1,q5a2,q5a3,q5a4,q5a5,q5a6,q5a7,q5b1,q5b2,q5b3,q5b4,q5b5,q5b6,q5b7,q5c1,q5c2,q5c3,q5c4,q5c5,q5c6,q5c7,q5d1,q5d2,q5d3,q5d4,q5d5,q5d6,q5d7,q5e1,q5e2,q5e3,q5e4,q5e5,q5e6,q5e7,q5f1,q5f2,q5f3,q5f4,q5f5,q5f6,q5f7,q5g1,q5g2,q5g3,q5g4,q5g5,q5g6,q5g7,q5h1,q5h2,q5h3,q5h4,q5h5,q5h6,q5h7,q5i1,q5i2,q5i3,q5i4,q5i5,q5i6,q5i7))

function_name(`Q5 - Brand Image`)

In this example, using the Label method is much more efficient than referencing all 63 variables by name!

By default, NET (or SUM, if numeric) categories will likewise be added to your variable set when referencing by Label. However, if you hide these categories in the corresponding table of this variable set, they will also be removed from the variable set. Otherwise, you will need to remove them in your code.

Method - Mathematical operators and formulas

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:

q2a_1 / (q2a_1 + q2b_1)

All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Note, instead of grouping the variables using cbind, we can also reference the variable set Label (enclosed in backticks):

rowMeans(`Q2 - No. of colas consumed`)

Vector arithmetic

One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1). This tells R to divide the value of q2_a1 by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2_a1 by itself. We can similarly standardize q2a_1 to have a mean of 0 and a standard deviation of 1 using (q2a_1 - mean(q2a_1)) / sd(q2a_1).

In these two examples, there are also pre-constructed specialist functions we can use: q2a_1 / sum(q2a_1) is equivalent to writing prop.table(q2a_1), and (q2a_1 - mean(q2a_1)) / sd(q2a_1) is equivalent to scale(q2a_1).

Note, most in-built R functions, such as sd, mean, sum, rowMeans, and rowSums, will return missing values if any of the values in the vector (variable in this case) passed to them contains a missing value. In most cases, the trick is to use na.rm = TRUE. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table cannot deal with missing values, and scale automatically removes them.

Variable sets as tables

If you hover over a variable set R reference (e.g. 'Q2 - No. of colas consumed') in a Calculation with your mouse, you can see that it previews the raw data in tabular format.

This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select Hide, or else add this exclusion in your code.

If we were, for example, to look at the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home', this would be the last column in the preview: [,"SUM","SUM"].

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

This is perhaps more obvious when we review the data as an aggregated table to see the interlocked SUM - SUM cell:

Method - Ready-Made Displayr functions

In order to help users along, Displayr created improved versions of common mathematical functions in R that incorporate a great deal of data handling. These are the same functions available from Calculation in the toolbar but can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the function library:

library(verbs)

Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:

For more information on what the function does, you can click on the read more link that pops up when you hover over the function in the Calculation drop-down menu:

For example, the documentation for StandardDeviation states you can pass along one or more tables and one or more variables.

Method - Base R functions

The following are built-in R functions that will apply to all rows or columns. We have used the na.rm=T argument here to ignore missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the ready-made Displayr functions such as Sum Each Row and Average as described above automatically ignores missing data - so you don't have to remember to do so!

1. Sum each row over the columns:

rowSums(table.Brand.Image, na.rm=T)

2. Sum each column over the rows:

colSums(table.Brand.Image, na.rm=T)

3. Average each row over the columns:

rowMeans(table.Brand.Image, na.rm=T)

4. Average each column over the rows:

colMeans(table.Brand.Image, na.rm=T)

You can also use the functions above to do the same operation over the raw data across variables. For example, you can use the SumRows function across multiple variables to create a filter as in the Method - Manual section of How to Create a Combo Box Filter and How to Combine Multiple R Filter Control Variables into a Single Variable.

Method - apply() functions

Some functions can only be run on all rows or columns within an apply or similar function. The format is:

apply(data, rows or columns, function)

The rows or columns argument requires as 1 for rows and 2 for columns.

1. Minimum of each row over the columns:

apply(table.Brand.Image, 1, min)

2. Minimum of each column over the rows:

apply(table.Brand.Image, 2, min)

3. Maximum of each row over the columns:

apply(table.Brand.Image, 1, max)

4. Maximum of each column over the rows:

apply(table.Brand.Image, 2, max)

Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Binary-Multi variables/variable set as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected Any Of the variables.

Method - Custom apply() functions

The final parameter of the apply function allows for custom functions. The format is:

function(x) {x do something}

The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.

1. Any rows over 50:

apply(table.Brand.Image, 1, function(x) any(x>50))

2. All rows over 20:

apply(table.Brand.Image, 1, function(x) all(x>20))

How to Use R in Displayr

How to Work with Conditional R Formulas

How to Use Point and Click Inside R Code

Bespoke Analyses and Calculations Video

Articles in this section

Requirements

Tables

Variable sets

Method - Mathematical operators and formulas

Vector arithmetic

Variable sets as tables

Method - Ready-Made Displayr functions

Method - Base R functions

1. Sum each row over the columns:

2. Sum each column over the rows:

3. Average each row over the columns:

4. Average each column over the rows:

Method - apply() functions

1. Minimum of each row over the columns:

2. Minimum of each column over the rows:

3. Maximum of each row over the columns:

4. Maximum of each column over the rows:

Method - Custom apply() functions

1. Any rows over 50:

2. All rows over 20:

Next

Articles in this section

Requirements

Tables

Variable sets

Method - Mathematical operators and formulas

Vector arithmetic

Variable sets as tables

Method - Ready-Made Displayr functions

Method - Base R functions

1. Sum each row over the columns:

2. Sum each column over the rows:

3. Average each row over the columns:

4. Average each column over the rows:

Method - apply() functions

1. Minimum of each row over the columns:

2. Minimum of each column over the rows:

3. Maximum of each row over the columns:

4. Maximum of each column over the rows:

Method - Custom apply() functions

1. Any rows over 50:

2. All rows over 20:

Next

Related articles