This article describes how to use common mathematical functions in Displayr via R code. However, there are also user-friendly versions of several common calculations within the **Calculation ** menu in the **toolbar**. These are automatically set up to handle different data structures and ignore missing data. See How to Perform Mathematical Calculations on Tables and How to Perform Mathematical Calculations on Variables for more detail. You can use these user-friendly calculations through the **Calculation** menu or by typing them into the **R Code **directly as described below.

## Requirements

Note that some of the methods below require a **Displayr** license, while others can be performed using the **Data Stories** module.

A key concept to keep in mind with all this is R's built-in ability to use vectorization and recycling, this is discussed further with examples in R's Vectorized Math and Custom Variable Creation. Mathematical formulas can be used within an R variable, calculation, or data set. Variable sets, tables, and data sets are all effectively tabular data. The main difference is whether the underlying data is raw data or aggregated data. Raw data will be made up of rows that represent each record and columns that represent each variable. Aggregated data, on the other hand, will be made up of rows and/or columns that represent different intersecting categories.

### Tables

In this example, we will use a table of counts called *table.Brand.Image*:

To make it more straightforward, we have removed all NETs and will use all the remaining table cells in our calculations. If you wish to, for example, ignore the **None of these** row, you can first define the table and then filter it when running the function:

table = table.Brand.Imagefunction_name(table[rownames(table) != "None of these",])

Remember, by default NET (or SUM, if numeric) categories will be added to your table. If you wish to exclude these from your calculation, you will need to either remove them from your table via right-click **> Hide**, or else add this exclusion in your code.

### Variable sets

In this case, we could easily substitute our table with the *Brand Image* variable set itself which is made up of 63 binary variables (9 statements by 7 brand categories):

When dealing with variable sets, you can either reference each variable by **Name,** click on the entire variable set, or reference by **Label** (found under **General > General**):

function_name(cbind(q5a1,q5a2,q5a3,q5a4,q5a5,q5a6,q5a7,q5b1,q5b2,q5b3,q5b4,q5b5,q5b6,q5b7,q5c1,q5c2,q5c3,q5c4,q5c5,q5c6,q5c7,q5d1,q5d2,q5d3,q5d4,q5d5,q5d6,q5d7,q5e1,q5e2,q5e3,q5e4,q5e5,q5e6,q5e7,q5f1,q5f2,q5f3,q5f4,q5f5,q5f6,q5f7,q5g1,q5g2,q5g3,q5g4,q5g5,q5g6,q5g7,q5h1,q5h2,q5h3,q5h4,q5h5,q5h6,q5h7,q5i1,q5i2,q5i3,q5i4,q5i5,q5i6,q5i7))

function_name(`Q5 - Brand Image`)

In this example, using the **Label** method is much more efficient than referencing all 63 variables by name!

By default, NET (or SUM, if numeric) categories will likewise be added to your variable set when referencing by **Label**. However, if you hide these categories in the corresponding table of this variable set, they will also be removed from the variable set. Otherwise, you will need to remove them in your code.

## Method - Mathematical operators and formulas

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables and tables:

q2a_1 / (q2a_1 + q2b_1)

All the standard mathematical functions are also available in R. For example, to create the average of a set of variables, we can use the following:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Note, instead of grouping the variables using *cbind*, we can also reference the variable set **Label **(enclosed in backticks):

rowMeans(`Q2 - No. of colas consumed`)

### Vector arithmetic

One of the great strengths of using R is that you can use vector arithmetic. Consider the expression `q2a_1 / sum(q2a_1)`

. This tells R to divide the value of `q2_a1`

by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the variable. Other programs, such as SPSS, would instead treat this expression as meaning to divide `q2_a1`

by itself. We can similarly standardize `q2a_1`

to have a mean of 0 and a standard deviation of 1 using ```
(q2a_1 - mean(q2a_1))
/ sd(q2a_1)
```

.

In these two examples, there are also pre-constructed specialist functions we can use: ```
q2a_1
/ sum(q2a_1)
```

is equivalent to writing `prop.table(q2a_1)`

, and ```
(q2a_1 - mean(q2a_1))
/ sd(q2a_1)
```

is equivalent to `scale(q2a_1)`

.

Note, most in-built R functions, such as `sd`

, `mean`

, `sum`

, `rowMeans`

, and `rowSums`

, will return missing values if any of the values in the *vector *(variable in this case) passed to them contains a missing value. In most cases, the trick is to use `na.rm = TRUE`

. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, `prop.table`

cannot deal with missing values, and `scale`

automatically removes them.

### Variable sets as tables

If you hover over a variable set R reference (e.g. `Q2 - No. of colas consumed`) in a **Calculation **with your mouse, you can see that it previews the raw data in tabular format.

This example contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. Just like a table, this preview will include NET or SUM rows/columns depending on the data format. This means by default, they will be included in your formula unless you right-click them in the corresponding table and then select** Hide**, or else add this exclusion in your code.

If we were, for example, to look at the sum of the variables pertaining to each occasion: **Sum, 'out and about'** and **Sum, 'at home'**, this would be the last column in the preview: `[,"SUM","SUM"]`

.

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

This is perhaps more obvious when we review the data as an aggregated table to see the interlocked *SUM - SUM* cell:

## Method - Ready-Made Displayr functions

In order to help users along, Displayr created improved versions of common mathematical functions in R that incorporate a great deal of data handling. These are the same functions available in the **Calculation ** dropdown in the toolbar, but can be called by name in your R code directly. You can use them on tables as well as variables. To access, add a line to load the function library:

library(verbs)

Then begin by typing a function name and a search will appear below your code where you can select the appropriate function:

For more information on what the function does, you can click on the read more link that pops up when you hover over the function in the **Calculation** drop-down menu:

For example, the documentation for StandardDeviation states you can pass along one or more tables and one or more variables.

## Method - Base R functions

The following are built-in R functions that will apply to all rows or columns. We have used the *na.rm=T* argument here to ignore missing values. If this argument is not used and the function encounters missing values, it will simply return NA, which is why including this is a good habit. Do note that using the ready-made Displayr functions such as **Sum Each Row** and **Average** as described above automatically ignores missing data - so you don't have to remember to do so!

### 1. Sum each row over the columns:

rowSums(table.Brand.Image, na.rm=T)

### 2. Sum each column over the rows:

colSums(table.Brand.Image, na.rm=T)

### 3. Average each row over the columns:

rowMeans(table.Brand.Image, na.rm=T)

### 4. Average each column over the rows:

colMeans(table.Brand.Image, na.rm=T)

You can also use the functions above to do the same operation over the raw data across variables. For example, you can use the SumRows function across multiple variables to create a filter as in the *Method - Manual *section of How to Create a Combo Box Filter and How to Combine Multiple R Filter Control Variables into a Single Variable.

## Method - *apply()* functions

Some functions can only be run on all rows or columns within an *apply* or similar function. The format is:

apply(data, rows or columns, function)

The *rows or columns* argument requires as **1** for **rows** and **2** for** columns**.

### 1. Minimum of each row over the columns:

apply(table.Brand.Image, 1, min)

### 2. Minimum of each column over the rows:

apply(table.Brand.Image, 2, min)

### 3. Maximum of each row over the columns:

apply(table.Brand.Image, 1, max)

### 4. Maximum of each column over the rows:

apply(table.Brand.Image, 2, max)

Similar to the base functions above, you can use the functions above to do the same operation over the raw data across variables. If you use Binary-Multi variables/variable set as the input for example 3 above, you will create a new NET variable as you are essentially taking the max value (1 or 0) across those variables for each respondent. This is akin to seeing if a respondent selected *Any Of* the variables.

## Method - Custom *apply()* functions

The final parameter of the* apply* function allows for custom functions. The format is:

function(x) {xdo something}

The curly brackets are optional for basic examples but become necessary when you have more elaborate multi-line functions.

### 1. Any rows over 50:

apply(table.Brand.Image, 1, function(x) any(x>50))

### 2. All rows over 20:

apply(table.Brand.Image, 1, function(x) all(x>20))

## Next

How to Work with Conditional R Formulas