How to Aggregate Data Using R

This article describes how to go from a data set...

... to a Calculation that aggregates specific fields according to the column average:

Requirements

A Displayr document with a data set imported.

Method - Using aggregate()

In this example, we'll import a data set directly into our calculation using R, but you can also use raw variables from your document in the code by combining them with cbind().

1. In the toolbar, go to Calculation > Custom Code.

2. Click on the page to place the output.

3. Enter the code below in the Code panel:

# Import Excel data set
library(flipAPI)
data = DownloadXLSX("https://wiki.q-researchsoftware.com/images/1/1b/Aggregation_data.xlsx", want.row.names = FALSE, want.data.frame = TRUE)

# Aggregate data: apply mean to every column within each Role
agg = aggregate(data,
                by = list(data$Role),
                FUN = mean)

The first lines of code import the data set. This will differ depending on your data source. If importing an SPSS file, see How to Import SPSS Data Sets via R.

The rest of the code uses R's aggregate() function.

The first argument to the function is usually a data.frame.
The by argument is a list of variables to group by. This must be a list, even if there is only one variable.
The FUN argument is the function that is applied to every column (i.e., variable) in the grouped data. Because an average cannot be calculated for categorical variables such as Name and Shift, they produce empty (NA) columns, which have been removed from the output for clarity.
Any function that can be applied to a numeric variable can be used within aggregate(). Maximum (max), minimum (min), count (length), standard deviation (sd), and sum (sum) are all popular.
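The points above can be sketched on a small, made-up data frame. This is only an illustration: the staff data and its Hours and Sales columns are hypothetical stand-ins, not the contents of the Excel file used in this article. The sketch restricts the first argument to numeric columns so that no NA columns are produced, names the element of the by list so the grouping column gets a readable name, and shows that a matrix built with cbind() from raw vectors is accepted as well.

```r
# Hypothetical stand-in for an imported data set (Hours and Sales are invented)
staff <- data.frame(
  Name  = c("Ann", "Bob", "Cy", "Di", "Ed", "Flo"),
  Role  = c("Chef", "Chef", "Waiter", "Waiter", "Waiter", "Host"),
  Shift = c("AM", "PM", "AM", "AM", "PM", "PM"),
  Hours = c(8, 6, 5, 7, 6, 4),
  Sales = c(100, 60, 60, 90, 120, 30),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns so mean() is never applied to text
numeric.cols <- sapply(staff, is.numeric)

# Naming the list element ("Role") names the grouping column in the output
agg.mean <- aggregate(staff[, numeric.cols, drop = FALSE],
                      by = list(Role = staff$Role),
                      FUN = mean)

# Any other summary function can be swapped in, e.g. the maximum per group
agg.max <- aggregate(staff[, numeric.cols, drop = FALSE],
                     by = list(Role = staff$Role),
                     FUN = max)

# Raw vectors combined with cbind() also work as the first argument;
# aggregate() converts the resulting matrix to a data frame internally
hours <- staff$Hours
sales <- staff$Sales
agg.cbind <- aggregate(cbind(hours, sales),
                       by = list(Role = staff$Role),
                       FUN = mean)
```

Each result has one row per Role and one column per aggregated variable.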

4. OPTIONAL: You can write your own function to pass into the aggregate FUN argument. Below is an example where the second largest value of each group is returned, or the largest if the group has only one record. Note also that the groups are formed by Role and by Shift together.

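# Return the second largest value, or the only value for single-record groups
second = function(x) {
    if (length(x) == 1)
        return(x)
    return(sort(x, decreasing = TRUE)[2])
}

# Group by Role and Shift together
agg = aggregate(data,
                by = list(data$Role, data$Shift),
                FUN = second)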
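To see how the custom function behaves, here is a minimal sketch on made-up data. The staff data frame and its Hours column are hypothetical and are not taken from the Excel file above. Groups with two records return the smaller of the two values, and groups with a single record return that record.

```r
# Second largest value, falling back to the only value for single-record groups
second <- function(x) {
  if (length(x) == 1)
    return(x)
  sort(x, decreasing = TRUE)[2]
}

# Hypothetical data: the Hours column is invented for illustration
staff <- data.frame(
  Role  = c("Chef", "Chef", "Chef", "Waiter", "Waiter", "Host"),
  Shift = c("AM", "AM", "PM", "AM", "AM", "PM"),
  Hours = c(8, 6, 5, 7, 9, 4),
  stringsAsFactors = FALSE
)

# Group by Role and Shift together; named list elements label the group columns
agg.second <- aggregate(staff["Hours"],
                        by = list(Role = staff$Role, Shift = staff$Shift),
                        FUN = second)
```

Only combinations of Role and Shift that occur in the data appear in the result, so this sketch yields four rows.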

Method - Using dplyr::summarize()

The following example uses data.frame() to combine variables from the data set (here Gender, Age, q2a_1, and q2b_1) into a single data frame, which is then grouped and aggregated with dplyr.

1. In the toolbar, go to Calculation > Custom Code.

2. Click on the page to place the output.

3. Enter the code below in the Code panel:

# Combine the data
thedata = data.frame(Gender, Age, q2a_1, q2b_1, check.names = FALSE)

# Load the packages with the functions needed
library(dplyr)
library(verbs)

# Aggregate the data
newdata <- thedata %>%                 # create a new table of the aggregated data
    group_by(Gender, Age) %>%          # group the calculations by Gender + Age
    summarize(`Avg Coke Out` = Mean(q2a_1),
              `Avg Coke Home` = Mean(q2b_1))

4. Click Calculate.
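For readers who want to try the same pattern outside a Displayr document, the following is a rough sketch on made-up data. The survey data frame is hypothetical and only mimics the variable names used above. Base R's mean() with na.rm = TRUE is swapped in for Mean() so that the sketch depends only on dplyr, and .groups = "drop" is added so the result is returned ungrouped.

```r
library(dplyr)

# Hypothetical survey data standing in for Gender, Age, q2a_1 and q2b_1
survey <- data.frame(
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Age    = c("18-34", "18-34", "18-34", "35+", "35+"),
  q2a_1  = c(2, 4, 1, 3, NA),
  q2b_1  = c(5, 3, 2, 6, 4),
  stringsAsFactors = FALSE
)

# One row per Gender + Age combination, with the average of each variable
summary.table <- survey %>%
  group_by(Gender, Age) %>%
  summarize(`Avg Coke Out`  = mean(q2a_1, na.rm = TRUE),  # base mean(), an assumption of this sketch
            `Avg Coke Home` = mean(q2b_1, na.rm = TRUE),
            .groups = "drop")
```

Additional summaries, such as a count per group with n(), can be added as further arguments to summarize().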