How to Standardize or Calculate Data Within Subgroups in R

This article describes how to perform calculations on subgroups of data using R. A common use case would be if one needed to standardize a rating question based on the respondents' country due to cultural differences in how ratings are interpreted. In the example below, we will walk through a hypothetical example of scaling Brand Attitude based on Age.

You will go from a table/variables where the average rating is different per age group:

To scaled variables resulting in an average of 0 for each age group:

Requirements

A numeric variable or variable sets. You can access the data and code in the following example by downloading the QPack here. See: How to Create a Displayr Document Using a QPack.

Method

The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although we are scaling the data in this example, you can perform any custom calculation you like by creating it within a function. You can also use built-in functions like mean(), sum(), etc directly in the dplyr code. To modify your own document, please read through the comments (denoted with #) and edit the code where appropriate.

From the toolbar, click Calculation > Custom Code and click on the page to place the custom object.

Paste the following R code in the code panel and click Calculate.

###Identify variables to use in calculation
#the numeric variable set to standardize
myvars=`Brand attitude 2`

#select the variable to use for the grouping
groupvar=Age

###Format data to use in calculations
#combine the group and other variables you want to scale in a data.frame
thedata=data.frame(myvars,groupvar,check.names=F)

#remove the SUM column
thedata=thedata[,!colnames(myvars) == "SUM"] 

#get list of columns to standardize
thecols=colnames(thedata)
#remove the grouping column from the list
thecols=thecols[thecols != "groupvar"]


###Create a function for your calculation, if needed
#create function to standardize using a formula
standardize <- function(x) (x-mean(x,na.rm=T))/sd(x,na.rm=T)

###Perform the calculations
#load dplyr functions used below
library(dplyr)

#group by the grouping variable and standardize the other variables
newvars <- thedata %>% #create newvars and %>% sends thedata to the function below
           group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)
           mutate_at(thecols, standardize) #mutate_at adds a new column for each column in thecols to the results applying the standardize function

###Return the final result
#remove the groupvar from the final result
newvars

The code will return a table of the raw data of the scaled variables on the page for your final results that you can review if you wish:
RTab;lle.png

You will now need to create variables for each of the scaled columns in your raw data table. To do that, hover your mouse in the Data Sources tree and select + > Custom Code > R > Numeric.
In the General tab of Properties , give it a Label based on the first column - e.g., Coca-Cola.
Paste in the code from above in the Code panel, but change the final line of code to the following to only return the data from the first column:
```
newvars[,1]
```
Click Calculate.
In the Data Sources tree, right-click your "Coca-Cola" variable and click Duplicate.
Change the Label to your second column - e.g., Diet Coke, and edit the final line to use the second column of data newvars[,2].
Repeat steps 7 and 8 for your remaining columns. You should end up with a variable for each column of scaled data:
In the Data Sources tree, select all of your scaled variables, right-click, and select Combine to combine them into a variable set. This will allow you to show the data as one table.
Rename the variable set to something sensible, like Brand attitude scaled by Age.
You can now right-click on your Calculation of raw scaled data and select Delete since it is no longer needed.