This article describes how to perform calculations on subgroups of data using R. A common use case would be if one needed to standardize a rating question based on the respondents' country due to cultural differences in how ratings are interpreted. In the example below we will walk through a hypothetical example of scaling Brand Attitude based on Age.
You will go from a table/variables where the average rating is different per age group:
To scaled variables resulting in an average of 0 for each age group:
Requirements
Please note these steps require a Displayr license.
- A numeric variable or variable sets. You can access the data and code in the following example by downloading the QPack here. See: How to Create a Displayr Document Using a QPack.
Method
The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although we are scaling the data in this example, you can perform any custom calculation you like by creating it within a function. You can also use built-in functions like mean(), sum(), etc directly in the dplyr code. To modify your own document, please read through the comments (denoted with #) and edit the code where appropriate.
- From the toolbar, click the Calculation > Custom and draw a box to review your results on the page.
- Paste the following r code in the R Code box in the General tab of the object inspector:
###Identify variables to use in calculation
The code will return a table of the raw data of the scaled variables on the page for your final results that you can review if you wish:
#the numeric variable set to standardize
myvars=`Brand attitude 2`
#select the variable to use for the grouping
groupvar=Age
###Format data to use in calculations
#combine the group and other variables you want to scale in a data.frame
thedata=data.frame(myvars,groupvar,check.names=F)
#remove the SUM column
thedata=thedata[,!colnames(myvars) == "SUM"]
#get list of columns to standardize
thecols=colnames(thedata)
#remove the grouping column from the list
thecols=thecols[thecols != "groupvar"]
###Create a function for your calculation, if needed
#create function to standardize using a formula
standardize <- function(x) (x-mean(x,na.rm=T))/sd(x,na.rm=T)
###Perform the calculations
#load dplyr functions used below
library(dplyr)
#group by the grouping variable and standardize the other variables
newvars <- thedata %>% #create newvars and %>% sends thedata to the function below
group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)
mutate_at(thecols, standardize) #mutate_at adds a new column for each column in thecols to the results applying the standardize function
###Return the final result
#remove the groupvar from the final result
newvars
- You will now need to create variables for each of the scaled columns in your raw data table. To do that, hover your mouse in the Data Sources tree and select + > Custom Code > R - Numeric.
- In the General tab of the object inspector, give it a Label based on the first column - e.g., Coca-Cola.
- Paste in the code from above in the R Code box, but change the final line of code to the following to only return the data from the first column:
newvars[,1]
- Click Calculate.
- In the Data Sources tree, right-click your "Coca-Cola" variable and click Duplicate.
- Change the Label to your second column - e.g., Diet Coke and edit the final line to use the second column of data
newvars[,2]
. - Repeat steps 7 and 8 for your remaining columns. You should end up with a variable for each column of scaled data:
- In the Data Sources tree, select all of your scaled variables, right-click, and select Combine to combine them into a variable set. This will allow you to show the data as one table.
- Rename the variable set to something sensible like Brand attitude scaled by Age.
- You can now right-click on your Calculation of raw scaled data and select Delete > Delete since it is no longer needed.
If you drag your new variable set onto the page, you'll see that the variables are scaled for each brand overall and within each age group:
Next
How to Scale Variable(s) to Standardize Within Variable
How to Scale Variable(s) to Standardize Within Case
How to Add a Custom Calculation
How to Perform Mathematical Calculations Using R