This article describes how to perform calculations on subgroups of data using R. A common use case would be if one needed to standardize a rating question based on the respondents' country due to cultural differences in how ratings are interpreted. In the example below we will walk through a hypothetical example of scaling Brand Attitude based on Age.
You will go from a table/variables where the average rating is different per age group:
To scaled variables resulting in an average of 0 for each age group:
- A numeric variable or variable sets. You can access the data and code in the following example by downloading the QPack here. See: How to Import a QPack into a New Document.
The example below scales a brand attitude variable set within each age group in the data and creates a new scaled variable set. Although we are scaling the data in this example, you can perform any custom calculation you like by creating it within a function. You can also use built-in functions like mean(), sum(), etc directly in the dplyr code. To modify for your own document, please read through the comments (denoted with #) and edit the code where appropriate.
- Click the Calculation button and draw a box to review your results on the page.
- Paste in the following r code in the R CODE box on the right:
###Identify variables to use in calculationThe code will return a table of the raw data of the scaled variables on the page for your final results that you can review if you wish:
#the numeric variable set to standardize
myvars=`Brand attitude 2`
#select the variable to use for the grouping
###Format data to use in calculations
#combine the group and other variables you want to scale in a data.frame
#remove the SUM column
thedata=thedata[,!colnames(myvars) == "SUM"]
#get list of columns to standardize
#remove the grouping column from the list
thecols=thecols[thecols != "groupvar"]
###Create a function for your calculation, if needed
#create function to standardize using a formula
standardize <- function(x) (x-mean(x,na.rm=T))/sd(x,na.rm=T)
###Perform the calculations
#load dplyr functions used below
#group by the grouping variable and standardize the other variables
newvars <- thedata %>% #create newvars and %>% sends thedata to the function below
group_by(groupvar) %>% #group by the grouping variable in the data (comma separate if multiple)
mutate_at(thecols, standardize) #mutate_at adds a new column for each column in thecols to the results applying the standardize function
###Return the final result
#remove the groupvar from the final result
- You will now need to create variables for each of the scaled columns in your raw data table. To do that, hover your mouse in the Data Sets tree and select + > Custom Code > R - Numeric.
- In the Object Inspector on the left give it a Label based on the first column - e.g., Coca-Cola.
- Paste in the code from above in the R CODE box, but change the final line of code to the following to only return the data from the first column:
- Click Calculate.
- In the Data Sets tree, select your Coca-Cola variable and click Duplicate.
- Change the Label to your second column - e.g., Diet Coke and edit the final line to use the second column of data
- Repeat steps 7 and 8 for your remaining columns. You should end up with a variable for each column of scaled data:
- In the Data Sets tree, select all of your scaled variables, right-click and select Combine to combine them into a variable set. This will allow you to show the data as one table.
- Rename the variable set something sensible like Brand attitude scaled by Age.
- You can now click on your Calculation of raw scaled data and click Delete since it is no longer needed.
If you drag your new variable set onto the page, you'll see that the variables are scaled for each brand overall and within each age group: