Introduction
This article describes how to reshape data into a long or wide format using R. This can help when you want to aggregate data in a specific way (using dplyr), stack a single variable (using melt), or de-loop variables into individual variables (using dcast).
Below is data for preferred cola and ratings of each cola with 600 rows/respondents. You can transform your data from its current shape...
...to long shape. Now each row is a combination of preferred cola - cola rated - rating, amounting to 3600 ratings/rows:
You can also reshape the data into summarized data. The table below shows the 80th percentile rating for each cola based on preferred cola, 48 pairs/rows:
And you can reshape the above back to wide data. The below shows the 80th percentile rating for each cola based on preferred cola with colas rated in the columns and preferred cola in the rows:
Requirements
Numeric variables or a variable set.
Optional grouping variable - only needed for custom calculations or making data wide.
To follow along with the example below download the QPack here.
Method - Reshape to Long
In the example below we take a variable for Preferred Cola and use it alongside a numeric Brand attitude variable set. Then we stack the Brand attitude variable set, making it long. Data in a long format like this can be used to create ggplot visualizations and to create other calculations based on grouping variables. In the example below, we take the 80th percentile of each brand rating based on what the respondents' preferred cola is.
- Click the Calculation button and draw an output on the page.
- In the R CODE box, paste in the following code. Note the comments (prefaced with a #) are there to explain what the code does so you can modify it for your own needs:
#identify the variables or variable set to make long
variablestomelt=`Brand attitude Numeric`
#add in any other grouping variables you'd like
grouping=`Preferred cola`
#combine into data.frame for melting (making long)
thedata=data.frame(grouping,variablestomelt,check.names=F)
#remove any sum columns if there is numeric data
thedata=thedata[,colnames(thedata) != "SUM"]
#reshape the data into long format
library(reshape2)
melted=melt(thedata, id.vars="grouping") - Click Calculate. The data will appear in a long format.
You can copy this code to create a new Data Set (see: How to Import a Data Set Using R), and use it as an input into a custom ggplot visualization using R, or as an input into other analyses or outputs in your document. The long format also makes it easy to perform calculations for sub-groups in the data. If you append the following code to your code above you will see the 80th percentile for each preferred cola - cola rated pair.
###Calculate 80th percentile for each preferred - rated pair
#load functions needed
library(dplyr)
#create a new data frame called percentiles from the melted data
percentiles = melted %>%
group_by(grouping, variable) %>% #group the calculations by the grouping/variable pairs
summarize(`80th pctl`=quantile(value, .8, na.rm=T)) #perform the 80th percentile calc on each group
If you google, "melt in R" or "dplyr summarize in R" you will find many examples of how the two main functions above work.
Method - Reshape to Wide
In the example below we take data that is in the long format and make it wide. We are using the data directly above for 80th percentiles and we want to reshape it into a wide format where each row is the preferred cola and each column is the cola rated.
- Click the Calculation button and draw an output on the page.
- In the R CODE box paste in the below code. Note the comments (prefaced with a #) are there to explain what the code does so you can modify it for your own needs.
#load reshape library if not done earlier in code
library(reshape2)
#make the data wide where the rows is the grouping variable and columns are
#categories in the variable column
wide=dcast(percentiles, grouping ~ variable, mean)
#rename the grouping column since it was removed when the data was melted above
colnames(wide)[1]="Preferred Cola"
#return final result
wide
You can also use your data directly in the dcast
function where percentiles are used. The grouping ~ variable bit identifies which column(s) to keep as single columns (grouping) and which column to make wide into separate columns based on the values (variable). Specifying mean at the end indicates that if there is more than 1 row/column pair, the results will be averaged - it is not relevant in this case. If you google, "dcast in R" you will find many examples of how this works.
Comments
0 comments
Please sign in to leave a comment.