This article describes how to reshape data into a long or wide format using R. This can help when you want to aggregate data in a specific way (using dplyr), stack a single variable (using melt), or de-loop variables into individual variables (using dcast). You can do this to create a new table using data already imported into Displayr or you can add on this code to code used in How to Import a Data Set Using R.
Below is a table of raw data for preferred cola and ratings of each cola with 600 rows/respondents. You can transform your data from its current shape...
...to long shape. Now each row is a combination of preferred cola - cola rated - rating, amounting to 3600 ratings/rows:
You can also reshape the data into summarized data. The table below shows the 80th percentile rating for each cola based on preferred cola, 48 pairs/rows:
And you can reshape the above back to wide data. The below shows the 80th percentile rating for each cola based on preferred cola with colas rated in the columns and preferred cola in the rows:
Requirements
Please note these steps require a Displayr license.
- Numeric variables or a variable set
- [Optional] grouping variable - only needed for custom calculations or making data wide.
- To follow along with the example below download the QPack here.
Method - Reshape to Long
In the example below we take a variable for Preferred Cola and use it alongside a numeric Brand attitude variable set. Then we stack the Brand attitude variable set, making it long. Data in a long format like this can be used to create ggplot visualizations and to create other calculations based on grouping variables. In the example below, we take the 80th percentile of each brand rating based on what the respondents' preferred cola is.
- Click Calculation > Custom in the Toolbar and draw an output on the page.
- In the R Code box, paste in the following code. Note the comments (prefaced with a #) are there to explain what the code does so you can modify it for your own needs:
#identify the variables or variable set to make long
variablestomelt=`Brand attitude Numeric`
#add in any other grouping variables you'd like
grouping=`Preferred cola`
#combine into data.frame for melting (making long)
thedata=data.frame(grouping,variablestomelt,check.names=F)
#remove any sum columns if there is numeric data
thedata=thedata[,colnames(thedata) != "SUM"]
#reshape the data into long format
library(reshape2)
melted=melt(thedata, id.vars="grouping") -
Click Calculate. The data will appear in a long format.
You can copy this code to create a new Data Set (see: How to Import a Data Set Using R), and use it as an input into a custom ggplot visualization using R, or as an input into other analyses or outputs in your document. The long format also makes it easy to perform calculations for sub-groups in the data. If you append the following code to your code above you will see the 80th percentile for each preferred cola - cola rated pair.
###Calculate 80th percentile for each preferred - rated pair
#load functions needed
library(dplyr)
#create a new data frame called percentiles from the melted data
percentiles = melted %>%
group_by(grouping, variable) %>% #group the calculations by the grouping/variable pairs
summarize(`80th pctl`=quantile(value, .8, na.rm=T)) #perform the 80th percentile calc on each group
If you google, "melt in R" or "dplyr summarize in R" you will find many examples of how the two main functions above work.
Method - Reshape to Wide
In the example below we take data that is in the long format and make it wide. We are using the data directly above for 80th percentiles and we want to reshape it into a wide format where each row is the preferred cola and each column is the cola rated.
- Click Calculation > Custom in the Toolbar and draw an output on the page.
- In the R Code box paste in the below code. Note the comments (prefaced with a #) are there to explain what the code does so you can modify it for your own needs. The input for this is the long-format data in the snapshot directly above.
#load reshape library if not done earlier in code
library(reshape2)
#long format data
longdata = percentiles
#make the percentiles data frame wide where the the row names
#are the unique values in the grouping variable and columns are the
#categories in the variable column. If duplicate records, the value will be the mean
#(in this example there are no duplicate records)
#replace grouping to the column name you want to be the row names
#replace variable to the column you want to be the column headers
wide=dcast(longdata, grouping ~ variable, mean)
#rename the grouping column since it was removed when the data was melted above
colnames(wide)[1]="Preferred Cola"
#return final result
wide
You can also use your data directly in the dcast
function where percentiles are used. The grouping ~ variable bit identifies which column(s) to keep as single columns (grouping) and which column to make wide into separate columns based on the values (variable). Specifying "mean" at the end indicates that if there is more than 1 row/column pair, the results will be averaged - it is not relevant in this case. If you google, "dcast in R" you will find many examples of how this works.