How to Stack Data Using R

Displayr has a built-in stacking widget that provides a point-and-click way of stacking SPSS .sav files, see How to Stack Data in Displayr. However, sometimes data quality issues like missing variables, reusing codes, and different value attributes won't allow the stacking to be done. In this instance, you will need to use R to stack the data.

Requirements

An SPSS (.sav) data set saved on a cloud drive that is accessible without logging in or on your Displayr Cloud drive. The data set used below can be downloaded here.
- Note: If the relevant files are located within a Displayr Cloud Drive folder, ensure your document and appropriate users also have access to this folder. See How to Manage User Groups and Permissions in Displayr and How to Use the Displayr Cloud Drive for more details.

Method

1. To start, you should generate the list of variables you want to stack. It's recommended that you use the Stacking Code Generator Excel tool to create the R code needed to list out your stacked variables - Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from below - being sure to remove the last comma.

2. Go to Data Sources > Plus (+) > R Code. OPTIONAL: Instead of adding as a data source, you can prototype the code in a Calculation by using Calculation > Custom Code from the toolbar and typing in your R Code. The Calculation will allow you to preview the results and modify your code in case there are any issues. You can then use that same code to add your R data set as in step 1.

3. Enter a name for the data set under Name.

4. Paste in the below R code where it states Enter your R code here, and review each numbered step in the code to make the appropriate edits:

#Review the numbered list of settings below
#Load the foreign package to use
library(foreign)

### (1.a) SPECIFY the file location - link must end in .sav and automatically download the file
# if not, comment out the lines in this section by adding a leading # and comment in the code in section 1.b
location = "https://app.displayr.com/DataMart/File/6e8734bc-9963-44fe-8c9f-04c22efb3e83/Technology_2018.sav?company_id=751221"
thedata= suppressWarnings(read.spss(location,
use.value.labels = TRUE,
to.data.frame = TRUE))
###

### (1.b) SPECIFY the file location - if using a file on your Displayr Cloud drive comment in below by removing a single # from each line
#library(flipAPI)
##put the file name from your Displayr Cloud drive inside '' below such as QLoadData('yourfile.sav')
#thedata=QLoadData('Technology_2018.sav')
##convert to a data frame to reshape later
#thedata=data.frame(thedata)
###

### (2) specify your unique ID (Case) variable that can be used to link to the unstacked data
id.variable = 'RESPNUM'
###

### (3) list any other variables you want to include in the stacked dataset but leave unstacked
variables.to.stretch = c('Q1', 'Rec_Age')
###

### provide lists of variables to include in each stacked variable - they must have the same number
variables.to.stack = list(
# (4) REPLACE the below with a copy/paste from BLUE (first) column of Excel template
# if you need to insert blank values for missing variables then use the name 'dummy' in the list below
## !!!! be sure to remove the last comma before the closing round bracket !!!! 
'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05',
'Q3_06','Q3_07','Q3_08','Q3_09','Q3_10','Q3_11',
'Q3_12','Q3_13'),
'Fun' = c('Q4a_01','Q4a_02','Q4a_03','Q4a_04','Q4a_05','Q4a_06',
'Q4a_07','Q4a_08','Q4a_09','Q4a_10','Q4a_11','Q4a_12','Q4a_13'),
'Worth_what_you_pay_for' = c('Q4b_01','Q4b_02','Q4b_03','Q4b_04','Q4b_05','Q4b_06',
'Q4b_07','Q4b_08','Q4b_09','Q4b_10','Q4b_11','Q4b_12','Q4b_13'),
'Innovative' = c('Q4c_01','Q4c_02','Q4c_03','Q4c_04','Q4c_05','Q4c_06',
'Q4c_07','Q4c_08','Q4c_09','Q4c_10','Q4c_11','Q4c_12','Q4c_13'),
'Good_customer_service' = c('Q4d_01','Q4d_02','Q4d_03','Q4d_04','Q4d_05','Q4d_06',
'Q4d_07','Q4d_08','Q4d_09','Q4d_10','Q4d_11','Q4d_12','Q4d_13'),
'Stylish' = c('Q4e_01','Q4e_02','Q4e_03','Q4e_04','Q4e_05','Q4e_06',
'Q4e_07','Q4e_08','Q4e_09','Q4e_10','Q4e_11','Q4e_12','Q4e_13'),
'Easy_to_use' = c('Q4f_01','Q4f_02','Q4f_03','Q4f_04','Q4f_05','Q4f_06',
'Q4f_07','Q4f_08','Q4f_09','Q4f_10','Q4f_11','Q4f_12','Q4f_13'),
'High_quality' = c('Q4g_01','Q4g_02','Q4g_03','Q4g_04','Q4g_05','Q4g_06',
'Q4g_07','Q4g_08','Q4g_09','Q4g_10','Q4g_11','Q4g_12','Q4g_13'),
'High_performance' = c('Q4h_01','Q4h_02','Q4h_03','Q4h_04','Q4h_05','Q4h_06',
'Q4h_07','Q4h_08','Q4h_09','Q4h_10','Q4h_11','Q4h_12','Q4h_13'),
'Low_prices' = c('Q4i_01','Q4i_02','Q4i_03','Q4i_04','Q4i_05','Q4i_06',
'Q4i_07','Q4i_08','Q4i_09','Q4i_10','Q4i_11','Q4i_12','Q4i_13')
)
###

### get lists of variables to include/exclude in the stacked data file
#create dummy variable in the data set to use if needed
thedata$dummy=NA
#pull off the column headers from the data set to compare to the lists
all.names <- names(thedata)
#identify variables to exclude from the stacked data set
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]

#####
##check that all names input are in the data file
missingid=setdiff(id.variable,all.names)
missingstretch=setdiff(variables.to.stretch,all.names)
missingstack=setdiff(unlist(variables.to.stack),all.names)

##if any are missing, error the output and list out what is missing and what
names are used in the file
if(length(missingid) > 0 ) stop(paste0("Please check your id.variable name\"",id.variable,"\"It
  was not found in the data file. Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstretch) > 0 ) stop(paste0("Please check your variables.to.stretch,
  the following could not be found in the data set:\"",paste(missingstretch,collapse=",
  "),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstack) > 0 ) stop(paste0("Please check your variables.to.stack
  list, the following could not be found in the data set:\"",paste(missingstack,collapse=",
  "),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))

##check that all the stacked variables are stacking the same number of variables
lengths=sapply(variables.to.stack,length)
if(any(lengths != lengths[1])) stop(paste("You must be stacking the same number
  of variables for each stacked variable. The number of variables stacked for each
  stacked variable is below, please fix the variables.to.stack list so that all
  numbers are the same. You can use the name dummy in the code to fill in for missing
  variables.\"",paste(paste0(names(lengths),": ",lengths),collapse="\"")))
#####

#put columns in data order as their lists above so variable lists can be in any order
thedata=thedata[,c(id.variable,variables.to.stretch,setdiff(colnames(thedata),c(id.variable, variables.to.stretch)))]

### Stack the data
stacked.data = reshape(data = thedata,
idvar = id.variable, direction = "long",
drop = variables.to.exclude,
varying = variables.to.stack)
###

### Format the stacked data
#make column names pretty
names(stacked.data) = c(id.variable, variables.to.stretch, "Observation", names(variables.to.stack))
###

### (5) return the final stacked data set or read article for how to save to Displayr cloud
#keep category metadata
library(haven)
stacked.data = as_factor(stacked.data)
###

Note that the variables.to.stack object includes the following format:
- Each element of the list has a name to tell us what the variables mean. This will become the name of the variable in the stacked data frame.
- Ensure the names of your new stacked variables are valid SPSS variable names (no spaces, begin with an alphabetical character, are unique, etc).
- Each element of the list is a vector that tells us the variable names in the original data frame and the order in which they are to be stacked.
If you don't have a variable to stretch, you can use the following code to set variables.to.stretch to an empty vector.

variables.to.stretch = c()

5. OPTIONAL: If you would rather save your stacked data to a .sav file on your Displayr Cloud Drive, in the case that you're working from a data preparation document, you should:

Copy your stacking code to a calculation created by going to Calculation > Custom Code in the toolbar.
Replace the code in Step 5 of the example above with the following:

### (5) specify filename and save the final stacked data as a sav to the Displayr Cloud
#change stackeddata.sav to the filename you want to save 
#note files get overwritten automatically based on name
stackedfile = "stackeddata.sav"
#save the dataset to the cloud
library(flipAPI)
QSaveData(stacked.data,stackedfile)
###

6. R data doesn’t have the same level of metadata as some file types, like SSS and SAV. The code above will retain labels that are consistent across stacking, but care should be taken to review variables with inconsistencies. After the data set is loaded into Displayr, you should always review the Value Attributes and Structures of your variables.

7. OPTIONAL: There will be a new variable Observation in your stacked data file that identifies which iteration of the stacking the case comes from. You can relabel the Values of this variable to correspond to the appropriate product/brand/item for that particular iteration.

8. OPTIONAL: To use this in the same document as your unstacked data, please see How to Create Data File Relationships.

How to Unstack Data Using R

How to Create Data File Relationships

Articles in this section

Requirements

Method

Next

Articles in this section

Requirements

Method

Next

Related articles