Displayr has a built-in stacking widget that provides a point-and-click way of stacking SPSS .sav files, see How to Stack Data in Displayr. However, sometimes data quality issues like missing variables, reusing codes, and different value attributes won't allow the stacking to be done. In this instance, you will need to use R to stack the data.
Requirements
- An SPSS (.sav) data set saved on a cloud drive that is accessible without logging in or on your Displayr Cloud drive. The data set used below can be downloaded here.
- Note: If the relevant files are located within a Displayr Cloud Drive folder, ensure your document and appropriate users also have access to this folder. See How to Manage User Groups and Permissions in Displayr and How to Use the Displayr Cloud Drive for more details.
Method
1. To start, you should generate the list of variables you want to stack. It's recommended that you use the Stacking Code Generator Excel tool to create the R code needed to list out your stacked variables - Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from below - being sure to remove the last comma.
2. Create the stacked data item
a. It's recommended to edit and run the code in a Calculation so you can review your data before loading and troubleshoot any errors that may come up. From the Report pane inserter menu or the toolbar (if on a Page) select Calculation > Custom Code. Click in the workarea to place the item, if on a Page.
b. Otherwise, you can add as a Data Set in the Data Sources pane via Add Data Plus (+) > R Code. Enter a name for the data set under Name.
3. Paste in the below R code in the Code pane and review each numbered step (1) in the code to make the appropriate edits. Some editing guidance is also provided below:
#Review the numbered list of settings below
#Load the foreign package to use
library(foreign)
### (1.a) SPECIFY the file location - link must end in .sav and automatically download the file
# if not, comment out the lines in this section by adding a leading # and comment in the code in section 1.b
location = "https://app.displayr.com/DataMart/File/6e8734bc-9963-44fe-8c9f-04c22efb3e83/Technology_2018.sav?company_id=751221"
thedata= suppressWarnings(read.spss(location,
use.value.labels = TRUE,
to.data.frame = TRUE))
### END 1.a
### (1.b) SPECIFY the file location - if using a file on your Displayr Cloud drive comment in below by removing a single # from each line
#library(flipAPI)
##put the file name from your Displayr Cloud drive inside '' below such as QLoadData('yourfile.sav')
#thedata=QLoadData('Technology_2018.sav')
##convert to a data frame to reshape later
#thedata=data.frame(thedata)
### END 1.b
### (2) specify your unique ID (Case) variable that can be used to link to the unstacked data
id.variable = 'RESPNUM'
###
### (3) list any other variables you want to include in the stacked dataset but leave unstacked
variables.to.stretch = c('Q1', 'Rec_Age')
###
### provide lists of variables to include in each stacked variable - they must have the same number
variables.to.stack = list(
# (4) REPLACE the below with a copy/paste from BLUE (first) column of Excel template
# if you need to insert blank values for missing variables then use the name 'dummy' in the list below
## !!!! be sure to remove the last comma before the closing round bracket !!!!
'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05',
'Q3_06','Q3_07','Q3_08','Q3_09','Q3_10','Q3_11',
'Q3_12','Q3_13'),
'Fun' = c('Q4a_01','Q4a_02','Q4a_03','Q4a_04','Q4a_05','Q4a_06',
'Q4a_07','Q4a_08','Q4a_09','Q4a_10','Q4a_11','Q4a_12','Q4a_13'),
'Worth_what_you_pay_for' = c('Q4b_01','Q4b_02','Q4b_03','Q4b_04','Q4b_05','Q4b_06',
'Q4b_07','Q4b_08','Q4b_09','Q4b_10','Q4b_11','Q4b_12','Q4b_13'),
'Innovative' = c('Q4c_01','Q4c_02','Q4c_03','Q4c_04','Q4c_05','Q4c_06',
'Q4c_07','Q4c_08','Q4c_09','Q4c_10','Q4c_11','Q4c_12','Q4c_13'),
'Good_customer_service' = c('Q4d_01','Q4d_02','Q4d_03','Q4d_04','Q4d_05','Q4d_06',
'Q4d_07','Q4d_08','Q4d_09','Q4d_10','Q4d_11','Q4d_12','Q4d_13'),
'Stylish' = c('Q4e_01','Q4e_02','Q4e_03','Q4e_04','Q4e_05','Q4e_06',
'Q4e_07','Q4e_08','Q4e_09','Q4e_10','Q4e_11','Q4e_12','Q4e_13'),
'Easy_to_use' = c('Q4f_01','Q4f_02','Q4f_03','Q4f_04','Q4f_05','Q4f_06',
'Q4f_07','Q4f_08','Q4f_09','Q4f_10','Q4f_11','Q4f_12','Q4f_13'),
'High_quality' = c('Q4g_01','Q4g_02','Q4g_03','Q4g_04','Q4g_05','Q4g_06',
'Q4g_07','Q4g_08','Q4g_09','Q4g_10','Q4g_11','Q4g_12','Q4g_13'),
'High_performance' = c('Q4h_01','Q4h_02','Q4h_03','Q4h_04','Q4h_05','Q4h_06',
'Q4h_07','Q4h_08','Q4h_09','Q4h_10','Q4h_11','Q4h_12','Q4h_13'),
'Low_prices' = c('Q4i_01','Q4i_02','Q4i_03','Q4i_04','Q4i_05','Q4i_06',
'Q4i_07','Q4i_08','Q4i_09','Q4i_10','Q4i_11','Q4i_12','Q4i_13')
)
### END 4
### get lists of variables to include/exclude in the stacked data file
#create dummy variable in the data set to use if needed
thedata$dummy=NA
#pull off the column headers from the data set to compare to the lists
all.names <- names(thedata)
#identify variables to exclude from the stacked data set
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]
#####
##check that all names input are in the data file
missingid=setdiff(id.variable,all.names)
missingstretch=setdiff(variables.to.stretch,all.names)
missingstack=setdiff(unlist(variables.to.stack),all.names)
##if any are missing, error the output and list out what is missing and what names are used in the file
if(length(missingid) > 0 ) stop(paste0("Please check your id.variable name\"",id.variable,"\"It
was not found in the data file. Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstretch) > 0 ) stop(paste0("Please check your variables.to.stretch,
the following could not be found in the data set:\"",paste(missingstretch,collapse=",
"),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))
if(length(missingstack) > 0 ) stop(paste0("Please check your variables.to.stack
list, the following could not be found in the data set:\"",paste(missingstack,collapse=",
"),"\"\" Available variable names are:\"",paste(all.names,collapse=", ")))
##check that all the stacked variables are stacking the same number of variables
lengths=sapply(variables.to.stack,length)
if(any(lengths != lengths[1])) stop(paste("You must be stacking the same number
of variables for each stacked variable. The number of variables stacked for each
stacked variable is below, please fix the variables.to.stack list so that all
numbers are the same. You can use the name dummy in the code to fill in for missing
variables.\"",paste(paste0(names(lengths),": ",lengths),collapse="\"")))
####
#put columns in data order as their lists above so variable lists can be in any order
thedata=thedata[,c(id.variable,variables.to.stretch,setdiff(colnames(thedata),c(id.variable, variables.to.stretch)))]
### Stack the data
stacked.data = reshape(data = thedata,
idvar = id.variable, direction = "long",
drop = variables.to.exclude,
varying = variables.to.stack)
###
### Format the stacked data
#make column names pretty
names(stacked.data) = c(id.variable, variables.to.stretch, "Observation", names(variables.to.stack))
###
### (5) OPTIONAL: START - Remove rows where all stacked variables are missing
### comment in the lines below until the end comment to remove cases in the stacked data set that are blank for stacked variables
# # Get stacked variable names (these are Recommend, Fun, etc from list of variables to stack in the example)
# stack.vars <- names(variables.to.stack)
# # OPTIONAL: Convert empty strings "" to NA (in case SPSS import created blanks)
# stacked.data[, stack.vars][stacked.data[, stack.vars] == ""] <- NA
# # Remove rows where ALL stacked values are NA
# stacked.data <- stacked.data[!apply(is.na(stacked.data[, stack.vars]), 1, all), ]
### END - Remove rows where all stacked variables are missing
### (6.a) OPTIONAL: if using this code to make an R data set or output, return the final stacked data set
# to instead save as a .sav file on the Displayr cloud drive, comment OUT step 6.a and comment IN step 6.b
#keep category metadata
library(haven)
stacked.data = as_factor(stacked.data)
### END 6
### (6.b) OPTIONAL: save the stacked data set to your Displayr cloud
### comment in the code below and change stackeddata.sav to the filename you wish
### comment out code for step 6 above
# #change stackeddata.sav to the filename you want to save
# #note files get overwritten automatically based on name
# stackedfile = "stackeddata.sav"
# #save the dataset to the cloud
# library(flipAPI)
# QSaveData(stacked.data,stackedfile)
### END 7- Steps (1.a) and (1.b) are used based on whether you are reading in a file from a URL (1.a) or the Displayr Cloud Drive (1.b). Only one section should be commented in and the other should be commented out.
- Note that the variables.to.stack object includes the following format:
- Each element of the list has a name to tell us what the variables mean. This will become the name of the variable in the stacked data frame.
- Ensure the names of your new stacked variables are valid SPSS variable names (no spaces, begin with an alphabetical character, are unique, etc).
- Each element of the list is a vector that tells us the variable names in the original data frame and the order in which they are to be stacked.
- If you don't have a variable to stretch, you can use the following code to set variables.to.stretch to an empty vector.
variables.to.stretch = c()- OPTIONAL step (5) in the code: If you are missing data across all the stacked variables in the stacked data set for an iteration, you may want to delete those observations to reduce the size of your data file. In this case, you can comment in the section for step 5 in the code. You may also want to replace blank text responses "" with missing data (NA) you can comment in that line in the section as well. A worked example is in Additional Notes below.
- Steps (6.a) and (6.b) determine how you'd like to return your stacked data. Only one of the sections should be commented in:
- (6.a) is commented in by default and returns the full raw data of the stacked data. You should use this in R Data Sets and Calculations.
- (6.b) will save the stacked data as a .sav file to your Displayr Cloud Drive (so you can download or import directly from the Displayr cloud drive). This is useful if stacking in a data preparation document. To do this, you'll comment out section 6.a, comment in section 6.b, change stackeddata.sav to the filename you wish, and recalculate.
6. R data doesn’t have the same level of metadata as some file types, like SSS and SAV. The code above will retain labels that are consistent across stacking, but care should be taken to review variables with inconsistencies. After the data set is loaded into Displayr, you should always review the Value Attributes and Structures of your variables.
7. OPTIONAL: There will be a new variable Observation in your stacked data file that identifies which iteration of the stacking the case comes from. You can relabel the Values of this variable in Displayr to correspond to the specific product/brand/item for that particular iteration. A worked example is in Additional Notes below.
8. OPTIONAL: To use this in the same document as your unstacked data, please see How to Create Data File Relationships.
Additional Notes
Below is an example demonstrating a couple of the optional components of the stacking above. Namely step 5 from the code that removes cases where all stacked variables are missing, and step 7 from the instructions that explains how to use the Observation variable to make new variables in your Document.
Here is a very simple example of unstacked data for cola ratings. Notice there are missing rating and ranking values in the data:
You can stack the data using the ID as the id.variable and Country as the variables.to.stretch (both in blue in the picture below). You can stack all the ratings into a Rating variable (pink) and all the rank variables into RankedCola (green). When you run the script a new variable Observation (orange) is created to denote what iteration the case in the unstacked data came from.
More details about (5) OPTIONAL from the code:
You'll see in the stacked data that some rows don't have any value for both stacked variables: Rating and RankedCola.
If you are working with a very large data set, you may want to remove these rows so that your data set is smaller and more efficient, since you will likely be excluding them from analysis anyway. To do this you can comment in the (5) OPTIONAL block of code from above. You'll see the result below where those rows have been removed.
More detail about step 7 from the instructions above:
Once you've added the stacked file to your Displayr document, you can Duplicate the Observation variable and change the Labels in Value Attributes to relabel the observations to the appropriate identifier. In this case, the RatedCola and Rank (in yellow below) for each row of the data.
UPCOMING WEBINAR: Run Research Your Way With AI Skills