Displayr has a built-in stacking widget that provides a point and click way of stacking SPSS .sav files, see How to Stack Data in Displayr. However, sometimes data quality issues like missing variables, reusing codes, and different value attributes won't allow the stacking to be done. In this instance, you will need to use R to stack the data.
Requirements
- An SPSS (.sav) data set saved on a cloud drive that is accessible without logging in or on your Displayr Cloud drive. The data set used below can be downloaded here.
Please note these steps require a Displayr license.
Method
1. To start, you should generate the list of variables you want to stack. It's recommended that you use the Stacking Code Generator Excel tool to create the R code needed to list out your stacked variables - Stacking Code Generator. Follow the instructions on the Instructions tab and paste the generated code in step (4) in the example code from below - being sure to remove the last comma.
2. Go to Data Sources > Plus (+) > R. OPTIONAL: Instead of adding as a data source, you can prototype the code in a Calculation by using the Calculation icon > Custom Code and typing in your R CODE. The Calculation will allow you to preview the results and modify your code in case there are any issues. You can then use that same code to add your R data set as in step 1.
3. Enter a name for the data set under Name.
4. Paste in the below R code where it states Enter your R code here and review each numbered step in the code to make the appropriate edits:
#Review the numbered list of settings below
#Load the foreign package to use
library(foreign)
### (1.a) SPECIFY the file location - link must end in .sav and automatically download the file
# if not, comment out the lines in this section and comment in the code in section 1.b
location = "https://app.displayr.com/DataMart/File/6e8734bc-9963-44fe-8c9f-04c22efb3e83/Technology_2018.sav?company_id=751221"
thedata= suppressWarnings(read.spss(location,
use.value.labels = TRUE,
to.data.frame = TRUE))
###
### (1.b) SPECIFY the file location - if using a file on your Displayr Cloud drive comment in below
#library(flipAPI)
##put the file name from your Displayr Cloud drive inside '' below
#thedata=QLoadData('Technology_2018.sav')
##convert to a data frame to reshape later
#thedata=data.frame(thedata)
###
### (2) specify your unique ID (Case) variable that can be used to link to the unstacked data
id.variable = 'RESPNUM'
###
### (3) list any other variables you want to include in the stacked dataset but leave unstacked
variables.to.stretch = c('Q1', 'Rec_Age')
###
### provide lists of variables to include in each stacked variable
variables.to.stack = list(
# (4) REPLACE the below with a copy/paste from BLUE (first) column of Excel template
# if you need to insert blank values then use the name 'dummy' in the list below
## !!!! be sure to remove the last comma before the closing round bracket !!!!
'Recommend' = c('Q3_01', 'Q3_02', 'Q3_03', 'Q3_04', 'Q3_05',
'Q3_06','Q3_07','Q3_08','Q3_09','Q3_10','Q3_11',
'Q3_12','Q3_13'),
'Fun' = c('Q4a_01','Q4a_02','Q4a_03','Q4a_04','Q4a_05','Q4a_06',
'Q4a_07','Q4a_08','Q4a_09','Q4a_10','Q4a_11','Q4a_12','Q4a_13'),
'Worth_what_you_pay_for' = c('Q4b_01','Q4b_02','Q4b_03','Q4b_04','Q4b_05','Q4b_06',
'Q4b_07','Q4b_08','Q4b_09','Q4b_10','Q4b_11','Q4b_12','Q4b_13'),
'Innovative' = c('Q4c_01','Q4c_02','Q4c_03','Q4c_04','Q4c_05','Q4c_06',
'Q4c_07','Q4c_08','Q4c_09','Q4c_10','Q4c_11','Q4c_12','Q4c_13'),
'Good_customer_service' = c('Q4d_01','Q4d_02','Q4d_03','Q4d_04','Q4d_05','Q4d_06',
'Q4d_07','Q4d_08','Q4d_09','Q4d_10','Q4d_11','Q4d_12','Q4d_13'),
'Stylish' = c('Q4e_01','Q4e_02','Q4e_03','Q4e_04','Q4e_05','Q4e_06',
'Q4e_07','Q4e_08','Q4e_09','Q4e_10','Q4e_11','Q4e_12','Q4e_13'),
'Easy_to_use' = c('Q4f_01','Q4f_02','Q4f_03','Q4f_04','Q4f_05','Q4f_06',
'Q4f_07','Q4f_08','Q4f_09','Q4f_10','Q4f_11','Q4f_12','Q4f_13'),
'High_quality' = c('Q4g_01','Q4g_02','Q4g_03','Q4g_04','Q4g_05','Q4g_06',
'Q4g_07','Q4g_08','Q4g_09','Q4g_10','Q4g_11','Q4g_12','Q4g_13'),
'High_performance' = c('Q4h_01','Q4h_02','Q4h_03','Q4h_04','Q4h_05','Q4h_06',
'Q4h_07','Q4h_08','Q4h_09','Q4h_10','Q4h_11','Q4h_12','Q4h_13'),
'Low_prices' = c('Q4i_01','Q4i_02','Q4i_03','Q4i_04','Q4i_05','Q4i_06',
'Q4i_07','Q4i_08','Q4i_09','Q4i_10','Q4i_11','Q4i_12','Q4i_13')
)
###
### get lists of variables to include/exclude in the stacked data file
#create dummy variable in the data set to use if needed
thedata$dummy=NA
#pull off the column headers from the data set to compare to the lists
all.names <- names(thedata)
#identify variables to exclude from the stacked data set
variables.to.exclude = all.names[!all.names %in% c(unlist(variables.to.stack), id.variable, variables.to.stretch)]
###
### Stack the data
stacked.data = reshape(data = thedata,
idvar = id.variable, direction = "long",
drop = variables.to.exclude,
varying = variables.to.stack)
###
### Format the stacked data
#make column names pretty
names(stacked.data) = c(id.variable, variables.to.stretch, "Observation", names(variables.to.stack))
###
### (5) return the final stacked data set or read article for how to save to Displayr cloud
#keep category metadata
library(haven)
stacked.data = as_factor(stacked.data)
###
- Note that the variables.to.stack object includes the following format:
- Each element of the list has a name to tell us what the variables mean. This will become the name of the variable in the stacked data frame.
- Each element of the list is a vector that tells us the variable names in the original data frame and the order in which they are to be stacked.
- If you don't have a variable to stretch you can use the following code to set variables.to.stretch to an empty vector.
variables.to.stretch = c()
5. OPTIONAL: If you would rather save your stacked data to a .sav file on your Displayr Cloud Drive, in the case that you're working from a data preparation document, you should:
- Copy your stacking code to a the Calculation icon
> Custom Code
- Ensure the names of your new stacked variables are valid SPSS variables names (no spaces, begin with an alphabetical character, are unique, etc).
- Replace the code in step 5 of the example above with the following:
### (5) specify filename and save the final stacked data as a sav to the Displayr Cloud
#change stackeddata.sav to the filename you want to save
#note files get overwritten automatically based on name
stackedfile = "stackeddata.sav"
#save the dataset to the cloud
library(flipAPI)
QSaveData(stacked.data,stackedfile)
###
6. R data doesn’t have the same level of metadata as some file types, like SSS and SAV. The code above will retain labels that are consistent across stacking, but care should be taken to review variables with inconsistencies. After the data set is loaded into Displayr, you should always review the Value Attributes and Structures of your variables.
7. OPTIONAL: There will be a new variable Observation in your stacked data file that identifies which iteration of the stacking the case comes from. You can relabel the Values of this variable to correspond to the appropriate product/brand/item for that particular iteration.
8. OPTIONAL: To use this in the same document as your unstacked data, please see How to Link Data Sets Together in a Document
Next