This article introduces the key concepts to be aware of when migrating SPSS syntax to Displayr:
- A change in mindset is required: writing code is usually the wrong way forward
- Most of migration is learning how Displayr works
- Key differences between SPSS Syntax and R code
- You cannot overwrite raw data in Displayr
- How to deal with multiple related computed variables (e.g., segments, factor scores)
A change in mindset is required: writing code is usually the wrong way forward
The biggest challenge when migrating SPSS syntax to Displayr is the mental one. In SPSS, the most efficient users primarily use syntax. Some even automate the writing of syntax (e.g., using Excel). They do this because:
- In the long run, it's faster. If you rely on the user interface, then whenever anything changes (e.g., a data file is replaced, some data is recoded), you need to redo everything manually. If you have written syntax, you can just re-run it, and everything is redone automatically.
- The syntax files become a form of documentation, which is useful in quality assurance (e.g., to check how a result was computed).
As such, most SPSS Syntax is used for one of two broad goals:
- To re-run things. For example:
- Compute/recode/re-label variables when a revised data file is received.
- Compute/recode variables when input variables are changed (e.g., after cleaning an input variable).
- To create documentation, so that it is easy to check how results were calculated.
While writing syntax is time-consuming, most power users conclude it's worth the time if it achieves these two goals.
Displayr, however, works completely differently from SPSS:
- Everything is automatically re-run whenever it needs to be re-run. There's no need ever to tell Displayr to recompute something.
- It is always possible to track back any result to see how it has been calculated.
While there are ways of writing things in Displayr using code, it's virtually always the least efficient and the most error-prone way of working. This is of course true for most modern technology. You don't need to write code to check the time on your smartphone, for example.
Most of migration is learning how Displayr works
A common request is for a "translation service", where a user can paste in some SPSS syntax and get back either a Displayr document or some code to run in a Displayr document.
There is no such service, and there never can be. This is because the vast majority of things that people write SPSS Syntax to do are done automatically in Displayr, so the way you migrate the syntax is usually to figure out how to achieve the same outcome in Displayr without writing code, and then delete the corresponding syntax. (And yes, if you are super advanced and write much more advanced SPSS syntax than the average person, this is still true.)
For example, in commercial market research, it is very common for SPSS users to write syntax to recode five-point attitude scales into top two box scores. In Displayr:
- The less technical user will select the variable set, and press + > Ready-Made New Variables > Top 2 Category Variable(s) (Top 2 Boxes).
- The power user will duplicate the variable set, and change its structure to Binary - Multi.
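Purely to illustrate what a top two box score represents, the calculation can be sketched in R. This is not the recommended route in Displayr, where the menu options above do the work. The variable name agreement and its values are hypothetical.

```r
# Hypothetical five-point attitude scale
# (1 = strongly disagree, 5 = strongly agree); NA is a missing response.
agreement <- c(5, 4, 3, 2, 1, 4, NA)

# Top two box indicator: 1 if the respondent chose 4 or 5, 0 otherwise.
# Missing responses stay missing.
top2 <- ifelse(agreement >= 4, 1, 0)

# Top two box score: share of non-missing respondents in the top two categories.
top2_score <- mean(top2, na.rm = TRUE)
```

Seeing the calculation written out makes it easier to check that the menu-driven result matches what the old SPSS syntax produced.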
Key differences between SPSS Syntax and R code
Displayr supports two different languages for creating new variables: R and JavaScript. For most problems, R is preferable. R's syntax for variable computation is broadly similar to SPSS's, so existing syntax is often quite straightforward to adapt. The key differences are:
- R is case-sensitive. For example:
- sum and Sum do different things (sum returns a missing value if any of the values being summed are missing, whereas Sum ignores the missing values)
- TRUE is a logical value, whereas true isn't (in R, you could write true = 5, and then whenever somebody used the word true in their code, it would insert the value of 5)
- You should not put a period (.) at the end of a line of code in R.
- Comments in SPSS begin with a *. In R, they begin with a #.
- SPSS's IF is equivalent to R's ifelse and not to R's if. See also Challenges With 'if' When Writing R Code.
- SPSS uses = to mean two different things: assignment and equality. For example, in SPSS dog = 1 may in some contexts create a new variable called dog, assigning it a value of 1. In other contexts, such as an IF statement, dog = 1 may be a logical expression, returning true if dog has the value of 1 and false otherwise. By contrast, in R, = indicates assignment, and == indicates equality (dog = 1 always creates a variable called dog, whereas dog == 1 is always TRUE if dog has a value of 1).
- There is no equivalent in R to SPSS's RECODE. (There are some functions that appear on a superficial level to be similar, but in general, in Displayr you should instead be changing the structure of variable sets to perform recoding).
- R executes everything, so it has no equivalent to SPSS's EXECUTE.
- R is vectorized. See R's Vectorized Math and Custom Variable Creation.
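Several of the differences above can be seen in a few lines of base R. This is a minimal sketch with made-up values. Sum (capital S) is the Displayr function mentioned above, so the sketch uses base R's sum with na.rm = TRUE as a stand-in for ignoring missing values.

```r
# Hypothetical scores for four respondents; the second is missing.
scores <- c(4, NA, 2, 5)

# sum() returns a missing value if any input is missing...
total_with_na <- sum(scores)
# ...unless missing values are explicitly removed.
total_ignoring_na <- sum(scores, na.rm = TRUE)

# ifelse() is vectorized: it returns one result per respondent, like SPSS's IF.
high <- ifelse(scores >= 4, 1, 0)

# '=' assigns, '==' tests equality.
dog = 1
is_dog <- dog == 1

# R is case-sensitive, so these are two different variables.
height <- 170
Height <- 67
```

Note that no line ends with a period and no EXECUTE is needed: each statement runs as soon as it is evaluated.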
For a more general overview of using R in Displayr, see the Displayr Help section on R.
You cannot overwrite raw data in Displayr
In SPSS, the process for creating new variables is the same as the process for updating variables. For example, in SPSS you can write and run code like this, which converts height measured in inches to height measured in centimeters.
COMPUTE Height = Height * 2.54.
EXECUTE.
The problem with such a line of code is that you should only run it once. If it is accidentally run a second time, the results become gibberish, as it will multiply the heights in centimeters by 2.54 one more time.
If you save the data set, there's no way to undo the modification to the data unless you've retained a copy of the original data.
Displayr, however, doesn't allow you to make mistakes like this. There are two aspects to this:
- If you wanted to multiply height in inches by 2.54, you would have to create a new variable; Displayr does not let you overwrite the old one. Similarly, common SPSS syntax moves like recoding into existing variables (e.g., recode v4 to v6 (1,2,3 = 0)(4,5 = 1)) are extremely difficult to perform in Displayr, as they are the wrong way to use the app (the correct way is to change the Structure of the variable set containing v4 to v6 to Binary - Multi).
- Displayr never modifies the raw data file. Rather, it keeps track of every step that's been applied when modifying the raw data. This is what allows it to redo things automatically when the data changes.
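The difference between deriving a new variable and overwriting an old one can be sketched in R. The variable names and values are hypothetical, and the overwriting half only imitates the SPSS behavior described above; it is not something Displayr does.

```r
# Hypothetical raw data: heights in inches.
height_in <- c(60, 65.5, 72)

# Deriving a new variable leaves the raw data untouched,
# so re-running the calculation always gives the same answer.
height_cm <- height_in * 2.54
height_cm_again <- height_in * 2.54

# Overwriting in place (the SPSS pattern) is not safe to re-run:
# a second run multiplies the centimeter values by 2.54 again.
overwritten <- height_in
overwritten <- overwritten * 2.54   # first run: correct
overwritten <- overwritten * 2.54   # accidental second run: gibberish
```

Because the derived variable is always computed from the unmodified input, it can be recalculated whenever the input changes without any risk of compounding the conversion.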
How to deal with multiple related computed variables (e.g., segments, factor scores)
Sometimes SPSS syntax contains multiple COMPUTE statements that are used to create some type of complicated derived variable (e.g., factor scores, segments). For example, the code below:
- Takes 6 input variables (q2a, q2b, ..., q2f).
- Creates scaled versions of the input variables (C1, C2, ..., C6).
- Creates two factors, Attention and Involvement, using the scaled variables.
- Creates a final segmentation variable, Conversion, using the two factors.
When the SPSS syntax below is run, a total of 9 new variables are created.
COMPUTE
C1=(q2a-3)*(-1)+(-0.31372947).
EXECUTE.
COMPUTE
C2=(q2b-3.5)*(1)+(0.17423391).
EXECUTE.
COMPUTE
C3=(q2c-3.5)*(1)+(0.2210665).
EXECUTE.
COMPUTE
C4=(q2d-3.5)*(-1)+(0.10578492).
EXECUTE.
COMPUTE
C5=(q2e-3.5)*(1)+(-0.55299267).
EXECUTE.
COMPUTE
C6=(q2f-3.5)*(-1)+(-0.04344776).
EXECUTE.
*********************************FACTORS********************************************.
COMPUTE
Attention=2.44949*(2.44949+((C1*0.536)+(C2*0.623)+(C3*0.589))/3).
EXECUTE.
COMPUTE
Involvement=2.44949*(2.44949+((C4*0.46)+(C5*0.63)+(C6*0.56))/3).
EXECUTE.
********************************CONVERSION CLASSIFICATION**************************.
IF ((Attention >= 6) & (Involvement>= 6)) Conversion=1.
EXECUTE.
IF ((Attention >= 6) & (Involvement< 6)) Conversion=2.
EXECUTE.
IF ((Attention < 6) & (Involvement>= 6)) Conversion=3.
EXECUTE.
IF ((Attention < 6) & (Involvement< 6)) Conversion=4.
EXECUTE.
VARIABLE LABELS
Conversion 'Effectiveness classification'.
EXECUTE.
VALUE LABELS
Conversion
1 'Committed'
2 'Entertained'
3 'Potentials'
4 'Untouched'.
EXECUTE.
By contrast, in Displayr, we insert a single custom code numeric R variable, paste in the code below, and change its Structure to Nominal. This creates the same variable as the SPSS syntax above. While the underlying code computes all 9 variables created by the SPSS syntax, only the final variable is saved into the data set. (Displayr nevertheless remembers all the code, and will recompute the variable if any of the inputs change.)
C1=(q2a-3)*(-1)+(-0.31372947)
C2=(q2b-3.5)*(1)+(0.17423391)
C3=(q2c-3.5)*(1)+(0.2210665)
C4=(q2d-3.5)*(-1)+(0.10578492)
C5=(q2e-3.5)*(1)+(-0.55299267)
C6=(q2f-3.5)*(-1)+(-0.04344776)
# Factors
Attention=2.44949*(2.44949+((C1*0.536)+(C2*0.623)+(C3*0.589))/3)
Involvement=2.44949*(2.44949+((C4*0.46)+(C5*0.63)+(C6*0.56))/3)
# Conversion classification
conversion = rep(NA, length(Attention)) # creating variable with only missing values
conversion[Attention >= 6 & Involvement >= 6] = 1
conversion[Attention >= 6 & Involvement < 6] = 2
conversion[Attention < 6 & Involvement >= 6] = 3
conversion[Attention < 6 & Involvement < 6] = 4
factor(conversion, levels = 1:4, labels = c('Committed', 'Entertained', 'Potentials', 'Untouched'))
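As noted earlier, SPSS's IF corresponds to R's ifelse. The four assignment lines above could therefore also be written as nested ifelse calls. The sketch below is one possible alternative, not the only way to do it. The helper name classify_conversion, its cutoff argument, and the example scores are hypothetical.

```r
# Hypothetical helper: classify respondents from two factor scores.
classify_conversion <- function(attention, involvement, cutoff = 6) {
  # Nested ifelse() calls are evaluated for every respondent at once.
  code <- ifelse(attention >= cutoff,
                 ifelse(involvement >= cutoff, 1, 2),
                 ifelse(involvement >= cutoff, 3, 4))
  factor(code, levels = 1:4,
         labels = c('Committed', 'Entertained', 'Potentials', 'Untouched'))
}

# Made-up factor scores for five respondents; the last has a missing score.
attention   <- c(7.2, 6.0, 4.1, 5.9, NA)
involvement <- c(6.5, 3.0, 8.0, 2.2, 7.0)

segments <- classify_conversion(attention, involvement)
```

As with the indexed assignments, a respondent with a missing factor score ends up with a missing segment rather than being forced into one of the four groups.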
If we wanted to save some of the other variables into the data set, that can be done as well. For example, we could create each variable one by one, or:
- Use + > Custom Code > Multiple R Variables > Numeric.
- Indicate how many R variables we wish to create (e.g., 3).
- Paste in the code below. Note that only the last few lines are different. In particular:
- The factor is now assigned to a variable called Conversion.
- The three variables that are being returned are explicitly mentioned in the last line of code with the data.frame function.
- Select the resulting variable set, right-click, and select Split (this breaks it into three separate variables).
- Change the Conversion variable's Structure to Nominal.
C1=(q2a-3)*(-1)+(-0.31372947)
C2=(q2b-3.5)*(1)+(0.17423391)
C3=(q2c-3.5)*(1)+(0.2210665)
C4=(q2d-3.5)*(-1)+(0.10578492)
C5=(q2e-3.5)*(1)+(-0.55299267)
C6=(q2f-3.5)*(-1)+(-0.04344776)
# Factors
Attention=2.44949*(2.44949+((C1*0.536)+(C2*0.623)+(C3*0.589))/3)
Involvement=2.44949*(2.44949+((C4*0.46)+(C5*0.63)+(C6*0.56))/3)
# Conversion classification
conversion = rep(NA, length(Attention)) # creating variable with only missing values
conversion[Attention >= 6 & Involvement >= 6] = 1
conversion[Attention >= 6 & Involvement < 6] = 2
conversion[Attention < 6 & Involvement >= 6] = 3
conversion[Attention < 6 & Involvement < 6] = 4
Conversion = factor(conversion, levels = 1:4,
labels = c('Committed', 'Entertained', 'Potentials', 'Untouched'))
data.frame(Attention, Involvement, Conversion)
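To see what the final data.frame line hands back, here is a small self-contained sketch with made-up values in place of the computed scores. The lower-case names and the values are hypothetical. The point is only that the result has one column per variable, which in the example above is the three entered when creating the variable set.

```r
# Made-up stand-ins for the computed factor scores and segments.
attention   <- c(7.2, 4.1, 5.9)
involvement <- c(6.5, 8.0, 2.2)
conversion  <- factor(c(1, 3, 4), levels = 1:4,
                      labels = c('Committed', 'Entertained', 'Potentials', 'Untouched'))

# One column per variable to be returned; column names become the variable names.
result <- data.frame(Attention = attention,
                     Involvement = involvement,
                     Conversion = conversion)

n_variables <- ncol(result)             # number of variables returned
column_classes <- sapply(result, class) # two numeric columns and one factor
```

The Conversion column stays a factor inside the data frame, which is why it can be given a Nominal structure after the variable set is split.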