This article describes various methods of writing conditional formulas using R.
Requirements
- Some of the methods described in this article may require a full Displayr license.
- An R variable, calculation, or data set.
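- The main condition operators are == (equal to), != (not equal to), < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to), & (and), | (or), and ! (not), plus %in% for "equals any of".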
Method
1. Boolean expression
A Boolean expression is an expression that evaluates to a logical value of true or false. Results that are true are returned as values of 1 and false as 0 (i.e., this is a way to construct a binary variable).
For example, we have two numeric variables, v1 and v2:
- v1 != v2 returns a 1 for observations where v1 and v2 differ and a value of 0 when they are the same.
- is.na(v1) returns a 1 for observations in v1 that have the value of NA (missing), otherwise, it returns a 0.
- rowSums(cbind(v1, v2)) > 0 returns a 1 if the sum of v1 and v2 is greater than 0, otherwise, it returns a 0. rowSums expects a matrix or data frame, so the two variables are combined with cbind first.
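As a rough illustration of the three expressions above, the sketch below runs them on two short made-up vectors (the data is an assumption for illustration only). In plain R the results are TRUE/FALSE; as.numeric shows the 1/0 coding described above.

```r
# Made-up example data (assumption for illustration)
v1 <- c(1, 2, NA, 0)
v2 <- c(1, 3, 4, 0)

differ <- v1 != v2                         # FALSE TRUE NA FALSE
missing.v1 <- is.na(v1)                    # FALSE FALSE TRUE FALSE

# rowSums needs a matrix or data frame, so bind the vectors first
positive <- rowSums(cbind(v1, v2)) > 0     # TRUE TRUE NA FALSE

# Converting the logical result to 1/0
as.numeric(differ)                         # 0 1 NA 0
```

Note that a missing value in either input makes the comparison itself missing.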
Boolean expressions can return unexpected results when you are working with variables that contain missing data, because a comparison with NA returns NA rather than true or false. If you do not want a missing value returned when an input is missing, it is recommended to use the %in% operator, which returns false instead of NA for missing values. For example:
- Use x %in% 1 instead of x == 1
- Use !(x %in% 1) instead of x != 1
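The difference can be seen in a minimal sketch on a made-up vector that contains a missing value (the data is an assumption for illustration):

```r
# Made-up data with one missing value
x <- c(1, NA, 2)

eq <- x == 1            # TRUE NA FALSE
eq.in <- x %in% 1       # TRUE FALSE FALSE

ne <- x != 1            # FALSE NA TRUE
ne.in <- !(x %in% 1)    # FALSE TRUE TRUE
```

With %in%, the missing record is treated as "not equal to 1" rather than as missing, which may or may not be what you want for a given variable.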
2. The if...else method
Not all if-else code is created equal in R; see Challenges With 'if' When Writing R Code for a visual explanation. Basic conditional statements can be written using an if...else structure. This structure evaluates a single condition (one TRUE or FALSE value), not one condition per record. It therefore does NOT behave like if-else structures in JavaScript code, where each response in the data set is checked to recode a variable or something else. To recode variables using R, please see How to Create a New Variable Based on Other Variables Using R and How to Recode Data Based on a Lookup Using R.
If-else structures in R are useful for controlling which parts of your R code are run. For example, if a table is empty, show an error message; otherwise, show the table (see How to Handle Outputs with Small or No Data Using R).
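A minimal sketch of this kind of routing is shown below. The helper name show.or.message, the message text, and the two data frames are hypothetical and used only for illustration; the point is that the condition is a single TRUE or FALSE.

```r
# Hypothetical helper: return a message when the input has no rows,
# otherwise return the input unchanged
show.or.message <- function(tbl) {
    if (NROW(tbl) == 0) {
        "No data to show"
    } else {
        tbl
    }
}

# Made-up inputs
empty <- data.frame(Age = numeric(0))
filled <- data.frame(Age = c(25, 40))

show.or.message(empty)    # "No data to show"
show.or.message(filled)   # the data frame itself
```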
If you wish to use this method within an R variable, you will need to return a vector or data frame with the same length (number of rows) as the number of records in your data set. For example, to build a filter from a combo box: if nothing is selected in the combo box, include all respondents; otherwise, create the filter based on the selected values.
#test if there is something selected in the combo box, if not filter in everything
if(length(Combo.box.Age) == 0) rep(1, length(Age)) else
Age %in% Combo.box.Age
This is the same set of conditions using optional curly brackets and spacing:
if(length(Combo.box.Age) == 0){
    rep(1, length(Age))
} else {
    Age %in% Combo.box.Age
}
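To try both branches outside Displayr, the same logic can be wrapped in a function and given stand-in inputs. In this sketch the Age values and the wrapper name age.filter are made up; in Displayr, Age and Combo.box.Age would be an existing variable and control.

```r
# Made-up stand-in for a Displayr variable
Age <- c("18 to 24", "25 to 29", "18 to 24")

# Hypothetical wrapper so both branches can be tested
age.filter <- function(Combo.box.Age) {
    if (length(Combo.box.Age) == 0) {
        rep(1, length(Age))        # nothing selected: keep everyone
    } else {
        Age %in% Combo.box.Age     # keep only the selected categories
    }
}

age.filter(character(0))   # 1 1 1
age.filter("18 to 24")     # TRUE FALSE TRUE
```

Either way, the result has one value per record, which is what an R variable requires.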
If-else statements can also be chained. The code below displays the data from the BrandA variable if Brand 1 is selected, BrandB if Brand 2 is selected, and BrandC otherwise:
if(combo.box=="Brand 1") BrandA else
if(combo.box=="Brand 2") BrandB else
BrandC
3. The ifelse method
There is also a shortcut method called ifelse that lets you write a condition in a single line. In the below example, the formula will return a Yes if x is greater than 1, otherwise a No:
ifelse(x>1,"Yes","No")
Note, this returns a value for each record in your x object. You can also nest this to additionally return Maybe if y is greater than 1:
ifelse(x>1,"Yes", ifelse(y>1,"Maybe","No"))
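As a quick check of the nested version, the sketch below uses two made-up vectors (an assumption for illustration). Note that where the test itself is missing, ifelse returns NA for that record.

```r
# Made-up data; the last x value is missing
x <- c(0, 2, 0, NA)
y <- c(0, 0, 3, 5)

result <- ifelse(x > 1, "Yes", ifelse(y > 1, "Maybe", "No"))
result   # "No" "Yes" "Maybe" NA
```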
4. The switch method
An alternative to if...else is the switch function. For example, to recode a five-point scale so that 1 and 2 become 3, 3 becomes 2, and 4 and 5 become 1, we could write the following:
switch(x,3,3,2,1,1)
In this code, the value of x represents an index which tells it which subsequent value to return. So if x equals 4, it will return 1 as this is the fourth of the five recode values.
Note, switch works on a single value of x and returns a single value only.
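Because switch handles one value at a time, one possible way to use it across a whole vector is to wrap it in a function and call it for each element with sapply. This is a sketch under that assumption rather than a recommended pattern; the function name recode.one and the data are hypothetical.

```r
# Hypothetical wrapper around the switch call above
recode.one <- function(x) switch(x, 3, 3, 2, 1, 1)

recode.one(4)               # 1

# Apply to each element of a made-up vector
x <- c(1, 3, 5)
recoded <- sapply(x, recode.one)
recoded                     # 3 2 1

# switch also accepts a text value matched against names;
# the final unnamed value acts as the default
brand <- switch("Brand 2", "Brand 1" = "BrandA", "Brand 2" = "BrandB", "BrandC")
brand                       # "BrandB"
```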
5. The subscripting method
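A further conditional method that is useful for banding variables is to essentially apply filter conditions. Using the same five-point recode (1 and 2 become 3, 3 becomes 2, and 4 and 5 become 1), we can write the following. The results are written to a new vector so that values already replaced are not matched again by a later condition:
out = rep(NA, length(x))
out[x>=4] = 1
out[x==3] = 2
out[x<=2] = 3
out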
Note, this returns a value for each record in your x object.
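One thing to watch with this pattern is whether the replacement values overlap with the values being tested. The sketch below, on made-up data, compares replacing values in place with writing to a separate vector (the names in.place and banded are hypothetical).

```r
# Made-up five-point scale responses, including one missing value
x <- c(1, 2, 3, 4, 5, NA)

# In-place replacement: later conditions re-match values set earlier
in.place <- x
in.place[in.place >= 4] <- 1
in.place[in.place == 3] <- 2
in.place[in.place <= 2] <- 3
in.place    # 3 3 3 3 3 NA

# Separate output vector: every condition tests the original x
banded <- rep(NA_real_, length(x))
banded[x >= 4] <- 1
banded[x == 3] <- 2
banded[x <= 2] <- 3
banded      # 3 3 2 1 1 NA
```

Writing to a separate vector keeps each condition independent of the order in which the lines are run.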
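Alternatively, you can assign labels so it returns a text output instead. Again, the labels are written to a new vector so that the conditions always compare the original numeric values:
out = rep(NA, length(x))
out[x>=4] = "Yes"
out[x==3] = "Maybe"
out[x<=2] = "No"
out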
Note, changing General > Structure to Nominal for R variables will let Displayr automatically set up the value labels when it converts to a categorical variable. Similarly, if your code returns a factor (i.e. has a value and label), you will not need to manually add labels via Data > Data Values > Labels.
6. The case_when method
The dplyr R package offers the case_when function which is particularly useful for working with categorical data. Below is an example of how to recode an Age variable into groups:
dplyr::case_when(
    Age == "18 to 24" ~ 1,
    Age == "25 to 29" ~ 2,
    Age %in% c("40 to 44", "45 to 49") ~ 3,
    Age %in% c("50 to 54", "55 to 64", "65 or more") ~ 4,
    TRUE ~ 0
)
Looking at the code above, note that:
- For a single category, we use the == operator.
- For multiple categories, we list them surrounded by c() and use the %in% operator.
- The values are assigned at the end of the line, after a ~.
- The TRUE ~ 0 is optional and R reads this as assign 0 to "everybody else". If records don't fall into any of these conditions and this line is omitted, the result will return NA.
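The effect of the optional TRUE line can be seen in a short sketch on made-up text values (an assumption for illustration; it requires the dplyr package to be installed). Conditions are checked in order, and each record takes the value from the first condition it meets.

```r
# Made-up Age values, including a category not covered below and a missing value
Age <- c("18 to 24", "40 to 44", "30 to 34", NA)

with.default <- dplyr::case_when(
    Age == "18 to 24" ~ 1,
    Age %in% c("40 to 44", "45 to 49") ~ 3,
    TRUE ~ 0
)
with.default      # 1 3 0 0

without.default <- dplyr::case_when(
    Age == "18 to 24" ~ 1,
    Age %in% c("40 to 44", "45 to 49") ~ 3
)
without.default   # 1 3 NA NA
```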
Let's now look at a more complex example that references multiple questions, Age and d4 (living arrangements). Here, we wish to create a household structure variable by using the & operator:
dplyr::case_when(
    # Young singles
    Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 %in% c("Living alone", "Sharing accommodation") ~ 1,
    # Older singles
    !Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 %in% c("Living alone", "Sharing accommodation") ~ 2,
    # Young couples
    Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 == "Living with partner only" ~ 3,
    # Older couples
    !Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 == "Living with partner only" ~ 4,
    # Young families
    Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 %in% c("Living with partner and children", "Living with children only") ~ 5,
    # Older families
    !Age %in% c("18 to 24", "25 to 29", "30 to 34") &
        d4 %in% c("Living with partner and children", "Living with children only") ~ 6,
    # Everyone else
    TRUE ~ 7
)
A much nicer way of computing a household structure variable is shown in the code below:
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")
dplyr::case_when(
    young & single ~ 1,
    !young & single ~ 2,
    young & partner.only ~ 3,
    !young & partner.only ~ 4,
    young & children ~ 5,
    !young & children ~ 6,
    !children & !partner.only & !single ~ 7
)
This approach initially creates four variables as inputs to the main variable of interest. These variables are so-called scratch variables: they're only accessible to this specific code, and not from any other object or code in Displayr. They exist for the sole purpose of computing household structure. This time the first 4 lines each compute a variable with TRUE or FALSE for each row of data, and then case_when evaluates these using standard boolean logic for each row of data.
Note, be careful when using as.numeric to convert categorical data into numeric data as a way to avoid referencing value labels in your code. The values it assigns will not necessarily match the values that have been set in the raw data file. For example, if the data file contains values of 1 for Male and 2 for Female, but no respondent selected Male, then the value of 1 would be assigned to Female.
In these cases, it's better to create a numeric copy of your variable to reference instead. You can do this by right-clicking your variable in the Data Sources tree, pressing Duplicate, and then changing Structure to Numeric on the General tab of the object inspector.
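The as.numeric behavior described above can be reproduced in plain R with a small made-up factor. This sketch only illustrates the mechanism (as.numeric returns each record's position among the factor's levels); the data is an assumption.

```r
# Made-up data: the file codes Male = 1 and Female = 2, but nobody selected Male
gender <- factor(c("Female", "Female", "Female"))
levels(gender)               # "Female"
codes <- as.numeric(gender)
codes                        # 1 1 1, not 2 2 2

# When both categories are declared as levels, the codes follow the level order
gender.full <- factor(c("Female", "Female"), levels = c("Male", "Female"))
codes.full <- as.numeric(gender.full)
codes.full                   # 2 2
```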
See Also
How to Create a New Variable Based on Other Variables Using R
How to Perform Mathematical Calculations Using R