A common consequence of R's nifty vectorization is confusion about how to use if when writing code. R's if statement is not vectorized, which means it is usually not very useful when creating new variables. Users are better off using either ifelse or subscripting instead.
if is not vectorized
This example continues from the example in R's Vectorized Math and Custom Variable Creation.
Let's say we wanted to create a new variable containing a 1 if a person was below the average age, and a value of 2 otherwise. An obvious but wrong way of writing this would be:
if (age < Average(age)) 1 else 2
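We can see the problem by working through how R would interpret this line of code. First, it replaces the variable names with the data, so age becomes a vector of 11 values.
Then, it calculates the average, which is a single value.
Then, it recycles the average so that it is the same length as age.
Then, it compares the two vectors element by element, so we have a vector of 11 TRUE and FALSE values.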
And, at this stage, R will produce the error "the condition has length > 1", by which it means that an if statement only works when its condition is a single TRUE or FALSE, which is clearly not the case here.
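The steps above can be reproduced in a small sketch. The age values below are hypothetical (the data from the earlier article is not shown here), and base R's mean() is used as a stand-in for Average() so that the example runs outside Displayr. Versions of R before 4.2 reported this problem as a warning rather than an error, so the sketch catches both.

```r
# Hypothetical ages for 11 respondents (illustrative values, not from the article)
age <- c(23, 35, 41, 19, 52, 28, 64, 37, 45, 30, 58)

# mean() is a base R stand-in for Average()
condition <- age < mean(age)
length(condition)  # 11: one TRUE or FALSE per respondent

# Capture what R reports when if receives more than one TRUE/FALSE.
# R 4.2 and later signal an error; earlier versions signalled a warning.
message_text <- tryCatch(
  if (condition) 1 else 2,
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
message_text
```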
A solution to the problem above is to use ifelse instead. For example:
#test if each age is less than the overall average, if true the result is 1,
#if false the result is 2
ifelse(age < Average(age), 1, 2)
Note that ifelse is a function: its three arguments are separated by commas and enclosed in a single pair of parentheses. It is evaluated as follows.
R looks at the three arguments and evaluates them one at a time, giving a vector of 11 TRUE and FALSE values, the single value 1, and the single value 2.
Now, just as described in R's Vectorized Math and Custom Variable Creation, R needs to use recycling to stretch the 1 and the 2 to match the vector of TRUE and FALSE values, giving 11 copies of each.
Many functions in R, including ifelse (but not including if), are vectorized, which means that the function is applied once for each element of the vectors, returning 1 where the test is TRUE and 2 where it is FALSE.
This then returns a vector of length 11.
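As a minimal sketch of the evaluation just described, the code below uses hypothetical age values (not the article's data) and base R's mean() as a stand-in for Average(), so that it can be run outside Displayr.

```r
# Hypothetical ages for 11 respondents (illustrative values, not from the article)
age <- c(23, 35, 41, 19, 52, 28, 64, 37, 45, 30, 58)

# First argument: the test, one TRUE or FALSE per respondent
# (mean() is a base R stand-in for Average())
below_average <- age < mean(age)

# Second and third arguments: the single values 1 and 2, which are
# recycled so there is one candidate value for each element of the test
result <- ifelse(below_average, 1, 2)

result          # 1 1 2 1 2 1 2 1 2 1 2
length(result)  # 11
```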
Another way of achieving the same outcome is with the following code:
#create a vector containing 11 values of 2 (11 is the length of age)
out = rep(2, length(age))
#find positions in vector where age is less than the average and assign them a value of 1
out[age < Average(age)] = 1
#return the complete final result
out
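The subscripting approach can be checked against ifelse with a short, self-contained sketch. As before, the age values are hypothetical and mean() is a base R stand-in for Average().

```r
# Hypothetical ages for 11 respondents (illustrative values, not from the article)
age <- c(23, 35, 41, 19, 52, 28, 64, 37, 45, 30, 58)

# Start with a value of 2 for everyone
out <- rep(2, length(age))

# Overwrite with 1 only at the positions where age is below the average
out[age < mean(age)] <- 1

out  # 1 1 2 1 2 1 2 1 2 1 2

# Both approaches give the same vector
same_as_ifelse <- identical(out, ifelse(age < mean(age), 1, 2))
same_as_ifelse  # TRUE
```

Subscripting can be convenient when only some values need to change, or when several conditions are applied one after another to the same vector.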
The article How to Work with Conditional R Formulas describes in more detail how if and related statements work in R.
Adding Value Labels When Creating Variables describes how to create labels in categorical variables.
For a more general overview of using R in Displayr, see the Displayr Help section on R.