A common consequence of R's nifty vectorization is confusion about how to use if when writing code. R's if is not vectorized, which means it is usually not very useful when creating new variables; users are better off using either ifelse or subscripting. Another way of thinking about it is:
- if you want to generate a series of values by testing something multiple times, then use ifelse() or subscripting
- if you want to run a bunch of commands when a particular criterion is met, then use if()
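To make the second case concrete, here is a minimal sketch of if used correctly: its condition is a single TRUE or FALSE, and that one answer decides which block of commands runs. The age values are hypothetical example data, not the data set used in this article, and base R's mean() stands in for Displayr's Average().

```r
# Hypothetical example data (not the article's data set)
age <- c(23, 35, 41, 19, 52)

# mean() returns a single number, so the comparison is a single
# TRUE or FALSE: a valid condition for if
if (mean(age) > 30) {
    # Several commands run only when the condition is TRUE
    group_label <- "older sample"
    age_centered <- age - mean(age)
} else {
    group_label <- "younger sample"
    age_centered <- age
}
print(group_label)  # "older sample"
```

Here the average is 34, so only the first block runs.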
if is not vectorized
This example continues from the example in R's Vectorized Math and Custom Variable Creation.
Let's say we wanted to create a new variable containing a 1 if a person was below the average age, and a value of 2 otherwise. An obvious but wrong way of writing this would be:
if (age < Average(age)) 1 else 2
or, equivalently:
if (age < Average(age)) {
    1
} else {
    2
}
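We can see the problem by working through how R would interpret this line of code. First, it replaces the variable name age with its data, a vector containing one value per person.
Then, it calculates the average, which is a single number.
Then, it recycles the average so that it has the same length as the age vector.
Then, it compares the two vectors element by element, producing a vector of TRUE and FALSE values, one per person.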
At this stage, R produces the error "the condition has length > 1". This means that an if statement only works when its condition is a single TRUE or FALSE, which is clearly not the case here. (In versions of R before 4.2.0, this was a warning rather than an error, and R silently used only the first element of the condition.)
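The steps above can be sketched in plain R, making each one explicit so the failing comparison can be inspected. This assumes a hypothetical five-person age vector and uses base R's mean() in place of Displayr's Average(); tryCatch() is used only so the script keeps running and the message can be printed.

```r
# Hypothetical ages (not the article's data set); base R's mean()
# stands in for Displayr's Average()
age <- c(23, 35, 41, 19, 52)

# Step 2: the average is a single number (34 here)
avg <- mean(age)

# Step 3: recycling stretches it to the length of age
# (R does this implicitly; rep() just makes it visible)
avg_recycled <- rep(avg, length(age))

# Step 4: element-by-element comparison gives one TRUE/FALSE per person
condition <- age < avg_recycled
print(condition)  # TRUE FALSE FALSE TRUE FALSE

# if() needs a single TRUE/FALSE but receives five values.
# R >= 4.2.0 signals an error; older versions signal a warning.
# tryCatch() captures the message either way.
msg <- tryCatch(
    if (condition) 1 else 2,
    error = function(e) conditionMessage(e),
    warning = function(w) conditionMessage(w)
)
print(msg)
```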
ifelse
The solution to the problem above is to use ifelse instead. For example:
#test if each age is less than the overall average, if true the result is 1,
#if false the result is 2
ifelse(age < Average(age), 1, 2)
Note that ifelse takes three arguments, separated by commas, inside a single pair of parentheses: the test, the value to return where the test is TRUE, and the value to return where it is FALSE.
R evaluates the three arguments in turn: the test becomes a vector of TRUE and FALSE values, one per person, while 1 and 2 are each a single value.
Now, just as described in R's Vectorized Math and Custom Variable Creation, R uses recycling to stretch the 1s and 2s to match the length of the vector of TRUEs and FALSEs.
Many functions in R, including ifelse (but not if), are vectorized, which means the operation is applied element by element: for each position, ifelse returns the value from the second argument where the test is TRUE and the value from the third argument where it is FALSE.
This returns a vector of length 11, one value per person.
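One way to picture this element-by-element behaviour is to write it out as a loop that applies an ordinary, single-value if once per position, then compare the result with ifelse. This is a sketch for illustration only, assuming hypothetical ages and base R's mean() in place of Displayr's Average().

```r
# Hypothetical ages (not the article's data set); mean() stands in for Average()
age <- c(23, 35, 41, 19, 52)

# The arguments as ifelse sees them after evaluation and recycling
test <- age < mean(age)       # TRUE FALSE FALSE TRUE FALSE
yes  <- rep(1, length(test))  # the single 1 stretched to length 5
no   <- rep(2, length(test))  # the single 2 stretched to length 5

# Conceptual equivalent: a scalar if() applied once per element
manual <- numeric(length(test))
for (i in seq_along(test)) {
    manual[i] <- if (test[i]) yes[i] else no[i]
}

vectorized <- ifelse(age < mean(age), 1, 2)

print(manual)      # 1 2 2 1 2
print(vectorized)  # 1 2 2 1 2
```

The loop is only a way of picturing the behaviour; base R's ifelse is itself written with subscripting rather than a loop.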
Subscripting
Another way of achieving the same outcome is by subscripting an object using square brackets []. What you put inside the [] depends on the structure of the object you are subsetting (see How to Work with Data in R for a reference). The example above can be achieved using subscripting with the following code:
#create a final variable called out with default values
#repeat (rep) the value 2 for the same number of items/length as the age variable
out = rep(2, length(age))
#use out[] to subset the variable and assign those positions a new value
#we put the criteria where age is less than the average inside []
#we put the new value to assign those positions as 1 on the right of =
out[age < Average(age)] = 1
#return the complete final result
out
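As a quick check, the subscripting and ifelse approaches can be run side by side to confirm they produce the same vector. The sketch also shows the same pattern applied to a column of a data frame, where $ picks the column and [] picks the rows to overwrite. The ages and the data frame people are hypothetical, and base R's mean() stands in for Displayr's Average().

```r
# Hypothetical ages (not the article's data set); mean() stands in for Average()
age <- c(23, 35, 41, 19, 52)

# Subscripting: start with 2 everywhere, then overwrite the matching positions
out <- rep(2, length(age))
out[age < mean(age)] <- 1

# ifelse: the same rule in one call
via_ifelse <- ifelse(age < mean(age), 1, 2)

print(identical(out, via_ifelse))  # TRUE

# The same pattern on a column of a hypothetical data frame
people <- data.frame(age = age)
people$group <- 2                                  # default value for every row
people$group[people$age < mean(people$age)] <- 1   # overwrite below-average rows
print(people)
```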
Next
The article How to Work with Conditional R Formulas describes in more detail how if and related statements work in R.
Adding Value Labels When Creating Variables describes how to create labels in categorical variables.
For a more general overview of using R in Displayr, see the Displayr Help section on R.