A common consequence of R's nifty vectorization is confusion about how to use *if* when writing code. R's *if* statement is not vectorized, which means it is usually not very useful when creating new variables; users are better off using either *ifelse* or subscripting instead.

# *if* is not vectorized

This example continues from the example in R's Vectorized Math and Custom Variable Creation.

Let's say we wanted to create a new variable containing a 1 if a person was below the average age, and a value of 2 otherwise. An obvious but wrong way of writing this would be:

```r
if (age < Average(age)) 1 else 2
```

We can see the problem by working through how R interprets this line of code. First, it replaces the variable name *age* with the data, a vector containing one value per person. Then it calculates the average, which is a single number. Because the two sides of the comparison now have different lengths, R recycles the average so that there is one copy of it for each age. It then compares the two vectors element by element, which produces a vector of TRUE and FALSE values, one per person.

At this stage, R produces the error **the condition has length > 1**. An *if* statement only works when its condition is a single TRUE or FALSE, which is clearly not the case here.
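The failure can be reproduced with a small sketch. The ages below are hypothetical (they are not the data from the original example), and base R's `mean()` is assumed as a stand-in for `Average()` so that the code runs outside Displayr.

```r
# Hypothetical ages for 11 people (not the data from the original example)
age <- c(23, 35, 41, 29, 52, 60, 18, 47, 33, 38, 54)

# mean() is base R; it stands in for Average() here.
# The single average is recycled against all 11 ages.
below <- age < mean(age)
length(below)  # one logical value per person

# if needs exactly one TRUE or FALSE, so a length-11 condition is rejected.
# R 4.2.0 and later raise an error; earlier versions only gave a warning.
msg <- tryCatch(
    if (below) 1 else 2,
    error = function(e) conditionMessage(e),
    warning = function(w) conditionMessage(w)
)
print(msg)
```

Wrapping the call in `tryCatch` is only there so the script keeps running and the message can be inspected; in everyday code the message simply appears in the console.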

# ifelse

One solution to the problem above is to use *ifelse* instead. For example:

```r
# test if each age is less than the overall average; if true the result is 1,
# if false the result is 2
ifelse(age < Average(age), 1, 2)
```

Note that *ifelse* is a function, so its three arguments are separated by commas and the parentheses close at the end. R evaluates the three arguments one at a time: the test becomes a vector of TRUE and FALSE values, exactly as in the *if* example above, while 1 and 2 are single values.

Now, just as described in R's Vectorized Math and Custom Variable Creation, R uses recycling to stretch the 1 and the 2 to match the vector of TRUEs and FALSEs.

Many functions in R, including *ifelse* (but not *if*), are *vectorized*, which means that the function is in effect applied once for each element of the vectors: wherever the test is TRUE the result is 1, and wherever it is FALSE the result is 2.

This returns a vector of length 11, the same length as *age*.
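The evaluation described above can be sketched as follows. The ages are hypothetical, `mean()` is assumed in place of `Average()`, and the explicit `rep()` calls and the loop are included only to illustrate what recycling and vectorization amount to; they are not how *ifelse* is implemented internally.

```r
# Hypothetical ages for 11 people; mean() stands in for Average()
age <- c(23, 35, 41, 29, 52, 60, 18, 47, 33, 38, 54)

test <- age < mean(age)        # vector of 11 TRUE/FALSE values
yes  <- rep(1, length(test))   # the single 1 stretched to length 11
no   <- rep(2, length(test))   # the single 2 stretched to length 11

# ifelse picks from yes where test is TRUE and from no where it is FALSE
result <- ifelse(test, yes, no)

# The short form used in the article gives the same answer
short <- ifelse(age < mean(age), 1, 2)

# Equivalent element-by-element loop using plain if on one value at a time
looped <- numeric(length(age))
for (i in seq_along(age)) {
    looped[i] <- if (age[i] < mean(age)) 1 else 2
}

print(result)
```

Inside the loop, *if* works because each condition `age[i] < mean(age)` is a single TRUE or FALSE.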

# Subscripting

Another way of achieving the same outcome is with the following code:

```r
# create a vector containing 11 values of 2 (11 is the length of age)
out = rep(2, length(age))

# find positions in the vector where age is less than the average and assign them a value of 1
out[age < Average(age)] = 1

# return the complete final result
out
```
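As a rough check that subscripting and *ifelse* agree, the sketch below runs both on hypothetical ages, again assuming `mean()` in place of `Average()`. The `which()` call is added only to show which positions the logical subscript selects.

```r
# Hypothetical ages for 11 people; mean() stands in for Average()
age <- c(23, 35, 41, 29, 52, 60, 18, 47, 33, 38, 54)

below <- age < mean(age)   # logical vector used as the subscript
positions <- which(below)  # the positions that will be overwritten with 1

out <- rep(2, length(age)) # start with 2 everywhere
out[below] <- 1            # overwrite only the below-average positions

# Compare with the ifelse approach
same <- all(out == ifelse(below, 1, 2))
print(out)
print(same)
```

Subscripting can be convenient when there are several conditions to apply in turn, since each one is a separate assignment into `out`.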

# Next

The article How to Work with Conditional R Formulas describes in more detail how *if* and related statements work in R.

Adding Value Labels When Creating Variables describes how to create labels in categorical variables.

For a more general overview of using R in Displayr, see the Displayr Help section on R.
