The R language has a nifty feature called *vectorization*, which saves users a lot of time. This article introduces the key concepts of vectorization and recycling and explains the most common trap for beginners: variables must be created with the correct length.

# Vectorization and recycling

Consider the problem of creating a variable that contains the difference between each person's age and the average age. We would do this in Displayr as follows:

- Insert a new variable (**+ > Custom Code > R Numeric**).
- Type *age - Average(age)*

When the R language processes *age - Average(age)*, its first step is to swap out the variable name, *age*, for the underlying data, which in this case is a variable. More generally, in the language of math and physics, this variable is a *vector*. So, in a data set with 11 cases, *age - Average(age)* becomes:

R then evaluates *Average(age)*, so the calculation reduces to:

Now R has a problem: it is trying to subtract two *vectors* of different lengths. It handles this by *recycling* the shorter vector, repeating its values until it is the same length as the longer vector (if the longer length is not a whole multiple of the shorter one, R still recycles but issues a warning). In our example, this just involves repeating 41 eleven times, giving a vector of eleven 41s.
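Recycling can be observed directly in base R, outside Displayr. The snippet below is a minimal sketch with invented numbers. It shows one case where the shorter vector fits evenly and one where it does not. The variable names (`even`, `uneven_msg`) are hypothetical.

```r
# Lengths 6 and 2: 2 divides 6 evenly, so the shorter vector is recycled silently
even <- c(1, 2, 3, 4, 5, 6) + c(10, 20)

# The same calculation with the recycling written out by hand
by_hand <- c(1, 2, 3, 4, 5, 6) + c(10, 20, 10, 20, 10, 20)

print(even)
print(identical(even, by_hand))

# Lengths 3 and 2: 3 is not a multiple of 2, so base R raises a warning.
# tryCatch() captures the warning text instead of printing it.
uneven_msg <- tryCatch(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) conditionMessage(w)
)
print(uneven_msg)
```

A single value, such as the average in the example above, always fits evenly, because every length is a multiple of 1.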

R then recognizes that we are computing a difference between two vectors of the same size, so it applies the calculation we are performing to all the matching elements:

Now that the vectors are the same length, R can do the math automatically (this is called *vectorized maths*). The calculation returns a variable of length 11, and this becomes the variable that appears in the data.
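The steps above can be reproduced in plain R. This is a sketch under stated assumptions: the 11 ages are invented for illustration (chosen so that their average is 41, as in the example), and base R's `mean()` stands in for Displayr's `Average()`.

```r
# Hypothetical ages for 11 cases; invented values, not the article's data
age <- c(25, 30, 35, 38, 40, 41, 42, 45, 48, 52, 55)

# mean() is base R, used here as a stand-in for Displayr's Average()
avg <- mean(age)                     # a vector of length 1

# Recycling written out explicitly: repeat the single value 11 times
recycled <- rep(avg, length(age))

# Element-by-element subtraction of two vectors of equal length
explicit <- age - recycled

# R recycles automatically, so the short form gives the same result
implicit <- age - mean(age)

print(length(implicit))              # one value per case
print(identical(explicit, implicit))
```

Both forms return a vector with one element per case, which is why the one-line expression is enough in practice.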

# Variables must be created with the correct length

Now, consider a slightly different problem. Let's say that, for some reason, we wanted to create a variable that contains the average age for every respondent. A novice would try to use *Average(age)* and would get the error **Can only convert tabular results to an R variable or question**.

The problem is that Displayr is expecting the variable that is returned to it to be of the same length as the data file (11 cases in this example). But, instead, it only gets back a single value.

The fix is to use the *rep* function, which repeats the data the specified number of times. Most simply, we could use *rep(Average(age), 11)*, but better code would be *rep(Average(age), length(age))*, as this will still work even if the input data file is changed to one with more or fewer cases.
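The difference in lengths is easy to check in plain R. The sketch below again uses invented ages and base R's `mean()` as a stand-in for Displayr's `Average()`. The names `single`, `hard_coded` and `flexible` are hypothetical.

```r
# Hypothetical ages for 11 cases; invented values for illustration
age <- c(25, 30, 35, 38, 40, 41, 42, 45, 48, 52, 55)

# A single value: too short to be a variable in an 11-case data file
single <- mean(age)

# Works, but only while the data file has exactly 11 cases
hard_coded <- rep(mean(age), 11)

# Adapts to however many cases the data file contains
flexible <- rep(mean(age), length(age))

print(length(single))
print(length(flexible) == length(age))
```

In Displayr the last expression would be written with `Average(age)` in place of `mean(age)`.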

# Next

Challenges With 'if' When Writing R Code

For a more general overview of using R in Displayr, see the Displayr Help section on R.
