The R language has a handy feature called vectorization that saves users a lot of time. This article introduces the key concepts of vectorization and recycling, and explains the most common trap for beginners: variables must be created with the correct length.
Vectorization and recycling
Consider the problem of creating a variable that contains the difference between each person's age and the average age. We would do this in Displayr as follows:
- Insert a new variable (+ > Custom Code > R Numeric).
- Type age - Average(age)
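Outside Displayr, the same calculation can be tried in plain R. This is a minimal sketch that assumes a hypothetical vector of 11 ages, chosen so that the average is 41. It uses base R's mean() in place of Average(), which is a Displayr function rather than part of base R:

```r
# Hypothetical ages for 11 respondents (illustrative values only)
age <- c(25, 30, 35, 38, 40, 41, 42, 44, 48, 52, 56)

# mean() stands in for Displayr's Average(); the average here is 41
age_diff <- age - mean(age)

print(age_diff)
# -16 -11 -6 -3 -1 0 1 3 7 11 15
```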
When R evaluates age - Average(age), its first step is to replace the variable name, age, with the underlying data. In Displayr that data is a variable. In the language of mathematics and physics, it is a vector. So, in a data set with 11 cases, age - Average(age) becomes c(a1, a2, ..., a11) - Average(c(a1, a2, ..., a11)), where a1 to a11 stand for the 11 respondents' ages.
R then calculates the average, which is 41 in this example, so the calculation reduces to the vector of 11 ages minus the single value 41.
Now R has a problem: it is being asked to subtract a vector of length 1 from a vector of length 11. It resolves this by recycling the shorter vector, repeating its values until it is as long as the longer vector. (If the longer length is not a multiple of the shorter one, the shorter vector cannot be recycled neatly. Base R recycles it anyway, cutting the last repeat short, and issues a warning.) In our example, recycling just means repeating 41 eleven times, so that each of the 11 ages is paired with its own copy of 41.
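To see the recycling rule on its own, here is a small base R sketch using arbitrary illustrative vectors. When the longer length is a multiple of the shorter one, the shorter vector is repeated silently. When it is not, R still returns a result but signals a warning (in an English locale, "longer object length is not a multiple of shorter object length"), which the sketch captures rather than prints:

```r
# Length 1 against length 4: the single value 100 is reused for every element
a <- c(1, 2, 3, 4) + 100
print(a)  # 101 102 103 104

# Length 2 against length 4: c(10, 20) is recycled to c(10, 20, 10, 20)
b <- c(1, 2, 3, 4) + c(10, 20)
print(b)  # 11 22 13 24

# Length 2 against length 3: 3 is not a multiple of 2, so c(10, 20) is
# recycled as c(10, 20, 10) and R issues a warning rather than an error
warning_msg <- NULL
d <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    warning_msg <<- conditionMessage(w)  # capture the warning text
    invokeRestart("muffleWarning")       # continue without printing it
  }
)
print(d)            # 11 22 13
print(warning_msg)  # the captured warning text
```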
With two vectors of the same length, R applies the subtraction to each pair of matching elements: the first age minus 41, the second age minus 41, and so on. This element-by-element evaluation happens automatically and is called vectorized math. The result is a vector of length 11, which becomes the new variable in the data set.
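The steps of this walkthrough can be spelled out explicitly in base R. This sketch makes the same assumptions as before (hypothetical ages, mean() standing in for Average()). Whether or not R actually builds the repeated vector internally, the explicit version produces the same result as the vectorized one-liner:

```r
# Hypothetical ages for 11 respondents (illustrative values only)
age <- c(25, 30, 35, 38, 40, 41, 42, 44, 48, 52, 56)

# Step 1: compute the average, a vector of length 1
avg <- mean(age)

# Step 2: recycle it explicitly to the length of age
avg_repeated <- rep(avg, length(age))

# Step 3: subtract matching elements one at a time
explicit <- numeric(length(age))
for (i in seq_along(age)) {
  explicit[i] <- age[i] - avg_repeated[i]
}

# The vectorized version does all three steps in one expression
vectorized <- age - mean(age)

print(identical(explicit, vectorized))  # TRUE
```

The loop is there only to make the element-by-element step visible. In practice the vectorized form is shorter, clearer, and typically faster.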
Variables must be created with the correct length
Now consider a slightly different problem. Suppose we want to create a variable that contains the average age for every respondent. A novice would try to use Average(age), and would get the error "Can only convert tabular results to an R variable or question".
The problem is that Displayr expects the returned variable to have the same length as the data file (11 cases in this example). Instead, it gets back a single value.
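The length mismatch is easy to see in base R. This sketch uses hypothetical ages. fits_data_file() is a hypothetical helper that only illustrates the rule described above. It is not Displayr's actual check:

```r
# Hypothetical ages for 11 respondents (illustrative values only)
age <- c(25, 30, 35, 38, 40, 41, 42, 44, 48, 52, 56)
n_cases <- length(age)  # 11

# Hypothetical helper: TRUE if a result has exactly one value per case
fits_data_file <- function(result, n_cases) {
  length(result) == n_cases
}

ok_diff <- fits_data_file(age - mean(age), n_cases)  # 11 values
ok_avg  <- fits_data_file(mean(age), n_cases)        # 1 value

print(c(ok_diff, ok_avg))  # TRUE FALSE
```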
The fix is to use the rep function, which repeats a value a specified number of times. The simplest version is rep(Average(age), 11), but rep(Average(age), length(age)) is better code, because it still works if the data file is replaced with one that has more or fewer cases.
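This is a minimal base R sketch of the fix. It uses mean() in place of Displayr's Average() and two hypothetical data sets of different sizes to show why length(age) is more robust than a hard-coded 11. The wrapper name average_for_everyone() is hypothetical:

```r
# Hypothetical wrapper: repeat the average once per case.
# length(age) adapts automatically to the number of cases.
average_for_everyone <- function(age) {
  rep(mean(age), length(age))
}

age_small <- c(25, 30, 35, 38, 40, 41, 42, 44, 48, 52, 56)  # 11 cases
age_large <- c(age_small, 60, 22)                            # 13 cases

res_small <- average_for_everyone(age_small)
res_large <- average_for_everyone(age_large)

print(length(res_small))  # 11
print(length(res_large))  # 13
print(unique(res_small))  # 41
```

The wrapper exists only to show the same code working on both data sets. In a Displayr R variable, the single expression rep(Average(age), length(age)) plays the same role.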
For a more general overview of using R in Displayr, see the Displayr Help section on R.