This article describes how to perform looping using R.
Requirements
- An R variable, calculation, or data set.
The examples below focus on calculations based on a table. The same code works on a variable set, except that the functions operate on the underlying values rather than the aggregated results.
Method
1. No loop
There are many different ways to write R code, so something that appears to require a loop often doesn't. For basic arithmetic in particular, R automatically does the math on the corresponding elements of the variables included. For example, to subtract the second column of a table from the first:
t = Preferred.cola.by.Gender
difference = t[,1] - t[,2]
So it's best to try first! See How to Perform Mathematical Calculations Using R for more on this.
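To try this outside of a document that contains the table, the sketch below builds a small stand-in for Preferred.cola.by.Gender. The name cola and all of its values are made up for illustration; only the subtraction line mirrors the example above.

```r
# Hypothetical stand-in for Preferred.cola.by.Gender (made-up percentages)
cola = matrix(c(40, 35,
                25, 30,
                35, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Coca-Cola", "Pepsi", "Other"),
                              c("Male", "Female")))

# Vectorized subtraction: R pairs up the elements of the two columns,
# so no loop is needed
difference = cola[, 1] - cola[, 2]
difference
```

The result is a vector with one value per row, labeled with the row names of the table.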
2. Apply
If the function you want to use doesn't accept multiple values at once, a more concise alternative to a for... loop is to use the apply function. The format is:
apply(data, rows or columns, function)
The rows or columns argument must be 1 to work across rows or 2 to work down columns. The function argument can be a built-in function or a custom one. The equivalent code for the same example is as follows:
t = data.frame(Preferred.cola.by.Gender)
difference = apply(t, 1, function(x) x[1]-x[2])
Note that no comma is needed in the brackets after x above. Because x is a single row of the data, you only need the position of the item within that row.
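As a rough illustration of the rows or columns argument, the sketch below runs apply both ways on a small made-up table. The names cola, by_row and col_range, and all values, are assumptions for the example rather than anything from the original data.

```r
# Hypothetical stand-in for Preferred.cola.by.Gender (made-up percentages)
cola = matrix(c(40, 35,
                25, 30,
                35, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Coca-Cola", "Pepsi", "Other"),
                              c("Male", "Female")))

# Second argument 1: the function receives one row at a time as x
by_row = apply(cola, 1, function(x) x[1] - x[2])

# Second argument 2: the function receives one column at a time as x
col_range = apply(cola, 2, function(x) max(x) - min(x))

by_row
col_range
```

Here by_row has one value per row of the table, while col_range has one value per column.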
3. For... loop
For loops are usually not required in R, as most functions can automatically be run across multiple values in one line of code. However, a loop is sometimes necessary, for example when you need to step through the items in two lists or perform calculations over the same data with different inputs. The general structure of a for loop in R is as follows:
for (value in sequence)
{
statements
}
Let's look at a very basic example to illustrate the process. We have the following table and want to calculate the difference between columns.
Here, we set our table as a data frame and calculate the column difference by looping through each row:
t = data.frame(Preferred.cola.by.Gender)
# Create an empty vector first so the loop has something to assign into
difference = numeric(0)
for (i in rownames(t)) {
difference[i] = t[i,1] - t[i,2]
}
difference
On each iteration of the loop, i takes the next row name in our table. Instead of using rownames(t), we could also loop over row numbers using any of the below (the last one assumes the table has 8 rows):
for (i in seq(NROW(t)))
for (i in 1:NROW(t))
for (i in 1:8)
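The sketch below shows one way to write the row-number version in full. It uses made-up data (the data frame cola is hypothetical) and swaps in seq_len, a base R function not used in the original, because 1:NROW(t) counts down from 1 to 0 when a table has no rows, whereas seq_len gives an empty sequence.

```r
# Hypothetical stand-in for data.frame(Preferred.cola.by.Gender)
cola = data.frame(Male = c(40, 25, 35),
                  Female = c(35, 30, 35),
                  row.names = c("Coca-Cola", "Pepsi", "Other"))

# Pre-allocate the result and label it with the row names
difference = numeric(nrow(cola))
names(difference) = rownames(cola)

# Loop over row numbers; seq_len() is empty for a zero-row table
for (i in seq_len(nrow(cola))) {
  difference[i] = cola[i, 1] - cola[i, 2]
}
difference
```

Pre-allocating the vector to its final length also avoids growing it one element at a time inside the loop.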
4. A better loop example
We have a table below of preferred colas over a series of months:
What we want to do is create a rolling 4-month average, so we can use the for... loop approach here:
t = Preferred.cola.by.Months
# Create empty matrix (excluding first 3 columns) and assign row and column labels
rolling = matrix(0, NROW(t), NCOL(t)-3)
rownames(rolling) = rownames(t)
colnames(rolling) = colnames(t)[-1:-3]
# Create rolling 4-month average
for (c in 4:NCOL(t)) {
avg = rowMeans(t[,(c-3):c], na.rm = TRUE)
rolling[,(c-3)] = avg
}
rolling
- We begin by using the matrix function to create a matrix called rolling to store the rolling averages.
- We apply the row and column labels to this matrix but drop the first 3 column labels, as those months don't have a full 4-month window behind them.
- We now loop through each column of the table, starting from the fourth position.
- We then calculate the average across the current and the previous 3 columns.
- Finally, we store the row averages in our rolling matrix, offsetting the column position by 3 so that the fourth column of the table fills the first column of rolling.
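For comparison, the same rolling average can be sketched without an explicit for... loop by using sapply. This is an alternative to the approach above, not a replacement for it. The table months, its values and the name width are made up for illustration.

```r
# Hypothetical stand-in for Preferred.cola.by.Months (made-up values)
months = matrix(c(10, 20, 30, 40, 50, 60,
                   6,  6,  6,  6, 10, 10),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Coca-Cola", "Pepsi"),
                                c("Jan", "Feb", "Mar", "Apr", "May", "Jun")))

width = 4  # assumed window length, matching the 4-month example

# For each end column, average that column and the (width - 1) before it.
# sapply binds the per-column results together into a matrix.
rolling = sapply(width:ncol(months), function(col) {
  rowMeans(months[, (col - width + 1):col, drop = FALSE], na.rm = TRUE)
})

# Label each result with the last month in its window
colnames(rolling) = colnames(months)[width:ncol(months)]
rolling
```

Changing width gives a different rolling period without any other edits, since the column offsets are derived from it.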