Parallel computing involves breaking a large problem down into smaller, independent problems that can all be solved at the same time. It's a very common way of optimizing code and is widely used in Displayr's "backend". The simple worked example below shows how you can use this technique to optimize a document. However, custom parallel computations usually make a document slower, so only try this as a genuine last resort.

## Simple worked example

Let's say you want to compute the average of 100 million numbers, randomly drawn from a uniform distribution between 0 and 1. The R code for doing this in a single calculation is:

`Average(runif(100000000))`

The same average can also be computed, still within a single calculation, by averaging two halves of 50 million numbers each:

`Average(Average(runif(50000000)), Average(runif(50000000)))`
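Averaging the two half-averages gives the same answer as averaging everything only because the two halves are the same size. The sketch below checks this in base R, using `mean()` in place of Displayr's `Average()` and a small, made-up vector so the result is reproducible (the data are illustrative, not from the example above):

```r
# Illustrative data: 8 numbers split into two equal halves of 4.
x <- c(1, 2, 3, 4, 10, 20, 30, 40)

overall      <- mean(x)                          # average of everything
average_of_2 <- mean(c(mean(x[1:4]), mean(x[5:8])))  # average of two equal halves

# With equal-sized pieces the two approaches agree (both are 13.75 here).
print(c(overall = overall, average_of_halves = average_of_2))

# With unequal pieces, a plain average of averages is biased...
a <- x[1:2]
b <- x[3:8]
unweighted <- mean(c(mean(a), mean(b)))

# ...but weighting each piece's mean by its size recovers the overall mean.
weighted <- (length(a) * mean(a) + length(b) * mean(b)) / length(x)
print(c(unweighted = unweighted, weighted = weighted))
```

This matters if you later split the work into pieces of different sizes: the final combining step then needs to weight each piece's average by how many numbers it covers.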

- Avoid writing your own R code.
- Avoid performing multiple calculations.

However, we can make this run even faster by instead creating three completely separate calculations:

`calculation.1 = Average(runif(50000000))`

`calculation.2 = Average(runif(50000000))`

`Average(calculation.1, calculation.2)`

- Here, *calculation.1* took 6.94 seconds, *calculation.2* took 6.12 seconds, and the final averaging took 0.44 seconds. Provided that *calculation.1* and *calculation.2* are run in parallel, the whole computation now takes 0.44 + 6.94 = 7.38 seconds. We can ignore the 6.12 seconds for *calculation.2*, as it runs while *calculation.1* is being run.
- If we break the work up further (e.g., into 10 calculations rather than 2), we could in theory make the calculation much faster. (But read the next section before trying this.)
- The trivial final calculation, which just averages the results of the first two, still does not take 0 seconds to compute (it takes 0.44 seconds here).
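Outside Displayr, the same split-then-combine idea can be sketched in plain R with the base `parallel` package. This is only an analogy: in a Displayr document, the backend decides when *calculation.1* and *calculation.2* run, and nothing below controls that. The names `chunk_mean` and `parallel_average` are hypothetical, and the sketch assumes the chunks are equal-sized so that a plain average of averages is valid.

```r
library(parallel)

# Hypothetical worker: the job done by calculation.1, calculation.2, ...
# Defined at top level so only the function itself is shipped to workers.
chunk_mean <- function(i, size) {
  mean(runif(size))
}

# Hypothetical helper: average n uniform random numbers by splitting the
# work into `chunks` equal pieces computed in separate worker processes.
parallel_average <- function(n, chunks = 2, cores = 2) {
  stopifnot(n %% chunks == 0)         # equal-sized chunks keep the maths simple
  chunk_size <- n / chunks

  cl <- makeCluster(cores)            # socket cluster; works on Windows, macOS, Linux
  on.exit(stopCluster(cl))            # always shut the worker processes down
  clusterSetRNGStream(cl, iseed = 1)  # independent, reproducible random streams

  chunk_means <- parLapply(cl, seq_len(chunks), chunk_mean, size = chunk_size)

  # The final, cheap step: average the chunk averages.
  mean(unlist(chunk_means))
}

system.time(result <- parallel_average(1e6, chunks = 2, cores = 2))
result
```

Starting worker processes and sending results back has its own cost, so for a small `n` like this one the parallel version will typically be slower than `mean(runif(1e6))`. The split only pays off when each chunk is expensive enough to outweigh that overhead.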

## Usually, custom parallel computations will make a document slower

Using custom parallel computations should always be very much a last resort, as there's no guarantee that the calculations you want to run in parallel will actually be run in parallel. For example, if you have a document of 100 calculations and they all need to update, then the finite capacity of a computer means that all of the computer's processors will be busy, so *calculation.1* and *calculation.2* may not be run in parallel. In the example above, the time taken would then be 6.94 + 6.12 + 0.44 = 13.5 seconds, which is much slower than using a single calculation. Furthermore, some other calculation may need to run between *calculation.1* and *calculation.2*, or before the final calculation, slowing things down further.
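The best-case and worst-case timings above follow from a simple model: if the pieces overlap, only the slowest one counts; if they queue behind each other, their times add up. A minimal sketch of that arithmetic, using the timings from the worked example (`estimate_time` is a hypothetical helper, and the model ignores scheduling overhead and any other calculations that might run in between):

```r
# Hypothetical helper: estimated wall-clock time for a set of chunk
# calculations followed by one final combining calculation.
estimate_time <- function(chunk_times, final_time, in_parallel = TRUE) {
  if (in_parallel) {
    max(chunk_times) + final_time  # chunks overlap; the slowest one dominates
  } else {
    sum(chunk_times) + final_time  # no free processors; chunks run one after another
  }
}

chunk_times <- c(calculation.1 = 6.94, calculation.2 = 6.12)  # seconds
final_time  <- 0.44                                           # seconds

best_case  <- estimate_time(chunk_times, final_time, in_parallel = TRUE)   # 7.38
worst_case <- estimate_time(chunk_times, final_time, in_parallel = FALSE)  # 13.5
print(c(best_case = best_case, worst_case = worst_case))
```

Splitting into more chunks shrinks `max(chunk_times)` in the best case, but adds more entries to `sum(chunk_times)` whenever processors are busy, which is why the gain is not guaranteed.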

In practical terms, the only situation where custom parallelization is likely to be sensible is when the calculations are known to be very slow (e.g., taking tens of seconds or minutes).
