Custom Parallel Computation of Long-Running Calculations – Displayr Help

Parallel computing involves breaking down a large problem into smaller, independent problems that can all be solved at the same time. It's a very common way of optimizing code and is widely used in Displayr's "backend". A simple worked example explains how you can use this technique to optimize a document. However, custom parallel computations usually make a document slower, so only try this as a genuine last resort.

Simple worked example

Let's say you want to compute the average of 100 million numbers, randomly drawn from a uniform distribution between 0 and 1. R code for doing this is:

Average(runif(100000000))

On the day this article was written, this took 9.8 seconds. We can do the same calculation by instead using:

Average(Average(runif(50000000)),Average(runif(50000000)))

The code above took more than 19 seconds to compute. This may be surprising, but it is because the first version allowed a lot of in-built parallelization and optimization to be performed. Note that this is another example of the two more general ways of improving documents:

Avoid writing your own R code.
Avoid performing multiple calculations.
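The comparison between the single and nested versions can be roughly reproduced outside Displayr with base R's system.time(). This is a minimal sketch, not Displayr's own method: it assumes base R's mean() as a stand-in for Displayr's Average(), and it uses a smaller n (a value chosen here only to keep the run short). Note that mean(a, b) would treat b as a trim argument, so the two partial averages are combined with c() first. Timings will differ from those quoted above.

```r
# Stand-in for Displayr's Average(); base R's mean() is an assumption here.
n <- 1e6  # far smaller than the article's 100 million, so the sketch runs quickly

# Time the single-calculation version.
single <- system.time(result.single <- mean(runif(n)))

# Time the nested version that averages two half-sized averages.
nested <- system.time(
    result.nested <- mean(c(mean(runif(n / 2)), mean(runif(n / 2))))
)

# Elapsed wall-clock time, in seconds, for each version.
print(single["elapsed"])
print(nested["elapsed"])
```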

However, we can make this run faster than the original single calculation by instead creating three completely separate calculations:

calculation.1 = Average(runif(50000000))

and

calculation.2 = Average(runif(50000000))

and lastly:

Average(calculation.1,calculation.2)
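Splitting the work this way gives the right answer because, when the pieces are the same size, the average of the partial averages equals the overall average. The check below is a minimal sketch in base R, again assuming mean() as a stand-in for Average(). Unlike the calculations above, it splits one fixed vector in half so that the two results can be compared directly.

```r
set.seed(1)  # fixed seed so the check is repeatable
x <- runif(1e6)
half <- length(x) / 2

# Averages of the two equal-sized halves (analogous to calculation.1 and calculation.2).
part.1 <- mean(x[1:half])
part.2 <- mean(x[(half + 1):length(x)])

# Average of the two partial averages (analogous to the final calculation).
combined <- mean(c(part.1, part.2))

# Compare with the average of the whole vector, allowing for floating-point rounding.
print(isTRUE(all.equal(combined, mean(x))))
```

If the pieces were of different sizes, the partial averages would need to be weighted by their sizes before being combined.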

When we look at the dependency graph, we can see that it is probably the fastest:

A few things to note about this:

Provided that calculation.1 and calculation.2 are run in parallel, then this calculation will now take 0.44 + 6.94 = 7.38 seconds to run. We can ignore the 6.12 seconds for calculation.2, as this can be run while calculation.1 is being run.
If we break the two calculations up again (e.g., into 10 calculations rather than 2), we theoretically make the calculation much faster. (But, read the next section before trying this.)
The trivial calculation that just averages the first two calculations is still not taking 0 seconds to compute.

Usually, custom parallel computations will make a document slower

Custom parallel computations should be very much a last resort, as there's no guarantee that the calculations you want to run in parallel will actually be run in parallel. For example, if you have a document of 100 calculations, and they all need to update, then the finite capacity of a computer means that all the computer's processors may be busy, so calculation.1 and calculation.2 may not be run in parallel. In the example above, that would mean the time taken becomes 6.94 + 6.12 + 0.44 = 13.5 seconds, which is much slower than the 9.8 seconds taken by a single calculation. Furthermore, there may be some other calculation that needs to occur between calculation.1 and calculation.2, or before the final calculation, further slowing things down.
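The best and worst cases can be set side by side with a little arithmetic. The sketch below uses the timings quoted in the worked example. The variable names are illustrative only, and the model is deliberately simple: it ignores any other calculations that might be scheduled in between.

```r
# Timings in seconds, taken from the worked example.
t.calc1 <- 6.94   # calculation.1
t.calc2 <- 6.12   # calculation.2
t.final <- 0.44   # final calculation that averages the two results
t.single <- 9.8   # original single calculation

# Best case: calculation.1 and calculation.2 overlap, so only the slower one counts.
elapsed.parallel <- max(t.calc1, t.calc2) + t.final

# Worst case: no spare capacity, so the three calculations run one after another.
elapsed.serial <- t.calc1 + t.calc2 + t.final

print(c(single = t.single, parallel = elapsed.parallel, serial = elapsed.serial))
```

The split version only beats the single calculation when the two halves really do overlap. Run one after another, it is the slowest of the three.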

In practical terms, the only situation where custom parallelization is likely to be sensible is when the calculations are known to be very slow (e.g., taking tens of seconds or minutes).
