Considerable performance gains can be obtained by migrating calculations into the R CODE section of some data visualizations. This article assumes a basic level of competence in writing R code and is structured as follows. It:
- Explains the R CODE section of a data visualization.
- Provides a detailed worked example of the strategy.
- Explains how to do this so that the visualization can easily be re-used.
Please note these steps require a Displayr license.
The R CODE section of a data visualization
Some visualizations in Displayr are written in the R language. The code that creates the visualization is visible in the Data tab of the object inspector. For example, you can see the code for a pictograph below.
Detailed worked example using the R CODE section
The dependency graph below is for a visualization that has been created in a way that is guaranteed to be slow. The graph shows two separate paths. The one at the top takes a total of 0.11 + 0.00 + 0.02 + 0.02 + 0.44 + 0.45 + 0.42 + 0.42 + 1.64 + 1.01 = 4.53 seconds. The one at the bottom takes a little less time, so the visualization takes at least 4.53 seconds to calculate. (Where multiple visualizations are being shown simultaneously, the overall time may be longer.)
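The total for the top path can be checked with a line or two of R. The variable names here are illustrative only; the timings are the ones quoted above.

```r
# Per-node timings (in seconds) along the top path of the dependency graph
node.times <- c(0.11, 0.00, 0.02, 0.02, 0.44, 0.45, 0.42, 0.42, 1.64, 1.01)

# The path cannot finish faster than the sum of its nodes
total.time <- sum(node.times)
print(total.time)
```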
Explaining each of the nodes
The leftmost node shows that the entire data set takes 0.11 seconds to load.
The second node shows that extracting the data for the Q5 variable set from the data set is effectively instant.
The top-most Q5 table contains the percentages of people who associate different brands with different personality attributes. It takes 0.02 seconds to calculate.
The next node selects the first six rows of data for the Older column. It takes 0.45 seconds. It's slow because there is an overhead associated with each calculation.
The next node sorts the table and takes 0.42 seconds.
The next node, viz, creates a bar chart.
The rightmost node shows the visualization if the sample size is greater than 50 and, otherwise, shows nothing.
The remaining nodes extract the sample sizes for the Older column and calculate the smallest value.
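To make the chain of nodes concrete, here is a rough sketch of what the separate calculations might look like if written as plain R. Each assignment stands in for one node in the graph. The data values, the second table's name (table.Q5.n), the intermediate names, and the descending sort order are all assumptions made for illustration; only table.Q5, the Older column, the six rows, and the threshold of 50 come from the description above.

```r
# Hypothetical stand-in for the percentages table (table.Q5)
table.Q5 <- matrix(
    c(40, 35, 62, 58, 21, 30, 77, 70, 15, 12, 50, 48, 33, 29, 9, 8),
    ncol = 2, byrow = TRUE,
    dimnames = list(paste("Attribute", 1:8), c("Younger", "Older"))
)

# Hypothetical stand-in for the second table, which holds the sample sizes
table.Q5.n <- matrix(
    rep(c(120, 95), each = 8), ncol = 2,
    dimnames = dimnames(table.Q5)
)

# Node: select the first six rows of the Older column
older.pct <- table.Q5[1:6, "Older"]

# Node: sort the selection (descending order is an assumption)
older.sorted <- sort(older.pct, decreasing = TRUE)

# Nodes: extract the sample sizes for Older and find the smallest
older.n <- table.Q5.n[, "Older"]
min.n <- min(older.n)

# Final node: only show the visualization if the sample size exceeds 50
show.viz <- min.n > 50
```

In Displayr, each of these steps is a separate calculation with its own overhead, which is why the chain is slow even though every individual line is trivial.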
Optimizing the dependency graph
The article now describes optimizing the dependency graph by placing calculations into the R CODE of the visualization.
The first win is replacing the two tables with a single table containing the percentages and the sample size.
Then, we reference this table as the input data to the visualization:
Then, we modify the code in the object inspector > Data > R CODE. There are three steps to this. The code below is doing the same thing as was previously done in the separate calculations. Some things to note:
- A message box will appear asking "Are you sure you want to edit the R code". Click Yes.
- The warning in the first line is to alert anybody who clicks on the visualization that the underlying R CODE has been modified.
- formTable refers to the table that has been selected (i.e., table.Q5).
- viz.2 is the name that the visualization will be assigned. If you use a name that has been previously used, you will get an error and need to select another name.
- "" means that nothing will appear if the sample size is too small. We could also include a message (e.g., "Sample size too small").
- An if statement has been used, and the entire visualization is the else condition.
- Once we have selected and sorted the sub-selection, we assign it to formTable, as this is the object that is ultimately used by the code that creates the visualization.
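The points above can be sketched roughly as follows. This is not the exact code from the visualization. It assumes the combined table arrives as a three-dimensional array (rows, columns, statistics) and that the statistics are labeled "Column %" and "Column Sample Size". The mock data, the warning text, and the descending sort are likewise assumptions. In Displayr, formTable is supplied by the selected input, so the first statement would not be needed.

```r
# Hypothetical stand-in for table.Q5 with both statistics in one table
formTable <- array(
    c(40, 62, 21, 77, 15, 50, 33, 9,   # Younger, Column %
      35, 58, 30, 70, 12, 48, 29, 8,   # Older, Column %
      rep(120, 8), rep(95, 8)),        # sample sizes: Younger, then Older
    dim = c(8, 2, 2),
    dimnames = list(paste("Attribute", 1:8),
                    c("Younger", "Older"),
                    c("Column %", "Column Sample Size"))
)

# Alert anybody who inspects the visualization that the code was edited
warning("The R CODE of this visualization has been modified.")

# Smallest sample size in the Older column
min.n <- min(formTable[, "Older", "Column Sample Size"])

# The name is assigned once, here; the visualization is the else condition
viz.2 <- if (min.n <= 50) {
    ""  # show nothing when the sample size is too small
} else {
    # Select rows 1 to 6 of Older, sort, and hand the result back to
    # formTable, which the visualization code uses as its data
    selected <- formTable[1:6, "Older", "Column %"]
    formTable <- sort(selected, decreasing = TRUE)

    # ... the original visualization code would follow here, with its own
    # name assignment removed; formTable stands in for its result ...
    formTable
}  # closes the else condition
```

Because if is an expression in R, the value of whichever branch runs is what gets assigned to viz.2, so the name only needs to appear once at the top.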
Due to the use of the if statement, two further modifications are required. First, scroll down to around line 250 (this will change depending on how much code you've added at the beginning) and find the place where the visualization is being named. Here, we can see it's named viz.2.
We delete the name assignment (i.e., viz.2 <- ), as shown below. We do this because all this code is now nested within the if statement at the top, and the naming now occurs at the top.
Last, we must put a closing brace at the bottom of the code. This is done to close the else condition of the if statement.
Once the above is done, the dependency graph is shorter and the visualization takes less than half the time to calculate.
Making the visualization easily re-usable
The detailed worked example uses code in the R CODE box to select rows 1 to 6 of the Older column of the table. If anybody wants to change the selection to a different column, they must modify the R CODE.
An alternative approach is to only select the cells of interest as inputs rather than the whole table:
Then, we modify the R CODE as above, but removing the bits that do the sub-selection:
The visualization can be copied and pasted, with the user modifying the inputs and having it automatically sort and be hidden if the sample size is small.
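A minimal sketch of the re-usable variant is shown below. It assumes the selected cells arrive as a matrix with one row per selected cell and one column per statistic. That shape, the row labels, the values, and the name viz.3 are all hypothetical. The sub-selection step is gone, so only the sample-size check and the sort remain.

```r
# Hypothetical stand-in for the cells selected as the input
formTable <- cbind(
    "Column %" = c(Reliable = 42, Fun = 67, Innovative = 55,
                   Honest = 31, Youthful = 73, Traditional = 18),
    "Column Sample Size" = rep(95, 6)
)

viz.3 <- if (min(formTable[, "Column Sample Size"]) <= 50) {
    "Sample size too small"  # optional message instead of an empty string
} else {
    # No sub-selection needed: just sort whatever cells were supplied
    formTable <- sort(formTable[, "Column %"], decreasing = TRUE)

    # ... the original visualization code would follow here ...
    formTable
}
```

Since nothing in the code refers to a particular row or column, pointing the input at different cells is enough to re-use it.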
Next
How to Perform Mathematical Calculations Using R