Considerable performance gains can be obtained by migrating calculations into the R code section of some data visualizations. This article assumes a basic level of competence in writing R code. It:
- Explains the R code section of a data visualization.
- Provides a detailed worked example of the strategy.
- Explains how to do this so that the visualization can easily be reused.
The R Code section of a data visualization
Some visualizations in Displayr are written in the R language. The code that creates the visualization is visible in the Code section of Displayr. For example, you can see the code for a pictograph below.
Detailed worked example using the R Code section
The Inputs & Outputs below are for a visualization that has been created in a way that is guaranteed to be slow. The graph shows two separate paths. The one at the top takes a total of 0.11 + 0.00 + 0.02 + 0.02 + 0.44 + 0.45 + 0.42 + 0.42 + 1.64 + 1.01 = 4.53 seconds. The one at the bottom takes slightly less time, meaning the visualization takes at least 4.53 seconds to calculate. (Where multiple visualizations are being shown simultaneously, the overall time may be longer.)
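The total for the top path can be checked with a quick sum in R. This is only an illustration: the vector name `top.path` is made up here, and the values are the node times quoted above (the times for the bottom path are not listed in this article, so they are not included).

```r
# Node times (in seconds) along the top path, as quoted in the text.
top.path <- c(0.11, 0.00, 0.02, 0.02, 0.44, 0.45, 0.42, 0.42, 1.64, 1.01)

# Nodes on a single path depend on one another, so their times add up.
total <- sum(top.path)
print(round(total, 2))
```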
Explaining each of the nodes
The leftmost node shows that the entire data set takes 0.11 seconds to load.
The second node shows that extracting data for the Q5 variable set is effectively instantaneous (0.00 seconds).
The topmost Q5 table shows the percentages of people who associate different brands with various personality attributes. It takes 0.02 seconds to calculate.
The next node selects the first six rows of data for the Older column. It takes 0.45 seconds. It's slow because there is an overhead associated with each calculation.
The next node sorts the table, taking 0.42 seconds.
The next node, viz, creates a bar chart.
The rightmost node shows a visualization if the sample size is greater than 50; otherwise, it shows nothing.
The remaining nodes extract the sample sizes from the Older column and calculate the smallest value.
Optimizing the Inputs & Outputs
The article now describes optimizing the Inputs & Outputs by placing calculations into the R Code of the visualization.
The first win is replacing the two tables with a single table containing the percentages and the sample size.
Then, we reference this table as the input data to the visualization:
Then, we modify the code in the Code panel. There are three steps to this. The code below does the same thing as was done in the separate calculations. Some things to note:
- A message box will appear asking "Are you sure you want to edit the R code". Click Show.
- The warning in the first line is to alert anybody who clicks on the visualization that the underlying R Code has been modified.
- formTable refers to the table that has been selected (i.e., table.Q5).
- viz.2 is the name that the visualization will be assigned. If you use a name that has been previously used, you will get an error and need to select another name.
- "" means that nothing will appear if the sample size is too small. We could also include a message (e.g., "Sample size too small").
- An if statement has been used, and the entire visualization is the else condition.
- Once we have selected and sorted the sub-selection, we assign it to formTable, since this is the object ultimately used by the code that creates the visualization.
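The points above can be sketched in R as follows. This is a minimal, self-contained illustration rather than the exact code from the visualization: the mock `formTable` (its values, labels, and its rows x columns x statistics layout), the wording of the warning, the decreasing sort order, and the placeholder at the end of the else branch are all assumptions made so that the example runs outside Displayr. Only the names `formTable` and `viz.2`, the selection of rows 1 to 6 of the Older column, and the threshold of 50 come from the text.

```r
# Hypothetical stand-in for the table chosen as the input (table.Q5 in the text).
# In Displayr, formTable is supplied by the input control; this mock only
# assumes a layout of rows x columns x statistics.
pct <- matrix(c(12, 30, 25, 41, 18, 36, 22, 15,   # Younger
                20, 45, 10, 33, 27, 38, 50,  5),  # Older
              ncol = 2,
              dimnames = list(paste("Attribute", 1:8), c("Younger", "Older")))
n <- matrix(c(rep(120, 8), rep(85, 8)), ncol = 2, dimnames = dimnames(pct))
formTable <- array(c(pct, n), dim = c(8, 2, 2),
                   dimnames = c(dimnames(pct), list(c("%", "Sample Size"))))

# Alert anybody inspecting the visualization that its code was edited.
warning("The R code of this visualization has been modified.")

rows <- 1:6
min.n <- min(formTable[rows, "Older", "Sample Size"])

viz.2 <- if (min.n <= 50) {
    ""  # nothing is shown when the sample size is too small
} else {
    # Select and sort, then assign back to formTable, because that is the
    # object the rest of the visualization's code works with.
    formTable <- sort(formTable[rows, "Older", "%"], decreasing = TRUE)
    # Placeholder: in Displayr the visualization's original code follows here,
    # and its final expression (the chart) becomes the value of viz.2.
    formTable
}
print(viz.2)
```

Doing the selection, the sort, and the sample-size check inside one block of code replaces several separate calculation nodes, each of which carries its own overhead.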
Due to the use of the if statement, two further modifications are required. First, scroll down to around line 250 (this will change depending on how much code you've added at the beginning), and find the place where the visualization is being named. Here, we can see it's named as viz.2.
We delete the name assignment (i.e., viz.2 <- ), as shown below. We do this because all the code is now nested within the if statement at the top, and naming now occurs there.
Lastly, we must add a closing brace at the end of the code. This is done to close the else condition of the if statement.
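The reason these two modifications work is that `if` in R is an expression: its value is the last expression evaluated in whichever branch runs. The sketch below shows the resulting structure only. `make.chart`, the two-value `formTable`, and `sample.size` are hypothetical stand-ins so that the example runs on its own; in the real visualization, the charting code is whatever Displayr generated.

```r
# Hypothetical stand-in for the chart-creating call near the end of the code.
make.chart <- function(x) paste("chart with", length(x), "bars")

formTable <- c(B = 45, A = 20)  # hypothetical data, already selected and sorted
sample.size <- 85               # hypothetical sample size

viz.2 <- if (sample.size <= 50) {
    ""
} else {
    # ... the visualization's original code sits here ...
    make.chart(formTable)  # formerly `viz.2 <- make.chart(formTable)`
}                          # the closing brace added at the very end
print(viz.2)
```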
Once the above is done, the Inputs & Outputs have been shortened, taking less than half the time.
Making the visualization easily re-usable
The detailed worked example uses code in the R code box to select rows 1 to 6 of the Older column of the table. If anybody wants to change the selection to a different column, they must modify the R Code.
An alternative approach is to only select the cells of interest as inputs rather than the whole table:
Then, we modify the R Code as above, but removing the parts that do the sub-selection:
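A minimal sketch of what this reusable version might look like is given below. It assumes that the selected cells arrive as a matrix with one column of percentages and one column of sample sizes. That layout, the mock values, and the name `viz.3` are assumptions for illustration, so check the actual structure of `formTable` in your own document before relying on it.

```r
# Hypothetical stand-in for the cells selected as inputs. In Displayr,
# formTable is supplied by the input control.
formTable <- matrix(c(20, 45, 10,
                      85, 85, 85),
                    ncol = 2,
                    dimnames = list(paste("Attribute", 1:3),
                                    c("%", "Sample Size")))

viz.3 <- if (min(formTable[, "Sample Size"]) <= 50) {
    ""  # nothing is shown when the sample size is too small
} else {
    # No sub-selection here: the inputs already contain only the cells wanted.
    formTable <- sort(formTable[, "%"], decreasing = TRUE)
    # Placeholder for the visualization's original code, as before.
    formTable
}
print(viz.3)
```

Because the code no longer refers to specific rows or columns, changing what is plotted only requires changing the inputs.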
The visualization can be copied and pasted, with the user modifying the inputs, and it automatically sorts and hides if the sample size is small.