Reducing the number of variables and variable sets in a document can improve performance, particularly with large data sets (e.g., 100,000+ cases or 5,000+ variables). This article describes:
- The benefits of reducing the number of variables
- The benefits of reducing the number of variable sets
- Strategies for reducing the number of variables and variable sets
The benefits of reducing the number of variables
There are various reasons why having unnecessary variables in a document slows things down:
- More data has to be loaded when the document is started.
- More data has to be moved around (see Minimize the Size and Distance of Data Being Moved).
- Unnecessary calculations end up being performed (see Reduce the Size of Variable Sets Used to Create Tables).
- Displayr has to keep track of them. For example, if there are 50,000 variables and you are only using 100 of them, Displayr still needs to keep track of all 50,000, and this uses up resources.
- It causes dropboxes to take time to populate. For example, if you have a dropbox that shows variables, and it has to show 50,000 variables, then these 50,000 labels need to be extracted and moved to the dropbox, which takes time.
- It causes the variable sets tree to slow down. A tree that has to show and keep track of all 50,000 variables responds more slowly than one that shows only the variables you need.
The benefits of reducing the number of variable sets
Even when the number of variables is held constant, reducing the number of variable sets is still beneficial. The variable sets tree and dropboxes often display variable sets, so the fewer there are, the better.
Strategies for reducing the number of variables and variable sets
- Obtain a new data file with fewer variables.
- Hide variables. This removes them from dropboxes.
- Hide variable sets. This removes them from dropboxes.
- Combine unnecessary variables into larger variable sets. This has the extra benefit of making the data set easier to navigate.
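The first strategy, obtaining a new data file with fewer variables, can be done outside Displayr before the file is imported. The following is a minimal sketch, assuming the data can be exported as a CSV file with one column per variable. The function name `write_reduced_csv`, the file names, and the column names are hypothetical and used only for illustration. It relies solely on Python's standard `csv` module and is not part of Displayr.

```python
import csv


def write_reduced_csv(src_path, dst_path, keep):
    """Copy src_path to dst_path, keeping only the columns named in `keep`.

    Assumes a CSV file whose first row holds the variable names.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        header = reader.fieldnames or []

        # Fail early if a requested variable is not in the source file.
        missing = [name for name in keep if name not in header]
        if missing:
            raise KeyError(f"Columns not found in source file: {missing}")

        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" drops every column not listed in `keep`.
            writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
```

For example, `write_reduced_csv("full.csv", "reduced.csv", ["id", "q2"])` would write a file containing only the hypothetical `id` and `q2` columns, in that order. Rows are streamed one at a time, so the full file never has to fit in memory.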