As part of data cleaning, you can visualize missing data across variables to identify the variables that need closer inspection. One approach is to visualize missing data by pattern, where a pattern is a specific combination of variables that have missing values. From these patterns, you may be able to infer how missing values in one variable relate to missing values in others. This article describes how to create a chart showing the patterns of missing data, with blue shading indicating missing values.
A Displayr document with a data set.
- Go to Anything > Data > Missing Data > Plot of Patterns.
- In the object inspector, under Inputs > Variables, select the variables you want to analyze, change any other settings as needed, and click Calculate to run the function.
OPTIONAL: The following settings can be updated to modify the output:
- Variable names - Displays variable names in the output instead of labels.
- Filter - Any filters that have been applied are automatically used to filter the data before the chart is created.
As the number of variables increases, the number of possible patterns grows exponentially (up to 2^k patterns for k variables), which can make this chart very difficult to read. The most straightforward way to address this is to limit the number of variables that you chart. Font sizes are also reduced automatically to help, but a lot of zooming may still be required. You can set the font size manually by modifying the value of cex.numbers in Properties > R CODE, where 1 indicates a normal-sized font and smaller values indicate smaller font sizes.
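To make the idea of a pattern and its growth concrete, here is a minimal sketch in plain Python. It is not the code Displayr runs; the helper names missing_patterns and max_patterns and the sample data are hypothetical, chosen only for illustration. It tabulates how many rows share each combination of missing values and shows how quickly the upper bound on patterns grows with the number of variables.

```python
from collections import Counter


def missing_patterns(rows):
    """Count each distinct pattern of missingness.

    A pattern is a tuple of booleans, one per variable, where True
    means the value is missing (None) in that row.
    """
    return Counter(tuple(value is None for value in row) for row in rows)


def max_patterns(n_variables):
    """Upper bound on distinct patterns: each variable is missing or observed."""
    return 2 ** n_variables


# Hypothetical data: 5 respondents, 3 variables (age, income, region).
rows = [
    (34, 52000, "North"),
    (None, 48000, "South"),
    (29, None, "North"),
    (None, 61000, "East"),
    (41, None, None),
]

patterns = missing_patterns(rows)
for pattern, count in patterns.most_common():
    # Render each pattern as a row of cells: "X" = missing, "." = observed.
    print("".join("X" if is_missing else "." for is_missing in pattern), count)

# Upper bound on the number of patterns for 3, 10, and 20 variables.
print(max_patterns(3), max_patterns(10), max_patterns(20))  # 8 1024 1048576
```

In this sample, three variables produce four observed patterns out of a possible eight. With 20 variables the upper bound exceeds a million, which is why limiting the variables charted is usually the most effective fix.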