This article describes how to impute missing data and add new imputed variables to your data set.
A popular approach to dealing with missing data is imputation, which estimates the missing values from the data that was observed.
Click here for more information on when to use imputation.
Requirements
A data set loaded into Displayr with at least one instance of missing data.
Please note these steps require a Displayr license.
Method
- Select the variable(s) with missing data that you would like to impute in the Data Sources tree.
- Hover over the variable until you can click the plus sign.
- Click the plus sign > Ready-Made New Variables > Impute Missing Data
This adds one imputed variable for each variable selected in the first step. Each new variable has "imputed" in its Name and Question.
To change how the imputation is performed, select the new variable or a variable within the new variable set and update any of the following settings from the object inspector:
- OPTIONAL: Auxiliary variables: You can add additional variables to this drop-box to use the data from those variables in the imputation. These variables' data are used to inform the imputation, but are not themselves added to the data set.
- OPTIONAL: Seed: This is the random number seed used in the imputation. Changing this number will result in a different solution.
- OPTIONAL: Method:
  - Try mice: The imputation will initially try to use the mice (Multivariate Imputation by Chained Equations in R) algorithm, and if this is not successful it will attempt to use the hotdeck algorithm.
  - Hot Deck: Force the imputation to only use the hotdeck algorithm.
  - Mice: Force the imputation to only use the mice algorithm.
- Click Calculate.
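The effect of the Seed and Method settings can be sketched outside Displayr. The Python below is an illustration only, not Displayr's implementation: `hot_deck_impute`, `chained_equations_impute`, `impute`, and the default seed value are hypothetical, the hot deck is a simple random draw from the observed values, and the chained-equations step is a stub that always fails so that the fallback path is visible.

```python
import random

def hot_deck_impute(values, seed):
    """Fill each None with a randomly drawn observed value (simple random hot deck)."""
    rng = random.Random(seed)
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values to draw from")
    return [v if v is not None else rng.choice(donors) for v in values]

def chained_equations_impute(values, seed):
    """Stand-in for a chained-equations imputer; it always fails in this sketch."""
    raise RuntimeError("chained-equations step failed (simulated)")

def impute(values, seed=12321, method="Try mice"):
    """Mimic the three Method options (hypothetical logic, assumed default seed)."""
    if method == "Hot Deck":
        return hot_deck_impute(values, seed)
    if method == "Mice":
        # Forced: any error propagates to the caller.
        return chained_equations_impute(values, seed)
    try:
        # "Try mice": attempt chained equations first.
        return chained_equations_impute(values, seed)
    except RuntimeError:
        # Fall back to the hot deck if that attempt fails.
        return hot_deck_impute(values, seed)

ages = [34, None, 51, 28, None, 45]
first = impute(ages, seed=1)
again = impute(ages, seed=1)
other = impute(ages, seed=2)
print(first == again)  # the same seed reproduces the same imputed values
print(first, other)    # a different seed may give a different solution
```

Recording the seed alongside the analysis is what makes an imputed data set reproducible later.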
Technical Details
By default, data is imputed using the default settings of the mice R package, which performs Multivariate Imputation by Chained Equations (predictive mean matching).[1] Take care that each variable has the correct variable type, as this has a large effect on the algorithm's results. If mice encounters a technical error, the imputation is instead performed using hot-decking, via the hot.deck package in R.[2]
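To give a feel for predictive mean matching, here is a much-simplified Python sketch for a single incomplete variable and one fully observed numeric predictor. It is not the mice implementation, which is more involved and cycles through all incomplete variables. The function name `pmm_impute`, the donor count `k`, and the example data are assumptions made for illustration.

```python
import random

def pmm_impute(x, y, k=3, seed=0):
    """Impute None entries of y from predictor x by simplified predictive mean matching."""
    rng = random.Random(seed)
    # Complete cases: fit a least-squares line y = intercept + slope * x.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mean_x = sum(xi for xi, _ in obs) / n
    mean_y = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)  # assumes x is not constant
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    result = []
    for xi, yi in zip(x, y):
        if yi is not None:
            result.append(yi)
            continue
        target = predict(xi)
        # Donors: the k complete cases whose predicted value is closest to the target.
        donors = sorted(obs, key=lambda d: abs(predict(d[0]) - target))[:k]
        # The imputed value is an observed value borrowed from a random donor.
        result.append(rng.choice(donors)[1])
    return result

income = [20, 35, 50, 65, 80, 95]
spend = [10, 18, None, 33, None, 47]
completed = pmm_impute(income, spend)
print(completed)
```

Because every imputed value is copied from a real case, the filled-in values always fall within the set of values actually observed for that variable.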
When imputation is used with regression, cases with missing values in the outcome variable are excluded from the analysis after the imputation has been performed.[3]
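The order of operations matters here: impute first, using all cases, then drop the cases whose outcome was originally missing. A minimal Python sketch follows; the helper name `drop_imputed_outcomes` and the data values are made up for illustration.

```python
def drop_imputed_outcomes(rows, outcome_was_missing):
    """Keep only the cases whose outcome value was originally observed."""
    return [row for row, missing in zip(rows, outcome_was_missing) if not missing]

# Original data as (predictor, outcome) pairs; None marks a missing value.
original = [(1.0, 2.1), (None, 3.9), (3.0, None), (4.0, 8.2)]
# Hypothetical completed data after imputation (imputed values are invented).
imputed = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0), (4.0, 8.2)]

# Flag cases by the ORIGINAL outcome, not the imputed one.
outcome_was_missing = [y is None for _, y in original]
analysis_rows = drop_imputed_outcomes(imputed, outcome_was_missing)
print(analysis_rows)  # [(1.0, 2.1), (2.0, 3.9), (4.0, 8.2)]
```

The second case keeps its imputed predictor value because only its predictor was missing. The third case is dropped because its outcome was imputed.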
Note that although imputation can reduce the bias of parameter estimates, it can create misleading statistical inference (e.g., as the simulated sample size is assumed to be the actual sample size in calculations).
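As a rough illustration of the inference problem, the Python sketch below compares the standard error of a mean computed from six observed values with the one computed after four missing cases have been filled in. The data and the fill-in rule (copying the first four observed values as donors) are assumptions chosen to keep the example deterministic.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean, treating every value as a real observation."""
    return statistics.stdev(values) / math.sqrt(len(values))

observed = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0]
# Four missing cases filled in by copying donor values (illustrative only).
completed = observed + observed[:4]

se_observed = standard_error(observed)  # based on n = 6 real observations
se_naive = standard_error(completed)    # treats n = 10 as the sample size
print(se_naive < se_observed)           # the naive standard error is smaller
```

No new information was collected, yet the calculation on the completed data reports more precision than the six real observations support.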
The new Variables are imputed jointly. This means that if you make changes to one of them then the others will also change.
There are some technical limitations on how you can change the new variables:
- You cannot add or remove variables from the Variables drop-box.
- You cannot change the order of variables in the Variables drop-box.
- If you wish to delete any of the imputed variables you must delete them all together because they are linked.
1. Stef van Buuren and Karin Groothuis-Oudshoorn (2011). "mice: Multivariate Imputation by Chained Equations in R." Journal of Statistical Software, 45(3), 1-67.
2. Skyler J. Cranmer and Jeff Gill (2013). "We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data." British Journal of Political Science, 43, 425-449.
3. Paul T. von Hippel (2007). "Regression With Missing Y's: An Improved Strategy for Analyzing Multiply Imputed Data."
See Also
How to Turn Off Missing Data Selection for Specific Values
How To Check for Missing Data Using Plot by Case
How To Check for Missing Data Using Plot of Patterns
How to Check Missing Data Using Little's MCAR Test
How to Create a Filter for Complete Cases