This article describes how to automatically create a new filter variable that identifies the cases in a data set that have duplicated values in one or more variables. This is useful when you want to find duplicate cases to delete from the data set. The variable can also be modified to identify unique cases instead, so that you can use it as a filter on outputs in your Report. For example, if your data set contains multiple records for each respondent, you may want to include only a single record per respondent in certain outputs. This article explains how to use the Data Quality > Duplicates feature to create this variable, which you can then use either to remove duplicated data from the data set or to filter outputs to unique cases.
Requirements
- A data set loaded into a Displayr document.
Method
- Select one or more variables in the Data Sources tree.
- Click the variable hover button to the right of the variable and then select Data Quality > Duplicates.
- A new variable called Duplicates will be added to the Data Sources tree.
- Drag the variable Duplicates onto the page to review how many cases are flagged as duplicates.
- OPTIONAL: You can then use this variable as a filter to remove duplicate cases from the data set. See How to Delete Cases From a Data Set for details.
- OPTIONAL: You can convert this variable into a filter that keeps only unique cases in outputs and analyses. In the variable's object inspector, click Data > Properties > Values and change the values so that Yes is 0 and No is 1. Then check Usable as a filter, and you will be able to use the variable to filter in unique cases on outputs.
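Displayr computes the Duplicates variable for you, so no code is needed. To make the logic of the steps above concrete, here is a minimal Python sketch, not Displayr's implementation. It assumes that each case is a dictionary of variable values, that the first case with a given combination of values in the selected variables is coded No (0) and every later match is coded Yes (1), and that recoding Yes to 0 and No to 1 produces the unique-case filter. The function names `flag_duplicates` and `unique_filter` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def flag_duplicates(cases: Sequence[dict], variables: Sequence[str]) -> list[int]:
    """Hypothetical sketch: return 1 ("Yes") for a case whose values in
    `variables` match an earlier case, else 0 ("No").

    Assumption: the first occurrence of each combination is treated as the
    original and is not flagged.
    """
    seen: set[tuple] = set()
    flags: list[int] = []
    for case in cases:
        # Combine the selected variables into one comparable key.
        key = tuple(case[v] for v in variables)
        if key in seen:
            flags.append(1)  # Yes: duplicates an earlier case
        else:
            seen.add(key)
            flags.append(0)  # No: first time this combination appears
    return flags


def unique_filter(flags: Sequence[int]) -> list[int]:
    """Recode Yes (1) -> 0 and No (0) -> 1, so 1 marks cases to keep."""
    return [1 - f for f in flags]


if __name__ == "__main__":
    data = [
        {"id": 101, "q1": 5},
        {"id": 102, "q1": 3},
        {"id": 101, "q1": 4},  # same respondent id as the first case
    ]
    dup = flag_duplicates(data, ["id"])
    print(dup)                 # [0, 0, 1]
    print(unique_filter(dup))  # [1, 1, 0]
    print(sum(dup))            # 1 duplicate case
```

Note that the result depends on which variables you select: in this example, checking `id` alone flags the third case, while checking both `id` and `q1` flags nothing because every combination differs.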
Next
How to Remove Duplicate Cases From a Data Set