This article describes how to use the Data Quality > Duplicates feature to automatically create a new filter variable that identifies cases in a data set with duplicated values in one or more variables. You can use this variable to find duplicate cases to delete from the data set. You can also modify it to identify unique cases and use it as a filter on outputs in your Report. For example, if a data set contains multiple records for a respondent, you may want certain outputs to include only a single record for each respondent.
Requirements
- A data set in a Displayr document.
Method
- Select one or more variables in the Data Sources tree.
- Click + > Data Quality > Duplicates.
- A new variable called Duplicates will be added to the Data Sources tree.
- Drag the variable Duplicates onto the page to review counts.
- OPTIONAL: You can then use this variable as a filter to remove duplicate cases from the data set. See How to Delete Cases From a Data Set for details.
- OPTIONAL: You can convert this variable into a filter that keeps only unique cases in outputs and analyses. In the object inspector, click Data > Properties > Values and change the values so that Yes is 0 and No is 1. Check "Usable as a filter," and you will now be able to filter for unique cases in outputs.
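The logic behind the steps above can be sketched outside Displayr. The Python below is an illustration only, not Displayr's implementation: it assumes that a case is flagged as a duplicate when its combination of values in the selected variables has already appeared in an earlier case, so the first occurrence is left unflagged. The function name flag_duplicates, the variable names respondent_id and wave, and the sample records are all hypothetical.

```python
def flag_duplicates(rows, keys):
    """Return 1 for each row whose values in `keys` were already seen, else 0.

    Assumption: the first occurrence of each combination is NOT flagged,
    only the second and later ones. This is an illustrative rule and may
    differ from what Displayr does internally.
    """
    seen = set()
    flags = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        flags.append(1 if key in seen else 0)
        seen.add(key)
    return flags


# Hypothetical data: respondent 101 has two records.
rows = [
    {"respondent_id": 101, "wave": 1},
    {"respondent_id": 102, "wave": 1},
    {"respondent_id": 101, "wave": 2},
]

# Equivalent of the Duplicates variable (1 = Yes, 0 = No).
duplicates = flag_duplicates(rows, ["respondent_id"])

# Equivalent of recoding Yes to 0 and No to 1 to get a unique-case filter.
unique_filter = [1 - d for d in duplicates]

# Applying the filter keeps one record per respondent.
unique_rows = [row for row, keep in zip(rows, unique_filter) if keep]
```

Reversing the values is all that separates the two uses: the original flag selects the cases to delete, while the reversed flag selects the cases to keep in filtered outputs.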
Next
How to Remove Duplicate Cases From a Data Set