You may sometimes collect more records than a survey requires and want to randomly remove the surplus, or you may simply want to select a random subset of records to work with. This article describes how to select a random sample of respondents within a group defined by a variable, by creating an R filter variable that can then be used to delete those records.
- A Document with a data set.
- A variable that can be used to filter the selection. In this example, we have a variable called Males 25-29.
1. In the toolbar, go to Anything > Data > Variables > New > Custom Code > R - Numeric.
2. Paste the code below into Properties > R CODE in the object inspector:
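f = `Males 25-29` # Filter variable (1 = in the group)
set.seed(123) # Fix the random seed so the selection is reproducible
n = length(f) # Total sample size
remove = 10 # Number of records to remove
eligible = which(f == 1) # Case numbers of records in the filter group
indices = eligible[sample.int(length(eligible), remove)] # Case numbers to remove
filter = rep(0, n)
filter[indices] = 1
filter # Return 1 for records to remove, 0 otherwise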
- In this code, we begin by referencing the Label of our group filter variable.
- We then set the random seed so that the same records are selected each time the code is run.
- Next, we define the total sample size and the number of records to remove, and randomly pick that many records from the filter group.
- Finally, we return a variable that is 1 for the records identified for removal and 0 for all other records.
3. Name your new variable by going to Properties > GENERAL and editing the name to Random sample.
4. Tick Usable as a filter and Hidden except in the data tree in the object inspector.
5. Select the name of your data set from the Data Sets tree.
6. Go to the object inspector > Properties > GENERAL > Unique identifier and select your ID variable (if unique) or else [Use case number].
7. Select any variables in your Data Sets tree that you wish to view as raw data, and right-click > View in Data Editor.
8. Select your new filter variable (Random sample) in the Data Editor's Filter drop-down so that all the rows selected by the filter appear in green.
9. Now click the row header > Delete Row(s) Matching Filter to delete these cases from your data set.
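Before deleting anything, it can help to try the sampling logic from step 2 in plain R. The snippet below is a minimal sketch and not part of the product workflow: the vector males_25_29 is simulated stand-in data for the Males 25-29 variable, and the names n_remove, eligible, and random_sample are chosen only for illustration.

```r
# Simulated stand-in for the `Males 25-29` filter variable (hypothetical data):
# 1 = case belongs to the group, 0 = it does not.
males_25_29 <- rep(c(1, 0), times = c(40, 60))

set.seed(123)                        # Fix the seed so the selection is reproducible
n <- length(males_25_29)             # Total number of cases
n_remove <- 10                       # Number of records to remove

eligible <- which(males_25_29 == 1)  # Row positions of cases in the group
stopifnot(length(eligible) >= n_remove)  # Cannot remove more than the group holds

# Index into `eligible` rather than calling sample(eligible, ...) directly,
# because sample() treats a single number x as the range 1:x.
indices <- eligible[sample.int(length(eligible), n_remove)]

random_sample <- rep(0, n)           # 0 = keep the record
random_sample[indices] <- 1          # 1 = record flagged for removal

sum(random_sample)                         # Number of flagged records
all(males_25_29[random_sample == 1] == 1)  # TRUE if every flagged record is in the group
```

The two final lines are quick checks: the first should equal the number of records you asked to remove, and the second confirms that no record outside the group was flagged.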