You may have collected more records than a survey requires and want to randomly remove the surplus, or you may simply want a random subset of records to work with. This article describes how to create a filter that selects a random sample of respondents in your data set. You can then apply this filter to tables or analyses, or use it to remove cases from the data if needed.
Requirements
- A Document with a data set.
- For the second method, a variable that identifies the subgroup to sample from. In this example, we use a variable called Males 25-29.
Method - Random filter across all respondents
1. Hover over any variable in your Data Sources tree and click Plus (+) > Custom Code > R - Numeric.
2. In the R Code editor, paste the code below:
##code to modify
#specify any variable in the dataset (this is used to calculate how many respondents are in the data)
id = UniqueID
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#calculate total sample size
ss = length(id)
#select the random respondents/rows in the data
indices = sample.int(ss, select)
#create an empty filter
filter = rep(0, ss)
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Name your new variable by going to General > General > Name and editing the name to Random sample.
4. Tick Usable as a filter and Hidden (except in variables and code) under Data > Properties in the object inspector.
5. [OPTIONAL]: Use this filter to delete the randomly selected cases; see How to Remove Cases From Raw Data Using a Filter.
6. [OPTIONAL]: Use this filter to show only the random selection in your table or analysis by selecting it from the Filters & Weight > Filter(s) dropdown.
7. [OPTIONAL]: To filter the random selection out of a table or analysis instead, replace the two lines that create the empty filter and set the selected values to 1 (together with their comments) with:
#create a filter including everyone
filter = rep(1, ss)
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
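To see how the two variants relate, here is a minimal sketch that can be run in any R session outside Displayr. The id vector of 50 respondents is a hypothetical stand-in for a real variable, and the names keep_filter and drop_filter are illustrative only:

```r
# Hypothetical stand-in for a variable in the data set: 50 respondent IDs
id <- seq_len(50)
# Number of respondents to randomly select
select <- 10

# Fix the seed so the same rows are drawn each time the code runs
set.seed(123)
# Total sample size
ss <- length(id)
# Row positions drawn at random, without replacement
indices <- sample.int(ss, select)

# Filter that keeps the random selection (1 = in the sample)
keep_filter <- rep(0, ss)
keep_filter[indices] <- 1

# Filter that excludes the random selection (1 = not in the sample)
drop_filter <- rep(1, ss)
drop_filter[indices] <- 0

sum(keep_filter)  # 10 respondents kept
sum(drop_filter)  # 40 respondents remaining
```

Because sample.int draws without replacement, the filter always flags exactly as many rows as requested, and the two filters are complements of each other.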
Method - Random filter across a subgroup of respondents
1. Hover over any variable in your Data Sources tree and click Plus (+) > Custom Code > R - Numeric.
2. In the R Code editor, paste the code below:
##code to modify
#specify a filter variable of your subgroup
subgroup = `Males 25-29`
#specify label of those selected in subgroup variable
selected = "Selected"
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#get the list of rows of the subgroup in the data
subgroup_rows = which(subgroup == selected)
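#select the random respondents/rows from those rows
#(indexing into subgroup_rows avoids sample() treating a single row number n as 1:n)
indices = subgroup_rows[sample.int(length(subgroup_rows), select)]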
#create an empty filter
filter = rep(0, length(subgroup))
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Name your new variable by going to General > General > Name and editing the name to Random sample of Males 25-29.
4. Tick Usable as a filter and Hidden (except in variables and code) from Data > Properties in the object inspector.
5. [OPTIONAL]: Use this filter to delete the randomly selected cases; see How to Remove Cases From Raw Data Using a Filter.
6. [OPTIONAL]: Use this filter to show only the random selection in your table or analysis by selecting it from the Filters & Weight > Filter(s) dropdown.
7. [OPTIONAL]: To filter the random selection out of a table or analysis instead, replace the two lines that create the empty filter and set the selected values to 1 (together with their comments) with:
#create a filter including everyone
filter = rep(1, length(subgroup))
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
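The subgroup method can also be tried outside Displayr. The sketch below assumes a simulated subgroup variable in which 8 of 30 respondents are labeled "Selected"; the data, the name filter_var, and the early check on subgroup size are illustrative additions rather than part of the steps above:

```r
# Hypothetical subgroup variable: 8 of 30 respondents are "Selected"
subgroup <- factor(rep(c("Selected", "Not Selected"), times = c(8, 22)))
# Label of those selected in the subgroup variable
selected <- "Selected"
# Number of respondents to randomly select
select <- 5

# Fix the seed so the same rows are drawn each time the code runs
set.seed(123)
# Row positions of the subgroup members
subgroup_rows <- which(subgroup == selected)

# Stop early if the subgroup is smaller than the requested sample
if (select > length(subgroup_rows)) {
  stop("Only ", length(subgroup_rows), " respondents in the subgroup; cannot select ", select)
}

# Draw positions within subgroup_rows, then map them back to row numbers.
# This sidesteps sample()'s behavior of treating a single number n as 1:n.
indices <- subgroup_rows[sample.int(length(subgroup_rows), select)]

# Build the filter (1 = randomly selected subgroup member)
filter_var <- rep(0, length(subgroup))
filter_var[indices] <- 1

sum(filter_var)                             # 5 respondents selected
all(subgroup[filter_var == 1] == selected)  # TRUE: all come from the subgroup
```

The size check matters because sampling without replacement fails when more respondents are requested than the subgroup contains; checking first gives a clearer message about why.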
See Also
How to Remove Duplicate Cases From a Data Set
How to De-duplicate Raw Data Using R