Sometimes you collect more records than a survey needs and want to randomly remove the surplus, or you simply want to work with a random subset of records. This article explains how to create a filter for a random sample of respondents in your dataset. You can then use this filter to refine tables or analyses and, if needed, to remove cases from the data.
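To make the idea concrete, here is a minimal sketch in plain R, run outside Displayr, of what such a filter amounts to: a vector with one value per respondent, where 1 marks a sampled row and 0 marks everyone else. The sizes and the names `n_respondents`, `n_to_select`, `keep_filter` and `drop_filter` are made up for illustration and are not part of the methods below.

```r
# Toy stand-in for a data set with 20 respondents (made-up size)
n_respondents <- 20
n_to_select <- 5

set.seed(123)  # fixed seed so repeated runs flag the same rows

# Draw row positions without replacement
picked <- sample.int(n_respondents, n_to_select)

# 1 marks a sampled respondent, 0 marks everyone else
keep_filter <- rep(0, n_respondents)
keep_filter[picked] <- 1

# The complementary filter excludes the sampled respondents instead
drop_filter <- 1 - keep_filter

sum(keep_filter)  # number of respondents kept by the filter
sum(drop_filter)  # number of respondents left when the sample is excluded
```

Because the seed is fixed, recalculating the variable flags the same rows each time; changing the seed value would give a different random selection.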
Requirements
- A Document with a data set.
- For the second method, a filter variable that identifies the subgroup to sample from. In this example, we have a variable called Males 25-29.
Method - Random filter across all respondents
1. Hover over any variable in your Data Sources tree and click Plus (+) > Custom Code > R > Numeric.
2. In the R Code editor, paste the code below:
##code to modify
#specify any variable in the dataset (this is used to calculate how many respondents are in the data)
id = UniqueID
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#calculate total sample size
ss = length(id)
#select the random respondents/rows in the data
indices = sample.int(ss, select)
#create an empty filter
filter = rep(0, ss)
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Name your new variable by going to Object Inspector > General > General > Name and editing the name to Random sample.
4. Tick Usable as a filter and Hidden (except in variables and code) under Data > Properties in the Object Inspector.
5. [OPTIONAL]: Use this filter to delete those random selections. See How to Remove Cases From Raw Data Using a Filter.
6. [OPTIONAL]: Use this filter to filter in only that random selection into your table or analysis by using the Filters & Weight > Filter(s) dropdown.
7. [OPTIONAL]: If you want to create a filter that filters out the random selection from a table or analysis, replace the four lines from #create an empty filter through filter[indices] = 1 with:
#create a filter including everyone
filter = rep(1, ss)
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
Method - Random filter across a subgroup of respondents
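As background for this method, the following is a minimal sketch in plain R, run outside Displayr, of sampling row positions from within a subgroup. The `subgroup` vector here is made up to stand in for a filter variable such as Males 25-29, and the guard on the sample size is an added assumption rather than part of the code in the steps below.

```r
# Hypothetical stand-in for the `Males 25-29` filter variable:
# in Displayr this would come from the data set, here it is made up.
subgroup <- c("Selected", "Not selected", "Selected", NA, "Selected",
              "Not selected", "Selected", "Selected", "Not selected", "Selected")
selected <- "Selected"
select <- 3  # number of respondents to draw from the subgroup

set.seed(123)  # fixed seed so the draw is reproducible

# Row positions belonging to the subgroup (which() drops NA comparisons)
subgroup_rows <- which(subgroup == selected)

# Guard: cannot draw more rows than the subgroup contains
stopifnot(select <= length(subgroup_rows))

# Index into subgroup_rows with sample.int: sample(x, n) treats a single
# number x as the range 1:x, which could pick rows outside a one-row subgroup.
indices <- subgroup_rows[sample.int(length(subgroup_rows), select)]

# Build the 0/1 filter, one value per row of the data
filter <- rep(0, length(subgroup))
filter[indices] <- 1

sum(filter)                             # number of rows flagged, equal to `select`
all(subgroup[filter == 1] == selected)  # TRUE when every flagged row is in the subgroup
```

The filter has the same length as the full data, so it lines up row by row with every other variable even though only subgroup rows can be flagged.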
1. Hover over any variable in your Data Sources tree and click Plus (+) > Custom Code > R > Numeric.
2. In the R Code editor, paste the code below:
##code to modify
#specify a filter variable of your subgroup
subgroup = `Males 25-29`
#specify label of those selected in the subgroup variable
selected = "Selected"
#specify number of respondents to randomly select
select = 10
##standard code
#set the seed so randomization doesn't change if calculated again later
set.seed(123)
#get the list of rows of the subgroup in the data
subgroup_rows = which(subgroup == selected)
#select the random respondents/rows from those rows (sample.int avoids sample() reading a single row number as 1:n)
indices = subgroup_rows[sample.int(length(subgroup_rows), select)]
#create an empty filter
filter = rep(0, length(subgroup))
#change the random selection values in the filter to 1
filter[indices] = 1
#return the final filter
filter
3. Name your new variable by going to Object Inspector > General > General > Name and editing the name to Random sample of Males 25-29.
4. Tick Usable as a filter and Hidden (except in variables and code) from Data > Properties in the Object Inspector.
5. [OPTIONAL]: Use this filter to delete those random selections. See How to Remove Cases From Raw Data Using a Filter.
6. [OPTIONAL]: Use this filter to filter in only that random selection into your table or analysis by using the Filters & Weight > Filter(s) dropdown.
7. [OPTIONAL]: If you want to create a filter that filters out the random selection from a table or analysis, replace the four lines from #create an empty filter through filter[indices] = 1 with:
#create a filter including everyone
filter = rep(1, length(subgroup))
#change the random selection values in the filter to 0 to filter out
filter[indices] = 0
Next
How to Remove Duplicate Cases From a Data Set
How to De-duplicate Raw Data Using R