This article describes how to create a filter that splits your sample. Use cases include splitting the sample for predictive modeling (such as regression), creating training, validation, and testing samples based on a filter, and removing a proportion of a sample.
This article explains the required steps to generate each of the three filter examples below.
Requirements
- A Displayr document containing a data set.
Method - Creating a training and testing filter
- In the Data Sources tree, hover and click + > Filter > Train-Test Samples.
- When prompted, input the percentage of the data that should be used as the training set. By default, this is set to 70%. Select OK.
- Displayr will create a Train test split variable usable as a filter.
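To make the idea behind the split concrete, here is a minimal Python sketch of a random train/test filter. It is not Displayr's implementation: the function name `train_test_filter`, the `seed` parameter, and the choice to select an exact count of cases (rather than assigning each case independently) are assumptions made for illustration.

```python
import random


def train_test_filter(n_cases, train_percent=70, seed=0):
    """Return one boolean per case: True = training, False = testing.

    Hypothetical helper for illustration only; Displayr's own
    algorithm may differ.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    n_train = round(n_cases * train_percent / 100)
    order = list(range(n_cases))
    rng.shuffle(order)  # random ordering of case indices
    train_indices = set(order[:n_train])
    return [i in train_indices for i in range(n_cases)]


# Example: a 70% training filter over 10 cases
flags = train_test_filter(10, train_percent=70)
n_training = sum(flags)  # number of cases flagged for training
```

The resulting list plays the same role as a filter variable: cases flagged True form the training sample and the rest form the testing sample.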
Method - Creating a training, validation, and testing filter
- In the Data Sources tree, hover and click + > Filter > Train-Validation-Test Samples.
- When prompted, input the percentage of the data that should be used as the training set. By default, this is set to 50%. Select OK.
- In the second prompt, input the percentage of the data that should be used as the validation set. By default, this is set to 25%. Select OK.
- Displayr will create a Train validate test split variable usable as a filter.
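The three-way split can be sketched in Python as follows. This is an illustration under stated assumptions, not Displayr's implementation: the function name, the label strings, and the `seed` parameter are hypothetical, and the sketch assumes that both percentages refer to the whole data set, so the defaults of 50% and 25% leave the remaining 25% for testing.

```python
import random


def train_validation_test_labels(n_cases, train_percent=50,
                                 validation_percent=25, seed=0):
    """Return one label per case: 'Training', 'Validation' or 'Testing'.

    Hypothetical helper for illustration only. Both percentages are
    assumed to be taken from the full data set.
    """
    if train_percent + validation_percent > 100:
        raise ValueError("Training and validation percentages exceed 100%.")
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    n_train = round(n_cases * train_percent / 100)
    n_validation = round(n_cases * validation_percent / 100)
    order = list(range(n_cases))
    rng.shuffle(order)  # random ordering of case indices
    labels = ["Testing"] * n_cases  # whatever is left over is testing
    for i in order[:n_train]:
        labels[i] = "Training"
    for i in order[n_train:n_train + n_validation]:
        labels[i] = "Validation"
    return labels


# Example: default 50/25/25 split over 100 cases
labels = train_validation_test_labels(100)
counts = {name: labels.count(name)
          for name in ("Training", "Validation", "Testing")}
```

Filtering on any one label then isolates the corresponding sample, in the same way the generated variable is used as a filter.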
Method - Removing a proportion of a sample
- Follow the steps in Method - Creating a training and testing filter above.
- To remove the respondents from the data set, select the name of your data set from the Data Sources tree.
- In the object inspector, go to General > Unique identifier and select a variable with unique values, or select [Use case number]. CAUTION: With [Use case number], Displayr will use the ordering of the data set to identify observations. If your data might change, you should not choose [Use case number] unless new observations will always be added to the end of the data set.
- Select any variable(s) in your Data Sources tree that you wish to view as raw data, right-click, and select View in Data Editor.
- Select your filter variable in the Filter drop-down so that all the rows selected by the filter will appear in green. For example, you could apply a Training Sample filter while showing gender and age data.
- Right-click the row header of any row matching the filter and select Delete Row(s) Matching Filter to delete these cases from your data set.
- OPTIONAL: You can return deleted cases to your data set by going back to the Data Editor and right-clicking any row header > Undelete All Rows, or clicking on the data set name in the Data Sources tree and clicking Restore deleted cases from the object inspector.
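The caution about the unique identifier can be illustrated with a small Python sketch. The data, the `id` field, and the helper name `keep_unflagged` are all hypothetical, and this is not how Displayr stores deletions internally. It only shows why identifying deleted cases by a unique value survives a change in row order, while identifying them by position does not.

```python
def keep_unflagged(rows, deleted_ids, id_key="id"):
    """Drop rows whose unique identifier is in deleted_ids."""
    return [row for row in rows if row[id_key] not in deleted_ids]


# Hypothetical raw data with a unique identifier per respondent
rows = [
    {"id": "a1", "gender": "Female", "age": 34},
    {"id": "b2", "gender": "Male", "age": 51},
    {"id": "c3", "gender": "Female", "age": 27},
    {"id": "d4", "gender": "Male", "age": 45},
]

# Suppose the filter matched the first and third respondents
deleted_ids = {"a1", "c3"}
deleted_positions = {0, 2}

remaining = keep_unflagged(rows, deleted_ids)

# If the data set is later reordered, the identifiers still point at
# the same respondents...
reordered = list(reversed(rows))
remaining_by_id = keep_unflagged(reordered, deleted_ids)

# ...but the stored positions now remove different respondents.
remaining_by_position = [row for i, row in enumerate(reordered)
                         if i not in deleted_positions]
```

After reordering, the identifier-based deletion still keeps respondents b2 and d4, whereas the position-based deletion keeps a1 and c3, which are the cases that were meant to be removed.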
Next
How to Remove Cases From Raw Data Using a Filter
How to Create Filters Using Variables in Your Data