This article describes how to create a filter for splitting your sample. This feature has several use cases: splitting the sample for predictive modeling, such as regression; creating training, validation, and testing samples based on a filter; or removing a proportion of a sample.
This article explains the steps required to create each of the three filters described below.
Requirements
- A Displayr document containing a data set.
Please note this requires the Data Stories module or a Displayr license.
Method - creating a training and testing filter
- Go to the Anything menu, and select Filtering > Model Checking > Filters for Train-Test Split.
- When prompted, input the percentage of the data that should be used as the training set. By default, this is set to 70%. Select OK.
- Displayr will create a Train test split variable usable as a filter.
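To make the idea of a train-test split concrete, here is a minimal sketch in Python of how such a filter could be built outside Displayr. It is not Displayr's implementation: the function name `train_test_filter`, the label text, and the choice to assign an exact count of cases at random are all assumptions made for illustration.

```python
import random


def train_test_filter(n_cases, train_pct=70, seed=0):
    """Label each case "Training" or "Testing" (hypothetical helper).

    Assumption: an exact share of cases (train_pct percent, rounded)
    goes to the training set, and the assignment is shuffled at random.
    """
    if not 0 <= train_pct <= 100:
        raise ValueError("train_pct must be between 0 and 100")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = round(n_cases * train_pct / 100)
    labels = ["Training"] * n_train + ["Testing"] * (n_cases - n_train)
    rng.shuffle(labels)
    return labels


# Example: 10 cases with the default 70% training share.
labels = train_test_filter(10)
# A boolean filter that selects only the training cases.
training_filter = [label == "Training" for label in labels]
```

Each case receives exactly one label, so the training and testing filters never overlap and together cover the whole sample.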
Method - creating a training, validation, and testing filter
- Go to the Anything menu, and select Filtering > Model Checking > Filters for Train-Validation-Test Split.
- When prompted, input the percentage of the data that should be used as the training set. By default, this is set to 50%. Select OK.
- In the second prompt, input the percentage of the data that should be used as the validation set. By default, this is set to 25%. Select OK.
- Displayr will create a Train validate test split variable usable as a filter.
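The three-way split can be sketched in the same spirit. The Python below is an illustration only, not Displayr's method: the function name `train_validation_test_filter` and the labels are hypothetical, and it assumes that whatever share is left after the training and validation percentages becomes the testing set.

```python
import random


def train_validation_test_filter(n_cases, train_pct=50, validation_pct=25, seed=0):
    """Label each case "Training", "Validation", or "Testing" (hypothetical helper).

    Assumption: the testing set is whatever remains after the training
    and validation shares have been allocated.
    """
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = round(n_cases * train_pct / 100)
    # Cap the validation count so rounding can never exceed the sample size.
    n_validation = min(round(n_cases * validation_pct / 100), n_cases - n_train)
    n_test = n_cases - n_train - n_validation
    labels = (["Training"] * n_train
              + ["Validation"] * n_validation
              + ["Testing"] * n_test)
    rng.shuffle(labels)
    return labels


# Example: 20 cases with the default 50% / 25% shares.
split = train_validation_test_filter(20)
validation_filter = [label == "Validation" for label in split]
```

With the default percentages, the remaining 25% of cases fall into the testing set.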
Method - removing a proportion of a sample
- Go to the Anything menu, and select Filtering > Model Checking > Filters for Train-Test Split.
- When prompted, input the percentage of the data that you wish to keep. By default, this is set to 70%. Select OK.
- Displayr will create a Train test split variable usable as a filter.
- To remove the respondents from the data set, select the name of your data set from the Data Sources tree.
- Go to the object inspector > General > GENERAL > Unique identifier and select a variable with unique values, or select [Use case number]. CAUTION: If you select [Use case number], Displayr will use the ordering of the data set to identify observations. If your data might change, you should not choose [Use case number] unless new observations will always be added to the end of the data set.
- Select any variables in your Data Sources tree that you wish to view as raw data, and right-click > View in Data Editor.
- Select your filter variable in the Filter drop-down so that all the rows selected by the filter appear in green. For example, you could apply a Testing Sample filter while viewing Location data.
- Now right-click the row header of any row matching the filter and select Delete Row(s) Matching Filter to delete these cases from your data set.
- OPTIONAL: You can return deleted cases to your data set by going back to the Data Editor and right-clicking any row header > Undelete All Rows, or by selecting the data set name in the Data Sources tree and clicking Restore deleted cases in the object inspector.
See Also
How to Remove Cases From Raw Data Using a Filter
How to Tag a Variable as a Filter