This article describes how to use a built-in QScript that checks the selected numeric data for outliers and creates new copies of the data with the outliers removed. Outliers are defined as values that are more than a certain number of standard deviations from the variable mean. You can choose how many standard deviations are used to determine which values are considered outliers. The default is 3 standard deviations. In the new copies of the data, outlier values are replaced with missing values. Data that does not contain outliers is not copied.
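The rule described above can be sketched outside Displayr in a few lines of Python. This is only an illustration of the idea, not the script that Displayr runs: the function name `remove_outliers` and its `cutoff` parameter are hypothetical, and the use of the sample standard deviation and of inclusive bounds are assumptions, since the article does not specify either.

```python
import math
from statistics import mean, stdev


def remove_outliers(values, cutoff=3.0):
    """Return a copy of `values` with outliers replaced by NaN.

    A value is treated as an outlier when it lies more than `cutoff`
    standard deviations from the mean. Existing missing values (NaN)
    are ignored when computing the mean and standard deviation.
    """
    observed = [v for v in values if not math.isnan(v)]
    if len(observed) < 2:
        # Too few observed values to estimate a standard deviation.
        return list(values)

    center = mean(observed)
    spread = stdev(observed)  # assumption: sample standard deviation
    lower = center - cutoff * spread
    upper = center + cutoff * spread

    # Keep in-range and already-missing values; blank out the rest.
    return [
        v if math.isnan(v) or lower <= v <= upper else math.nan
        for v in values
    ]


# Hypothetical data: twenty typical responses and one extreme value.
responses = [10.0] * 20 + [1000.0]
cleaned = remove_outliers(responses)
print(cleaned[-1])    # the extreme value is now missing
print(responses[-1])  # the original list is left unchanged
```

The function returns a new list rather than modifying its input, which mirrors the way the script creates new copies of the data and leaves the original variables untouched.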
Requirements
- A data file loaded in Displayr
- One or more numeric variables
Please note these steps require a Displayr license.
Method
To run the script:
- Select a numeric, numeric - multi, or numeric - grid variable set in the Data Sources tree.
- Click + > Ready-Made New Variables > Variable(s) with Outliers Removed or, from the object inspector, click TRANSFORMATIONS > Variable(s) with Outliers Removed.
- Enter a cut-off value. Cases whose values are more than this number of standard deviations from the mean are treated as outliers. The default value is 3.
A folder will be created in the Pages tree that contains tables for the selected data and any new copies of data with the outliers removed.
The new copies of the variables use a JavaScript formula that assigns a value of NaN to respondents with outlying values. The means and standard deviations are calculated once, when the script is run. As a result, the thresholds that define an outlier in the new variables will not change if the underlying data is later changed or updated.
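The practical effect of fixing the thresholds at run time can be illustrated with a small Python sketch. It is not Displayr's formula: the helper names `freeze_bounds` and `is_outlier` are hypothetical, the data is made up, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def freeze_bounds(values, cutoff=3.0):
    """Compute the lower and upper outlier bounds once, from `values`."""
    center = mean(values)
    spread = stdev(values)  # assumption: sample standard deviation
    return center - cutoff * spread, center + cutoff * spread


def is_outlier(value, bounds):
    """Check a value against previously computed bounds."""
    lower, upper = bounds
    return value < lower or value > upper


# Bounds computed when the "script" is run on the original data.
original = [float(x) for x in range(1, 21)]
frozen = freeze_bounds(original)

# Later the data file is updated with many larger values.
updated = original + [40.0] * 10
refreshed = freeze_bounds(updated)

print(is_outlier(40.0, frozen))     # judged against the old bounds
print(is_outlier(40.0, refreshed))  # judged against recomputed bounds
```

In this made-up example the value 40 falls outside the frozen bounds but inside the recomputed ones, so if the data changes substantially, running the script again on the updated data brings the thresholds back in line with it.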
Next
How to Use Scripts to Automate Data Checking and Cleaning
How to Check for Errors in Data File Construction
How to Identify Questions with Straight-Lining/Flat-Lining
How to Hide Uninteresting Data
How to Remove Truncated Text from Variable Labels
How to Reverse Scales in Questions