Most data cleaning of surveys analyzed in Displayr is performed in one or more of:
- The data collection program
- A text editor
- Excel
- SPSS
- Displayr
If working with Displayr, it is generally most efficient to perform the cleaning only in the data collection program and/or Displayr. That is, in the vast majority of instances, it is inadvisable to perform any cleaning in a text editor, Excel or SPSS. Where the data collection program has no internal tools for data cleaning, it is generally best to do all the data cleaning in Displayr.
Note that this issue is entirely about efficiency and quality control. From a technical perspective, there is no reason that you cannot perform the data cleaning in a text editor, Excel or SPSS.
Problems with performing data cleaning in text editors, Excel and SPSS
- It is time consuming. Data cleaning operations in text editors, Excel and SPSS are performed manually. Even when syntax or macros are used, the user still needs to manually modify them for each specific project.
- Difficulty of repeating. Where multiple data files need to be extracted from the same project, the time-consuming processes need to be repeated. Alternatively, users need to take the time to create syntax and macros, and then review them to address any modifications in the data collection process.
- Voluntary documentation. Any documentation that exists needs to be manually created by whoever is performing the data cleaning. If the person performing the cleaning is in a rush, lazy, or error prone, there will be inadequate documentation.
- Lack of transparency. When cleaning the data in text editors, Excel and SPSS, changes are made in the actual data itself, and it is impossible for whoever is using the data to review what has been done, without going back to the original data.
The net effect of all of these is that the data cleaning process is inevitably either error prone or very time consuming.
Benefits of performing data cleaning in Displayr
- When importing data files into Displayr, Displayr automatically examines the data file, attempts to identify the data collection program used to create the file, and automatically performs various rudimentary data cleaning tasks known to be applicable to that data collection program (e.g., fixing labels, identifying question types, fixing missing value problems peculiar to specific data collection programs).
- Additional automations have been developed specifically for data cleaning purposes. For example, there are automations for identifying and removing outliers, creating tables showing don't knows, reversing scales, capping, and identifying flat-lining. See Check Your Data.
- With an Enterprise license you can create your own QScript automations. You can tailor them specifically to your needs, either from scratch or by modifying an existing QScript.
- When updating data files, all data cleaning will automatically be reapplied to both existing and new respondents. See How to Update with New or Revised Data.
- As all changes are stored within Displayr, you can always audit all the data cleaning and return any data to its original state. See How to Review Data in Tables and Variables.
- In situations where you do not want users to review the cleaning, you can instead adopt a two-step process, which retains all the other benefits except for ease of auditing. See Create a Separate Data Preparation Document for more detail. Generally, this works as follows:
  - You create one Displayr Document and do all the cleaning in it.
  - You create a new cleaned SPSS data file, which you provide to the end-users or export to your Displayr Cloud Drive to use in a different document.
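To make a few of the cleaning operations named in the benefits list concrete (reversing scales, capping, and identifying flat-lining), the following is a minimal Python sketch. It is not Displayr's implementation and it is not QScript; the function names, the 1-5 scale, the cap of 500 and the respondent records are all hypothetical and chosen only for illustration. It also shows, in miniature, why cleaning expressed as rules over untouched original data can be reapplied when new respondents arrive and can still be audited afterwards.

```python
def reverse_scale(value, low=1, high=5):
    """Reverse a rating so the low end becomes the high end (1 -> 5 on a 1-5 scale)."""
    return low + high - value


def cap(value, lower, upper):
    """Clamp a numeric answer to the range [lower, upper]."""
    return max(lower, min(upper, value))


def is_flat_liner(ratings):
    """Flag a respondent who gave the identical answer to every item in a grid."""
    return len(ratings) > 1 and len(set(ratings)) == 1


def clean(respondents):
    """Return cleaned copies of the records; the originals are never modified."""
    cleaned = []
    for r in respondents:
        if is_flat_liner(r["grid"]):
            continue  # drop flat-liners (assumed rule for this sketch)
        cleaned.append({
            "id": r["id"],
            "spend": cap(r["spend"], 0, 500),  # assumed cap of 500
            # Assume only the first grid item is negatively worded and needs reversing.
            "grid": [reverse_scale(r["grid"][0])] + r["grid"][1:],
        })
    return cleaned


# Hypothetical original data.
raw = [
    {"id": 1, "spend": 40, "grid": [4, 2, 5, 3]},
    {"id": 2, "spend": 9000, "grid": [3, 3, 3, 3]},
]
first_pass = clean(raw)

# A data update adds a respondent; the same rules are simply run again.
raw.append({"id": 3, "spend": 700, "grid": [1, 2, 2, 4]})
second_pass = clean(raw)
```

Because `clean` builds new records instead of editing `raw`, the original answers remain available for comparison, which is the property that manual edits in a text editor, Excel or SPSS lose.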