Data cleaning for surveys analyzed in Displayr is typically performed in one or more of the following:
- The data collection program
- A text editor
- Excel
- SPSS
- Displayr
When working with Displayr, it is generally most efficient to perform cleaning only in the data collection program and/or Displayr. In the vast majority of cases, it is inadvisable to perform any cleaning in a text editor, Excel, or SPSS. Where the data collection program has no internal tools for data cleaning, it is generally best to do all the data cleaning in Displayr.
Note that this issue is entirely about efficiency and quality control. From a technical perspective, nothing prevents you from performing the data cleaning in a text editor, Excel, or SPSS.
Problems with performing data cleaning in text editors, Excel, and SPSS
- Time-consuming. Data cleaning in text editors, Excel, and SPSS is performed manually. Even when using syntax or macros, the user still needs to modify them manually for each project.
- Difficulty of repeating. When multiple data files need to be extracted from the same project, the time-consuming processes must be repeated for each one. Alternatively, users must take the time to create syntax or macros and then review them whenever the data collection process changes.
- Voluntary documentation. Any documentation must be created manually by whoever performs the data cleaning. If that person is rushed, careless, or error-prone, the documentation will be inadequate.
- Lack of transparency. When cleaning data in text editors, Excel, and SPSS, changes are made to the data itself, so anyone using the data cannot review what has been done without going back to the original data.
The net effect is that the data cleaning process is inevitably either error-prone or very time-consuming.
Benefits of performing data cleaning in Displayr
- When you import a data file, Displayr automatically examines it, attempts to identify the data collection program that created it, and performs rudimentary cleaning tasks known to apply to that program (e.g., fixing labels, identifying question types, and fixing missing-value problems peculiar to specific data collection programs).
- Displayr also provides automations developed specifically for data cleaning, such as identifying and removing outliers, creating tables showing don't knows, reversing scales, capping, and identifying flat-lining. See Check Your Data.
- With an Enterprise license, you can create your own QScript automations tailored to your needs, either from scratch or by modifying an existing QScript.
- When you update data files, all data cleaning is automatically reapplied to both existing and new respondents. See How to Update with New or Revised Data.
- Because all changes are stored within Displayr, you can always audit the data cleaning and return any data to its original state. See How to Review Data in Tables and Variables.
- If you do not want users to review the cleaning, you can instead adopt a two-step process that retains all the other benefits except ease of auditing. See Create a Separate Data Preparation Document for more details. Generally, this works as follows:
- You create one Displayr Document and do all the cleaning in it.
- You create a new, cleaned SPSS data file, which you either provide to end-users or export to your Displayr Cloud Drive for use in a different document.
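To make the contrast with manual editing concrete, the sketch below expresses three of the cleaning tasks mentioned above (identifying flat-lining, reversing scales, and capping) as a function that is applied to the raw data rather than typed into it. This is a hypothetical Python illustration, not how Displayr or its QScripts implement these automations. The variable names, the 1 to 5 rating scale, and the cap value are all assumptions. Because the raw records are never modified, the same function can simply be re-run when revised data or new respondents arrive, which is the property described above for updating data in Displayr.

```python
from typing import Dict, List

# A respondent's answers, keyed by (hypothetical) variable name.
Record = Dict[str, float]


def reverse_scale(value: float, low: int = 1, high: int = 5) -> float:
    """Reverse a rating on a low..high scale (assumed 1-5), e.g. 1 <-> 5, 2 <-> 4."""
    return low + high - value


def cap(value: float, upper: float) -> float:
    """Cap a numeric answer at an upper bound."""
    return min(value, upper)


def is_flatlining(ratings: List[float]) -> bool:
    """True if a respondent gave the identical answer to every item in a grid."""
    return len(ratings) > 1 and len(set(ratings)) == 1


def clean(raw: List[Record], grid: List[str], reversed_items: List[str],
          cap_var: str, cap_at: float) -> List[Record]:
    """Return a cleaned copy of the raw records; the raw records are left untouched."""
    cleaned = []
    for row in raw:
        if is_flatlining([row[q] for q in grid]):
            continue  # drop respondents who straight-lined the grid
        new = dict(row)  # copy, so the original data can always be recovered
        for q in reversed_items:
            new[q] = reverse_scale(new[q])
        new[cap_var] = cap(new[cap_var], cap_at)
        cleaned.append(new)
    return cleaned


raw = [
    {"q1": 3, "q2": 3, "q3": 3, "spend": 50},    # flat-liner
    {"q1": 1, "q2": 4, "q3": 2, "spend": 9000},  # implausibly high spend
]
cleaned = clean(raw, grid=["q1", "q2", "q3"], reversed_items=["q2"],
                cap_var="spend", cap_at=1000)
print(cleaned)
```

Here the first respondent is dropped as a flat-liner, and the second has `q2` reversed from 4 to 2 and `spend` capped at 1000, while `raw` still holds the original answers for auditing.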