Once you have imported your data into Displayr, check that it looks the way you expect. Checking data is rarely ever finished, but the following basic checks are the minimum:
- Sample size
- Checking that the file only contains completed interviews
- Grouping of variables into variable sets
Sample size
To review the sample size of a data file, click the data set in the Data Sources tree, and the sample size is shown as the Number of cases in the object inspector. In the example below, the sample size is 300.
Complete interviews
Sometimes data files contain incomplete cases, where a respondent stopped the survey or interview before finishing. It is standard practice to remove such respondents before performing any analysis. Some data files contain a variable that indicates which interviews were complete. Where there is no such variable, one way to check is to create a table using the data from the last question in the study that was meant to be asked of everybody.
To create this table in Displayr, scroll down to the bottom of the Data Sources tree and drag the variable for the last question onto the page. The sample size is shown in the footer at the bottom of the table. If it is smaller than the number of cases in the data file, some respondents did not answer that question.
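If you want to cross-check the same thing outside Displayr, the comparison can be sketched in a few lines of pandas. This is only an illustration, not a Displayr feature: the data and the column names (`q1`, `q_last`) are made up, and it assumes unanswered questions are stored as missing values.

```python
import pandas as pd

# Hypothetical survey data: q_last stands in for the final question
# that was meant to be asked of everybody.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "q1": [3, 4, 2, 5, 1],
    "q_last": [1.0, None, 2.0, 1.0, None],
})

# Total number of cases in the file.
total_cases = len(df)

# Number of respondents with a non-missing answer to the last question.
answered_last = int(df["q_last"].notna().sum())

print(f"Cases in file: {total_cases}")
print(f"Answered last question: {answered_last}")
```

A gap between the two numbers suggests the file contains incomplete interviews.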
If respondents don't all have the same last question, you can create a variable that checks whether any of the last questions were answered. Select the variables and use the built-in Calculation > Any of > Any of feature; see How to Perform Mathematical Calculations on Variables. If you only want to include respondents who completed a specific set of questions, you can look for complete cases over those variables; see How to Create a Filter for Complete Cases.
Displayr also has visualizations that help you understand where people are not answering questions in your survey. How To Check for Missing Data Using Plot by Case shows where respondents are dropping out of the survey, and How To Check for Missing Data Using Plot of Patterns shows the number of people who didn't answer groups of questions.
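The logic behind the "any of" check and the complete-cases filter described above can be sketched in pandas as follows. This is a rough illustration rather than how Displayr implements these features: the column names are hypothetical, and it assumes an unanswered question is stored as a missing value.

```python
import pandas as pd

# Hypothetical data: q9a and q9b are alternative last questions,
# q1 and q2 are questions every included respondent must have answered.
df = pd.DataFrame({
    "q9a": [1.0, None, None, 2.0],
    "q9b": [None, 3.0, None, 1.0],
    "q1": [1.0, 2.0, 3.0, None],
    "q2": [4.0, 5.0, None, 2.0],
})

last_questions = ["q9a", "q9b"]
required = ["q1", "q2"]

# "Any of": True when at least one of the last questions was answered.
df["answered_any_last"] = df[last_questions].notna().any(axis=1)

# Complete cases: True when every required question was answered.
df["complete_required"] = df[required].notna().all(axis=1)

print(df[["answered_any_last", "complete_required"]])
```

Either flag could then serve as a filter, keeping only the rows where it is true.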
Correct grouping of variables into variable sets
The underlying structure of a data set is a large table, where each row holds the data for one person who completed the survey, and each column represents some property of those people. These columns are commonly referred to as variables.
Displayr automatically groups related variables into variable sets. You can also use Displayr AI to analyze the labels further and to group and name variable sets more sensibly. Often a variable set contains only a single variable, but it can contain several. Grouping is useful because Displayr lets you manipulate all the variables in a set at the same time and summarizes them together when creating tables.
In the example below, the triangle to the left of Race tells us that it contains more than one variable (the triangle appears when you hover over the data set).
When you click on the triangle, the variable set is expanded, and we can see all the variables within it.
More info on how to combine variables into variable sets, their structure (how they appear in tables), and value attributes (the categories and values used in calculations) can be found on our Variable Sets page.
Next
If the data set looks OK, the next step is to learn the basics of Displayr.
Otherwise, it is necessary to either:
- Obtain a better data file and import it.
- Clean the data.