When a file of raw data is imported to Displayr, Displayr will use the metadata available to organize and configure the variable sets, see Find the Best Possible Data File. For example, survey data is usually stored in SPSS files (.sav) and the metadata tells Displayr how your questions were asked in the survey and how to structure them in.
If your file doesn't have metadata (such as Excel or csv files) then Displayr tries to figure out how to configure the variable sets based on the data in the file – all Displayr gets to see are the contents of each column. This article describes guidelines to use to format Excel data so that Displayr can more easily structure it correctly on import. If you are looking to instead import tables of data from Excel that are already calculated, then see How to Import Table Data into Displayr.
Requirements
Formatting Excel files for Displayr
The basic layout of your Excel file is as follows, and an example of what it should look like is included in the picture below:
- All of your data should be in one worksheet. Displayr won’t be able to use data in multiple worksheets.
- Each variable from your survey should have its own column (e.g., Age is in one column, Education is in another).
- The first row should contain headings. Make sure they go by the following rules:
- Begin with letters or words, not numbers
- Describe what is in each column (e.g., 'Age' or 'Q1. Age' rather than just 'Q1'). This will make things a lot easier when the data is in Displayr. See the image below for how multiple response questions should be set up.
- Do not name a heading one of R's common functions. Using one may conflict with some of our automations and cause unintended results. These names include: list, length, exists, assign, names, rep, any, stop, min, max, matrix, attr, sum.
- Each person’s responses should be contained in a separate row.
- Where possible, ensure that categories are stored as labels rather than numbers (e.g. use Male and Female instead of 1 and 2 in the column for the gender question).
- You should not have any merged cells in the table.
Setup for different data types
While Excel files do not have metadata to tell Displayr how to set up the data, Displayr will try to infer the structure of your data based on what is in the columns and column headers. There are some conventions that you can use that will help reduce the amount of setup that you need to do once the file is imported.
Categorical data
When a column contains categories, it is almost always better that the column contains the labels of the categories rather than numbers which indicate the category values. The reason is that if the labels are not present, you will need to enter them later. If you’re stuck with numbers in your file, you can use Excel’s find-and-replace feature to replace values within individual columns or even groups of columns.
Multiple-response data
When you have a multiple response question, the data should be stored in the Excel file with one column for each of the possible options that were shown in the survey. Be sure to avoid storing multiple responses within a single column, as this will make the data very difficult to analyze in Displayr.
An example of the proper layout for a multiple response question is shown above in the last three columns. The columns headed as Q1a, Q1b, and Q1c represent three options that were shown in a multiple-response question. More tips for column headers for this type of data are considered below.
Missing data and ‘Don’t know’ categories
When a respondent has not given an answer to a question or skipped the question, then they should have an entry of NA in that cell. When working with categories, other kinds of responses which indicate the respondent saw the question but did not give an answer are generally best given a label of 'Don’t know' or whatever phrasing matches your questionnaire.
Good conventions for column headers
Above mentioned previously, your headers should always start with a letter rather than a number and that they should be as descriptive as possible. Here we’ll take a deeper look at some common cases where improving the headers will make a big difference.
Ditch multiple header rows
Some online survey providers will export an Excel or CSV file with two or more header rows describing the data. Here is one such example that actually contains three header rows:
The respondent data does not begin until the fourth row in the file. The first row contains short names, the second row contains the full question text, and the third row contains some additional tags.
Before using a file like this in Displayr you must reduce it to one header. In some cases, you may simply delete the rows that you don’t need. In other cases, you might like to use the Excel CONCATENATE() function to create a new header row. CONCATENATE() is used to paste together two or more pieces of text, and so could be used to combine the information in the second row with the short names in the first row. By using a formula like =CONCATENATE(R1, ” – “,R2) and a little clever copying-and-pasting, the headers can quickly take a more appropriate shape:
Depending on the nature of your file, some further tidying may be necessary, but a little trick like this can get you a long way.
Use headings to ensure your variables get grouped together
There is a range of survey question styles which collect multiple answers from each person. The most common examples include the traditional multiple response question, and single response and multiple response grids. In most cases in your workflow, you will want to have the data from such questions grouped together in Displayr. The column headers that you use can help Displayr identify when a group of variables should be analyzed together. The basic idea is that the column headers should contain an identifiable structure which indicates when variables are meant to go together.
The screenshot in the first section of this article contains one such set of variables for a multiple response question. The column headers are Q1a, Q1b, and Q1c. The common prefix (Q1) is enough to tell Displayr that these variables go together.
Consider the case shown below, where we have asked people what aspects they feel are important when deciding what food to get delivered for a night in. The labels only tell us which response option was shown and there is no info in the headers to indicate that the variables are related to one another. These variables would not get grouped together when the data is imported into Displayr.
A better layout for these variables is as follows. Here, the text from the question has been included in the headers, and this pattern will ensure that Displayr groups these variables together when the data is imported.
The common prefix in the headers, “How important are the following when ordering delivery food? – “, is enough to allow Displayr to recognize the structure of this data.
Method
Using Excel files in Displayr
The tips above will help you get the best file possible Excel file for Displayr. Even with a tidy file, you may need to complete some other setup once you have imported the file into Displayr. This includes entering or tidying labels, recoding numeric values, and grouping variables. All of these things are really easy to take care of in Displayr once you know how!
Importing and checking your data
To figure out what parts of your data need to be tidied, it is good to start by generating a SUMMARY table for all of your survey questions so that you can go through them one-by-one. To do this, from the toolbar menu, select the Table icon > Report > Summary Tables.
The Report section on the left side will now contain a folder called 'Summary Tables' which contains a table for each question in your files. We'll now go through some common issues and how to address them.
Table shows an Average instead of categories
In this example, the Age question in my Excel file contained numbers for the age groups instead of labels. The table in Displayr is showing an Average of those numbers instead of showing the percentage of people in each age group.
To show categories instead of an Average you should:
- Select the variable from the Data Sources tree.
- From the object inspector on the right, change the Structure drop-down to Nominal: Mutually exclusive categories.
This process, called changing the question Structure, is one of the most fundamental things to learn about controlling your data in Displayr. For more on this, check out our getting started guide.
Categories don’t have labels
If you have columns with numbers representing categories, like the table above, you will want to add category labels. Labels for categories can easily be added in Displayr.
To enter labels:
- Select the variable from the Data Sources tree.
- From the object inspector on the right, click on the Labels button.
- Enter the desired category labels in the Label column.
- Click OK.
Recoding numeric values
When you first import your Excel file, Displayr will assign numeric values for any categories that it identifies. This is, importantly, different from using SPSS files and other files with metadata, where categories are stored with both a label and a numeric value.
For some categorical data, like Gender, the numeric values are relatively unimportant. However, if your data contains a scale, like satisfaction ratings, then the numeric values can be useful to your analysis in computing an average for the scale among different groups of respondents.
To check and change the numeric values that have been assigned
- Select the variable from the Data Sources tree.
- Click the Values button in the object inspector.
- Change the entries in the Value column as needed.
- Click OK.
Tidying variable labels
Variable labels come from the column headers in your Excel file. These labels will affect the way items appear in tables, particularly for multiple response questions, and other types of questions that contain multiple variables (like grids). You can access variable label by select the variable from the Data Sources tree and changing the Label value in the object inspector. Alternatively, you can right-click on the variable and select Rename. and then enter a new label.
Data appears as Text
When Displayr sees a column full of text in the Excel file it needs to make a decision about whether the text in the column represents a set of category labels or it represents open-ended responses. Displayr makes an educated guess based on the number of unique responses in the column, and the lengths of the responses. Sometimes Displayr can guess incorrectly, particularly if you have a really large brand list.
If Displayr has not identified your data as being categorical, the table will show all of the responses. To create a new copy of the data with each unique response turned into a category you should:
- Select the variable from the Data Sources tree.
- From the toolbar menu, click Duplicate to make a copy of the variable.
- With the duplicated variable selected, change the Structure to Nominal: Mutually exclusive categories.
Grouping variables
If variables are meant to be grouped, as in a multiple response question or grid, you can group them as follows:
- In the Data Sources tree, highlight the rows that you want to group.
- Right click and select Combine > As Variable Set, then select either Do Not Merge Variables to create a multi-response question or select Grid to create a grid question.
- Enter an appropriate Label and set the question Structure in the object inspector.
Other issues
How to Fix Metadata Issues in Displayr outlines even more solutions than above for more specific data formatting issues.
Next
How to Read an Excel File into Displayr
How to Fix Metadata Issues in Displayr
How to Set Up Files With No Metadata
Tips to Get an Excel or CSV File Working Well in Displayr