For various reasons, as outlined in Find the Best Possible Data File, ideally you would not be importing your data using an Excel file. However, in reality, sometimes this is unavoidable. This article describes how to format your Excel file (.xlsx or .csv) to aid Displayr in correctly importing and setting up your data. If not already done so, please review
Requirements
- A Displayr document.
- An Excel data set.
Method - Before importation
1. All of your Excel data should be in one worksheet. Displayr won’t be able to use data in multiple worksheets. Only the first worksheet in the file will be imported.
2. Each variable from your survey should have its own column (e.g., Age is in one column, Income is in another).
3. The first row of the Excel file should have the names for each variable. Ensure:
- Headings begin with letters or words, not numbers or symbols (#!?&)
- Headings describe what is in each column (e.g., Age or D1 - Age rather than just D1). This will make things a lot easier when the data is in Displayr.
- Not to name a heading one of R's common functions. Using one may conflict with some of our automations and cause unintended results. These names include: list, length, exists, assign, names, rep, any, stop, min, max, matrix, attr, sum.
4. Ensure there is only a single header row.
5. For multiple-response questions or grid-style questions, use a common prefix in the header row label to tell Displayr to group these as the same variable set.
6. Each person’s responses should be contained in a separate row with a unique identifier column included.
7. Where possible, ensure that categories are stored as labels rather than numbers (e.g. use Male and Female instead of 1 and 2 in the column for the gender question).
8. Data should be stored as values and not created using formulas or references to other cells.
9. You should not have any merged cells in the table.
10. If loading data that was created internally or modified after export, ensure that blank rows and columns are truly blank. In Excel, this is akin to using Clear Contents vs right-click and Delete - you need to be sure to right-click and Delete any cells that may have had data or formatting on them previously.
11. When you have a multiple-response question, the data should be stored as one column for each of the possible options that were shown in the survey. Be sure to use the same two labels for selected and not selected for each of the options (Aware and Not Aware below). Avoid storing multiple responses within a single column, as this will make the data very difficult to analyze in Displayr.
12. When a respondent has not given an answer to a question or skipped the question, then they should have an entry of NA in that cell. When working with categories, other kinds of responses that indicate the respondent saw the question but did not give an answer are generally best given a label of Don’t know or whatever phrasing matches your questionnaire.
Method - After importation
1. When Displayr sees a column full of text in the Excel file, it needs to make a decision about whether the text in the column represents a set of category labels or it represents open-ended responses. Displayr makes an educated guess based on the number of unique responses in the column, and the lengths of the responses. Sometimes Displayr can guess incorrectly, particularly if, for example, you have a really large brand list.
- In these cases, you can select your variable(s) in the Data Sources tree and go to Data > Structure in the object inspector to change the data format. For example, changing a variable from Nominal to Numeric will display an average instead of percentages, and changing Text to Nominal will display category percentages instead of raw text.
2. When you import categorical data as labels from Excel, it does not include any metadata to tell Displayr what values to assign each category. Values are therefore automatically assigned based on alphabetical order.
- If you wish to amend category values or labels, you can select your variable(s) in the Data Sets tree and go to Data > DATA VALUES > Values in the object inspector.
- You can additionally edit variable labels directly in the Data Sources tree by right-clicking a variable and selecting Rename.
3. If variables are meant to be grouped, as in a multiple-response question or grid, you can group them as follows:
- Hold down your Ctrl key and from the Data Sources tree, select each of the variables that you want to group.
- Right-click > Combine and the variables will be grouped into a single variable set.
- Expand the combined variable set in the Data Sources tree to view the individual variables.
See Also
How to Import Data from Your Computer
How to Enter Data from Paper Surveys