It’s tempting to start writing or scripting a questionnaire by listing every question you want to ask and adding them without considering what the data will look like once it has been collected. It is generally more helpful to start from your deliverables, decide what the data needs to look like to produce them easily, and then work backward to your questionnaire. So, before creating your script, it is important to Create an Analysis Plan.
The way questions are programmed in a questionnaire can make it difficult for Displayr to recognize variables and structure them correctly. This article shows not only how to script/program your questionnaire but also how the data should be stored when it is exported from your data collection platform, so that Displayr can structure it correctly. It also covers some common pitfalls that make it difficult for Displayr to structure your questions.
Before continuing with this article, you should have a sound understanding of the different Variable Set Structures & Question Types available in Displayr.
Let's go...
Single Response Questions (Nominal: Mutually exclusive categories)
Multi-Response Questions (Binary-Multi)
Multi-Response Questions (Binary - Grid)
Single Response Questions (Nominal-Multi)
Single Response Questions (Nominal: Mutually exclusive categories)
What does a single-response question typically look like in a questionnaire?
A single-response question allows the respondent to select only one answer in the questionnaire (it is typically shown with radio buttons in the data collection platform). See the example below, where we asked respondents how old they are in Question d1 of the questionnaire. Each respondent had to select exactly one of the 9 Age categories:
How should the data be stored?
The data should be stored as a single column in the data file. The data for the Age question, for instance, is stored as a single column that contains all the respondents' answers. In the example below, respondent number 1 selected 25-29 (the Label), which corresponds to category 2 (the Value) in the question above.
What Variable Set Structure will Displayr assign to this question?
If the data is stored correctly, as per the example above, Displayr will assign the Structure Nominal: Mutually exclusive categories.
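As a rough illustration of the "one column, one Value per respondent" layout, here is a minimal Python sketch. It is not part of Displayr or of any data collection platform. The names `AGE_LABELS`, `d1` and `label_column` are hypothetical, and only the first two Age categories (1 = 18-24, 2 = 25-29) are taken from this article.

```python
# Hypothetical value labels for question d1 (Age). Only these two
# categories are mentioned in the article; the real question has 9.
AGE_LABELS = {1: "18-24", 2: "25-29"}

# One column of data: one Value per respondent (None = blank cell).
d1 = [2, 1, 2, None, 1]

def label_column(values, labels):
    """Translate stored Values into Labels, rejecting unknown codes."""
    out = []
    for v in values:
        if v is None:
            out.append(None)          # keep missing data as missing
        elif v in labels:
            out.append(labels[v])
        else:
            raise ValueError(f"Value {v!r} has no label")
    return out

d1_labels = label_column(d1, AGE_LABELS)
print(d1_labels)
```

Respondent number 1 has the Value 2, which maps to the Label 25-29, matching the example above.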
Multi-Response Questions (Binary-Multi)
What does a multi-response question typically look like in a questionnaire?
A multi-response question presents respondents with a list of options and permits them to choose multiple answer options from the list (it is typically shown with square check boxes in the data collection platform). See the example below, where we asked respondents which cola brands they had ever heard of in Question 1b of the questionnaire. Each respondent was permitted to select multiple cola brands:
How should the data be stored?
The data should be stored as multiple adjacent columns (also known as Variables), one for each possible answer option in the questionnaire. In the example of q1b, we would therefore have 5 adjacent columns (one for each brand the respondent could select).
Here are some key considerations when setting up a multi-response question in your data collection platform (see also Checking Multi Response Questions):
- In most situations, multi-response questions should be set up in a binary format. A binary variable contains exactly two unique Values, ideally 0 and 1.
- When setting up the values and labels for these binary variables, it is important that the same Labels are used for all options. In particular, the value label should not contain the name of the option being evaluated. In our q1b example, the labels should not be Coca-Cola, Diet Coke, etc., but rather 1 = Aware and 0 = Not Aware (or 1 = Selected and 0 = Not Selected).
q1b. Which of the following cola brands have you ever heard of?
Values Labels
SYSMIS (MISSING DATA) Option not shown
0 Not Aware
1 Aware
- The data for the same question should be stored in columns adjacent to each other.
- Ensure Variable Names are unique for each Question (e.g. q1b) and that there are no duplicate names across different questions.
- The variables that are part of the same question should follow a predictable, simple pattern: q1b_1, q1b_2, q1b_3, etc.
What Variable Set Structure will Displayr assign to this question?
If the data is stored correctly, as per the example above, Displayr will assign the Structure as a Binary - Multi: Non-mutually exclusive categories.
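The storage rules for a multi-response question can be sketched as follows. This is an illustrative Python snippet under stated assumptions, not an export routine from any real platform. `BRANDS` and `to_binary_columns` are hypothetical names, and the three "Brand N" entries are placeholders because the article names only Coca-Cola and Diet Coke.

```python
# Hypothetical brand list for q1b; "Brand 3" to "Brand 5" are placeholders.
BRANDS = ["Coca-Cola", "Diet Coke", "Brand 3", "Brand 4", "Brand 5"]

def to_binary_columns(selected, shown=None, prefix="q1b"):
    """Return {variable name: 1, 0 or None} for one respondent.

    1 = Aware, 0 = Not Aware, None = option not shown (missing data).
    """
    shown = BRANDS if shown is None else shown
    row = {}
    for i, brand in enumerate(BRANDS, start=1):
        name = f"{prefix}_{i}"        # predictable pattern: q1b_1, q1b_2, ...
        if brand not in shown:
            row[name] = None
        else:
            row[name] = 1 if brand in selected else 0
    return row

row = to_binary_columns({"Coca-Cola", "Brand 4"})
print(row)
```

Every column uses the same two Values with the same meaning, and the brand itself lives in the variable's position and label rather than in the value label.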
Multi-Response Questions (Binary - Grid)
What does a multi-response grid question typically look like in a questionnaire?
A multi-response grid question presents respondents with a list of options, typically in a grid, and permits them to choose multiple answer options for each row in the grid (it is typically shown with square check boxes in the data collection platform). In the question below, respondents were asked to select which brands they felt best fit with each statement. Each respondent was permitted to select multiple cola brands per statement.
These questions typically take one of two shapes in a questionnaire. They can be asked as a grid, as per the example below:
Or they can be asked as a looped question (i.e. the same question is asked repeatedly, looping through the different statements, each on its own screen).
How should the data be stored?
The data should be stored as multiple adjacent columns (also known as Variables), one for each combination of answers in the questionnaire. In the example of q5, we would therefore have 18 adjacent columns (one for each statement - brand combination the respondent could select).
Here are some other key considerations to keep in mind before exporting your data from your data collection platform. This will help Displayr correctly store your question as a Binary - Grid.
- Each combination of answers should be stored as its own column of data. In the example above the first column is q5a1 which represents feminine - Coke, the next column q5a2 is feminine - Diet Coke, etc.
- Ensure you set up response variables with labels that read “Question Text - Choice Text”, e.g. feminine - Coke, feminine - Diet Coke, etc. If possible, arrange these labels from general (Question Text) to specific (Choice Text).
- The data for the same question should be stored in columns adjacent to each other
- Ensure Variable Names are unique for each Question (e.g. Q5) and that there are no duplicate names across different questions.
- The variables that are part of the same question should follow a predictable, simple pattern: Q5a1, Q5a2, Q5b1, Q5b2. Here, the letters denote the general Question Text: a covers all the feminine answers and b covers all the health-conscious answers. The numbers denote the specific Choice Text: 1 represents Coca-Cola, 2 represents Diet Coke, etc.
- Some survey platforms cut off labels over a certain length. Brief, unique variable labels make it more likely the full label will come through when saving the file
- As per the Binary - Multi questions, the data should be stored in Binary format with 1 (value) representing yes/selected (label) and 0 representing no/not selected. It is important to ensure that the labels are consistent for the whole question. So, maintain the labels as yes/no for the whole question.
What Variable Set Structure will Displayr assign to this question?
If the data is stored correctly, as per the example above, Displayr will assign the Structure as a Binary - Grid:
Here is an article that gives some more insight on How to Set Value Attributes for a Binary-Multi and Binary Grid in Displayr.
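The naming and labelling pattern for a Binary - Grid can be generated rather than typed by hand. The Python sketch below is only an illustration: `grid_variables` is a hypothetical helper, and it uses just the two statements and two brands named in this article, so it produces 4 variables rather than the 18 of the real q5.

```python
STATEMENTS = ["feminine", "health-conscious"]   # coded a, b
BRANDS = ["Coke", "Diet Coke"]                  # coded 1, 2

def grid_variables(question, statements, brands):
    """Return (variable name, label) pairs, ordered statement by statement."""
    variables = []
    for s_index, statement in enumerate(statements):
        letter = chr(ord("a") + s_index)        # a, b, c, ...
        for b_index, brand in enumerate(brands, start=1):
            name = f"{question}{letter}{b_index}"
            label = f"{statement} - {brand}"    # general to specific
            variables.append((name, label))
    return variables

q5_vars = grid_variables("q5", STATEMENTS, BRANDS)
for name, label in q5_vars:
    print(name, "=", label)
```

Generating the names in nested order keeps the columns for one statement adjacent to each other, which is the layout described in the considerations above.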
Single Response Questions (Nominal-Multi)
What does a Nominal-Multi question typically look like in a questionnaire?
A nominal-multi question presents respondents with a list of options, typically in a grid, and permits them to choose only a single answer option for each row in the grid (it is typically shown with radio buttons in the data collection platform). In the example below, we asked respondents to tell us how they feel about different Cola brands. Each respondent had to provide one answer on the scale (Hate to Love) for each of the brands listed.
These questions typically take one of two shapes in a questionnaire. They can be asked as a grid, as per the example below:
Or they can be asked as a looped question (i.e. the same question is asked repeatedly, looping through the different brands, each on its own screen).
How should the data be stored?
Nominal-multi variables are a series of Nominal variables, with the data stored in multiple adjacent columns. Each column represents one category respondents were asked about; in our example above, Q4a will be Coca-Cola, Q4b will be Diet Coke, etc. Each column shares exactly the same Values and Labels.
Here are some other key considerations to keep in mind before exporting your data from your data collection platform. This will help Displayr correctly store your question as a Nominal - Multi.
- Each category should be stored in its own column
- Each column should have exactly the same Values and Labels and they should follow the same order. See the example below:
q4 How do you feel about the following Cola Brands
Coca-Cola
Values Labels
1 Hate
2 Dislike
3 Neither like or dislike
4 Like
5 Love
Diet Coke
Values Labels
1 Hate
2 Dislike
3 Neither like or dislike
4 Like
5 Love
- The data for the same question should be stored in columns adjacent to each other
- Ensure you set up response variables with labels that read “Question Text - Choice Text”, e.g. Brand Attitude - Coke, Brand Attitude - Diet Coke, etc. If possible, arrange these labels from general (Question Text) to specific (Choice Text).
- Ensure Variable Names are unique for each Question (e.g. Q4) and that there are no duplicate names across different questions.
- The variables that are part of the same question should follow a predictable, simple pattern: Q4a, Q4b, etc.
- Some survey platforms cut off labels over a certain length. Brief, unique variable labels make it more likely the full label will come through when saving the file.
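Before exporting, it can be worth checking that every column of the question really does share the same Values and Labels in the same order. The Python sketch below shows one way to do that. It assumes the value labels have already been read into a plain dictionary (how you obtain them depends on your platform), and `inconsistent_columns` is a hypothetical helper name.

```python
# The q4 scale from the example above, as (Value, Label) pairs in order.
SCALE = [
    (1, "Hate"),
    (2, "Dislike"),
    (3, "Neither like or dislike"),
    (4, "Like"),
    (5, "Love"),
]

value_labels = {
    "Q4a": list(SCALE),   # Coca-Cola
    "Q4b": list(SCALE),   # Diet Coke
}

def inconsistent_columns(value_labels):
    """Return the columns whose Values/Labels differ from the first column."""
    names = list(value_labels)
    reference = value_labels[names[0]]
    return [n for n in names[1:] if value_labels[n] != reference]

print(inconsistent_columns(value_labels))

# A deliberately broken copy: Q4b has its scale in reverse order.
broken = dict(value_labels, Q4b=list(reversed(SCALE)))
print(inconsistent_columns(broken))
```

Comparing ordered lists rather than sets means a column with the right labels in the wrong order is also flagged.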
What Variable Set Structure will Displayr assign to this question?
If the data is stored correctly, as per the example above, Displayr will assign the Structure as a Nominal - Multi:
There you go. If you follow these simple rules, you should be on your way to having a well-structured data set when you import it into Displayr.
Finally, here are some additional tips to keep in mind when scripting your survey:
Missing Values
- Mark missing values (skipped/didn’t see the question) with a Blank cell
- Do not mark them with a 0. Reserve 0s to indicate “Saw but didn’t select the choice”. Otherwise, when you pull the data into Displayr, respondents who never saw the question will be counted as part of the base, which is incorrect if, for example, you want the base to be only brand users.
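The effect on the base can be seen with a small worked example. This Python sketch uses made-up data for five respondents, two of whom were never shown the option. `percent_selected` is a hypothetical helper and only mimics the general idea of a percentage computed on non-missing cases, not Displayr's own calculation.

```python
def percent_selected(column):
    """Percent of 1s among respondents who saw the option (non-missing)."""
    base = [v for v in column if v is not None]
    if not base:
        return None                   # nobody saw the option
    return 100 * sum(base) / len(base)

# Respondents 3 and 4 were never shown the option.
coded_with_blanks = [1, 0, None, None, 1]   # recommended: blank = missing
coded_with_zeros = [1, 0, 0, 0, 1]          # not recommended: 0 = missing

print(percent_selected(coded_with_blanks))  # base of 3 respondents
print(percent_selected(coded_with_zeros))   # base of 5 respondents
```

With blanks, 2 of the 3 respondents who saw the option selected it (about 67%). With 0s, the same answers read as 2 of 5 (40%), because the two respondents who never saw the option have been pulled into the base.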
Other Specify answer options
- Include 'Other specify' as part of the answer options, and when a respondent selects it, open up a text box where they can type their answers.
- Including it as a category will help accurately set the base of the question and aid in back coding.
Scale Questions
- Ensure your scale is always coded the 'right way' around (even if that is not the order in which the respondent sees the scale), e.g. Love = 5, Hate = 1. The highest value should represent the highest score and the lowest value the lowest score.
- This ensures that transformation scripts in Displayr, such as T2B, compute correctly. See How to Create Top Two Category Variable(s) (Top 2 Boxes).
- This also helps if doing advanced analyses and avoids the need for recoding.
- If you can, script the midpoint of each Age category as its Value, so that average ages compute correctly. The same goes for any question where you want to calculate averages during analysis. For example, if a respondent gives their age as the range 18-24, the Value for this answer option should, if possible, be 21 instead of 1. This will make certain calculations easier during analysis.
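Both scale tips can be illustrated with a few lines of Python. The data below is made up, and `top_two_box` is a hypothetical stand-in for the idea behind a Top 2 Box calculation, not Displayr's own T2B script.

```python
# Age stored as category midpoints: 21 = "18-24", 27 = "25-29".
ages = [21, 27, 27, 21, 27]
mean_age = sum(ages) / len(ages)

# 5-point scale coded so that higher is better: 1 = Hate ... 5 = Love.
ratings = [5, 4, 2, 3, 5, 1]

def top_two_box(values, scale_max=5):
    """Percent of answers in the two highest scale points."""
    top = [v for v in values if v >= scale_max - 1]
    return 100 * len(top) / len(values)

print(mean_age)
print(top_two_box(ratings))
```

Had the ages been stored as category codes 1 and 2, the mean would be a number between 1 and 2 that says nothing about age. Had the scale been coded with Hate = 5, the "top two" values would pick up the most negative answers instead of the most positive ones.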
Create Hidden/Dummy Variables
- If you can, create hidden variables in your script. These reusable/copyable variables can speed up analysis later and eliminate the need to manually create NETs.
- Say you asked people where they live in the UK using granular regions, but later in the survey you want to ask a set of questions only of people who live in Wales.
- Keep a hidden variable that marks a respondent as living in Wales if, for example, they said they live in Cardiff.
- These hidden variables are great to have as you do not have to spend time grouping them in Displayr.
- The same goes for any special groupings that will be important in your analysis. If it’s easy to set up a calculation as a hidden variable that you’re going to use across many surveys, that may save time creating variables during analysis in Displayr, as it will be pre-loaded with your data set.
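The Wales example might look like this in code. This is a minimal Python sketch of the logic only; in practice the hidden variable would be defined in your survey platform's own scripting tool. The `REGION_TO_NATION` lookup and `lives_in_wales` are hypothetical, and only Cardiff and Wales come from this article.

```python
# Hypothetical lookup from granular region to nation.
REGION_TO_NATION = {
    "Cardiff": "Wales",
    "Swansea": "Wales",
    "London": "England",
}

def lives_in_wales(region):
    """Hidden 0/1 flag; None if the region question was not answered."""
    if region is None:
        return None                   # keep missing data as missing
    return 1 if REGION_TO_NATION.get(region) == "Wales" else 0

answers = ["Cardiff", "London", None, "Swansea"]
hidden_wales = [lives_in_wales(r) for r in answers]
print(hidden_wales)
```

The same flag can drive the survey routing and then travel with the exported data, so the grouping does not have to be rebuilt during analysis.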