Displayr makes it easy to manually create new variables in your data via the variable inserter menu. See How to Add Variables to a Document for the various ways. One of the ways is to write custom R code to create the variable you need. You will want to do this when:
- You want to create categories based on values in multiple variables (also known as recoding a question or categorizing/bucketing respondents into new groups).
- You want to create a dynamic filter or change the variable's data based on controls (e.g., a combo box) or other outputs in your Report.
- You want to create a filter where the conditions depend on calculated values (e.g., the average of a variable) or on the value of a separate variable (e.g., whether Q1 > Q2 for each case).
This article contains the steps for the first use case above, where you use the responses across multiple variables to create a new categorical (Nominal) variable, which you can then show in a table.
In R code, there are usually several ways to get the same outcome. This article covers two of them:
- Using indexing
- Using dplyr::case_when
The Additional Notes section also has information on how to order the categories in your new variable, if you wish.
Requirements
- A dataset with at least two variables imported into Displayr. To follow along with the example below, use this .sav file.
- Knowledge of how to construct conditions for the criteria for each category in R. See: How to Work with Conditional R Formulas and How to Use Different Types of Data in R.
- The R code you write must return one of the following:
  - TRUE/FALSE values -- to be converted into 1/0. If you want to specify category labels for each number, you will do so manually in the object inspector > Labels.
  - Numeric values -- to use each number as a category or as a category value. If you want to specify category labels for each number, you will do so manually in the object inspector > Labels.
  - A factor of the category labels, created with the factor() function. If you'd rather set the order of the categories in your code, you can use the levels argument of factor(). If you need specific values for each category, you can add them manually in the object inspector > Values.
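To make the three return types concrete, here is a minimal sketch using a small made-up vector in place of a real variable (the name score and its values are hypothetical). Each of the last three expressions is the kind of result your R code could return.

```r
# Hypothetical stand-in for a numeric variable in your data
score = c(2, 9, 5, 7)

# 1. TRUE/FALSE values: Displayr converts these to 1/0
is.high = score >= 7

# 2. Numeric values: each distinct number becomes a category
band = ifelse(score >= 7, 2, 1)

# 3. A factor of category labels; levels sets the category order
band.label = factor(ifelse(score >= 7, "High", "Low"), levels = c("Low", "High"))

is.high
band
band.label
```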
Method - Using Indexing
When using indexing to create a new variable, the category criteria appear inside brackets that select which data meets the criteria. How this works is explained in How to Use Different Types of Data in R, and more on the syntax needed is in How to Work with Data in R.
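The idea behind indexing can be seen with a toy vector: a TRUE/FALSE condition inside square brackets selects the matching positions, and assigning to that selection changes only those positions. The vector name age.group and its values are made up for illustration.

```r
# Toy stand-in for a variable
age.group = c("18 to 24", "40 to 44", "25 to 29", "55 to 64")

# Start with an empty (all NA) result of the same length
result = rep(NA, length(age.group))

# The condition is a TRUE/FALSE vector with one entry per case
is.young = age.group %in% c("18 to 24", "25 to 29")
is.young   # TRUE FALSE TRUE FALSE

# Only positions where the condition is TRUE receive the new value
result[is.young] = "Young"
result[!is.young] = "Older"
result
```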
In the example below, we will use criteria for age (Age) and living arrangements (d4) to categorize respondents into groups.
- Hover over a variable in the Data Sources tree and click on + > Custom Code > R > Nominal.
- In the object inspector, change the General > Label to something like New Groups.
- In the R Code, paste in the following. Note that the comments (prefaced with a #) are there to explain what the code does, so you can modify it for your own needs. Also of note, when setting up your conditions, you should use == to test against one value and use %in% to test against a series of values; more info is here.
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")

####INDEX to assign new categories
#create empty data series using the length of a different variable
newcategories = rep(NA, length(Age))
#put the criteria in brackets to assign the new categories
newcategories[young & single] = "Young singles"
newcategories[!young & single] = "Older singles"
newcategories[young & partner.only] = "Young couples"
newcategories[!young & partner.only] = "Older couples"
newcategories[young & children] = "Young families"
newcategories[!young & children] = "Older families"
#return final results as a factor to make categorical
factor(newcategories)
- Click Calculate. You will now have a new variable for the different groups, which you can use in your Report.
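To check the indexing code outside Displayr, you can run it on a few made-up cases. In the sketch below, the Age and d4 vectors are hypothetical stand-ins for the variables in the example file; the labels used in the conditions come from the code above, while "35 to 39" and "Other arrangement" are invented. A case that matches none of the criteria stays NA, so it is not assigned to any category.

```r
# Hypothetical stand-ins for the Age and d4 variables
Age = c("18 to 24", "35 to 39", "25 to 29", "35 to 39", "30 to 34", "18 to 24")
d4 = c("Living alone", "Sharing accommodation", "Living with partner only",
       "Living with children only", "Living with partner and children",
       "Other arrangement")

# Criteria variables, as in the code above
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")

# Indexing to assign the new categories
newcategories = rep(NA, length(Age))
newcategories[young & single] = "Young singles"
newcategories[!young & single] = "Older singles"
newcategories[young & partner.only] = "Young couples"
newcategories[!young & partner.only] = "Older couples"
newcategories[young & children] = "Young families"
newcategories[!young & children] = "Older families"

# The last case matches no criteria, so it stays NA
result = factor(newcategories)
table(result, useNA = "ifany")
```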
Note that you can either create the optional criteria variables, as in the first section of the code above, or include the criteria directly inside the brackets. For example, the following two lines are equivalent: the first uses the variables young and single, and the second writes out the logic used to create each of them:
newcategories[young & single] = "Young singles"
newcategories[Age %in% c("18 to 24", "25 to 29", "30 to 34") &
    d4 %in% c("Living alone", "Sharing accommodation")] = "Young singles"
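If you want to confirm that the two forms select the same cases, you can compare them directly. This sketch uses a few made-up Age and d4 values (hypothetical, for illustration only) and checks that both versions of the "Young singles" assignment give identical results.

```r
# Hypothetical stand-ins for the Age and d4 variables
Age = c("18 to 24", "45 to 49", "30 to 34", "25 to 29")
d4 = c("Living alone", "Living alone", "Living with partner only", "Sharing accommodation")

young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")

# Version 1: uses the criteria variables
with.variables = rep(NA, length(Age))
with.variables[young & single] = "Young singles"

# Version 2: writes the criteria directly inside the brackets
inline = rep(NA, length(Age))
inline[Age %in% c("18 to 24", "25 to 29", "30 to 34") &
       d4 %in% c("Living alone", "Sharing accommodation")] = "Young singles"

identical(with.variables, inline)
```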
Method - Using dplyr::case_when
The dplyr package contains a case_when() function that can also be used to assign categories to respondents based on criteria. It is a bit cleaner-looking and faster to work with since much of the repetitive syntax in indexing isn't required.
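One detail worth knowing: case_when() checks the conditions in order and, for each case, uses the first one that is TRUE, while cases that match no condition (including those where every condition is NA) get NA. The toy vector score below is made up to show this.

```r
library(dplyr)

# Toy stand-in for a numeric variable
score = c(2, 6, 9, NA)

band = case_when(
  score >= 8 ~ "High",   # checked first
  score >= 5 ~ "Medium", # only reached by cases not already "High"
  score >= 0 ~ "Low"
)
band
```

Because the order matters, put the most specific conditions first.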
In the example below, we will use criteria for age (Age) and living arrangements (d4) to recategorize respondents into groups.
- Hover over a variable in the Data Sources tree and click on + > Custom Code > R > Nominal.
- In the object inspector, change the General > Label to something like New Groups.
- In the R Code editor, paste in the following code. Comments are prefaced with a #, and are there to explain what the code does so you can modify it for your own needs. When setting up your conditions, you should use == to test against one value and use %in% to test against a series of values. More info is here. Also of note, within case_when we use ~ rather than = to assign the category:
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")

####use dplyr::case_when to assign new categories
library(dplyr)
newcategories = case_when(
  young & single ~ "Young singles",
  !young & single ~ "Older singles",
  young & partner.only ~ "Young couples",
  !young & partner.only ~ "Older couples",
  young & children ~ "Young families",
  !young & children ~ "Older families",
  !children & !partner.only & !single ~ "Other"
)
#return final results
factor(newcategories)
- Click Calculate. You will now have a new variable for the different groups, which you can use in your Report.
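As with the indexing method, you can try the case_when() version on a few made-up cases. The Age and d4 vectors below are hypothetical stand-ins, with "35 to 39" and "Other arrangement" invented for illustration. Unlike the indexing version, the final condition assigns "Other" to the last case instead of leaving it NA.

```r
library(dplyr)

# Hypothetical stand-ins for the Age and d4 variables
Age = c("18 to 24", "35 to 39", "25 to 29", "35 to 39", "30 to 34", "18 to 24")
d4 = c("Living alone", "Sharing accommodation", "Living with partner only",
       "Living with children only", "Living with partner and children",
       "Other arrangement")

# Criteria variables, as in the code above
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")

# case_when assigns the label of the first condition that is TRUE
newcategories = case_when(
  young & single ~ "Young singles",
  !young & single ~ "Older singles",
  young & partner.only ~ "Young couples",
  !young & partner.only ~ "Older couples",
  young & children ~ "Young families",
  !young & children ~ "Older families",
  !children & !partner.only & !single ~ "Other"
)

result = factor(newcategories)
table(result)
```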
Additional Notes
To specify an order for the categories you create, you can use the levels argument inside the factor() function, like below:
#return final results
factor(newcategories,levels=c("Young singles","Young couples","Young families",
"Older singles","Older couples","Older families","Other"))
Next
How to Work with Conditional R Formulas
How to Recode Data Based on a Lookup Using R
How to Work with Date Ranges Using R
How to Filter Raw Data Using R
How to Use R Code to Create a Filter Based on Single-Response Questions