This article describes how to use the values in multiple variables to create a new variable of categories.
The article contains the steps to go from raw data held in separate variables to a new variable of combined categories, which can be viewed either as raw data or in a table.
Requirements
Please note these steps require a Displayr license.
- A dataset with at least two variables imported into Displayr. To follow along with the example below, use this .sav file.
- Knowledge of how to construct conditions for the criteria for each category in R. See: How to Work with Conditional R Formulas and How to Use Different Types of Data in R.
Method - Using Indexing
When using indexing to create a new variable, the criteria for each category go inside square brackets, which select the cases that meet those criteria. How this works is explained in How to Use Different Types of Data in R, and the syntax is covered in more detail in How to Work with Data in R.
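As a minimal sketch of the bracket mechanism, the following uses a made-up numeric vector rather than the article's dataset (the names ages and group are hypothetical). A logical condition inside the brackets picks the positions to overwrite, and any position that matches no condition keeps its initial missing value.

```r
# Hypothetical toy data, not taken from the example .sav file
ages = c(19, 42, 27, 65)

# Start with one missing value per respondent
group = rep(NA_character_, length(ages))

# The logical condition inside [ ] selects which elements are assigned
group[ages < 30]  = "Under 30"
group[ages >= 60] = "60 or over"

# The second respondent (42) meets neither condition and stays NA
group
```

Checking for leftover NA values in this way is a quick test of whether the criteria cover every respondent.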
In the example below we will use criteria for age (Age) and living arrangements (d4) to categorize respondents into groups.
- Hover over a variable in the Data Sources tree and click on + > Custom Code > R - Text.
- In the object inspector change the General > Label to something like New Groups.
- Change the Structure to Nominal (since we are creating a categorical variable).
- In the R Code, paste in the following. The comments (prefaced with a #) explain what the code does so you can modify it for your own needs. When setting up your conditions, use == to test against one value and %in% to test against a series of values; more info is here.
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")
####INDEX to assign new categories
#create empty data series using the length of a different variable
newcategories = rep(NA, length(Age))
#put the criteria in brackets to assign the new categories
newcategories[young & single] = "Young singles"
newcategories[!young & single] = "Older singles"
newcategories[young & partner.only] = "Young couples"
newcategories[!young & partner.only] = "Older couples"
newcategories[young & children] = "Young families"
newcategories[!young & children] = "Older families"
#return final results
newcategories
- Click Calculate. You will now have a new variable for the different groups, which you can use in your Report.
Note that you can either create the optional criteria variables, as in the first section of the code above, or include the criteria directly inside the brackets. For example, the following two assignments are equivalent:
newcategories[young & single] = "Young singles"
newcategories[Age %in% c("18 to 24", "25 to 29", "30 to 34") &
d4 %in% c("Living alone", "Sharing accommodation")] = "Young singles"
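On the choice between == and %in% mentioned in the steps above, here is a small sketch of how the two behave. It assumes a plain character vector standing in for d4 (in Displayr the variable would normally be categorical), and the values are toy data.

```r
# Hypothetical toy values standing in for the d4 variable
d4 = c("Living alone", "Sharing accommodation", "Living with partner only", NA)

# == compares every element against ONE value; a missing value yields NA
partner.only = d4 == "Living with partner only"
partner.only

# %in% tests membership in a SET of values; a missing value yields FALSE
single = d4 %in% c("Living alone", "Sharing accommodation")
single

# Pitfall: == with several values recycles the right-hand side element by
# element instead of testing membership, so real matches can be missed
wrong = d4 == c("Sharing accommodation", "Living alone")
wrong
```

Here wrong contains no TRUE values even though the first two respondents hold one of the two listed values, which is why %in% is the safer choice whenever more than one value is involved.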
Method - Using dplyr::case_when
The dplyr package contains a case_when() function that can also be used to assign categories to respondents based on criteria. It reads more cleanly and is quicker to write than indexing because the variable name and brackets do not have to be repeated for every category.
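Two behaviours of case_when() are worth knowing before applying it: conditions are checked in order and the first one that is TRUE wins, and a case that matches no condition is returned as NA unless a catch-all is supplied. The sketch below illustrates both on a made-up numeric vector (x and size are hypothetical names) and assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data
x = c(5, 15, 25, NA)

size = case_when(
  x < 10 ~ "small",
  x < 20 ~ "medium",  # only reached when x < 10 was not TRUE
  TRUE   ~ "other"    # catch-all for everything else, including NA
)

size
```

Without the final TRUE line, the last two elements would be NA, because neither of the first two conditions evaluates to TRUE for them.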
In the example below we will use criteria for age (Age) and living arrangements (d4) to recategorize respondents into groups.
- Hover over a variable in the Data Sources tree and click on + > Custom Code > R - Text.
- In the object inspector change the General > Label to something like New Groups.
- Change the Structure to Nominal (since we are creating a categorical variable).
- In the R Code, paste in the following code. Comments are prefaced with a # and explain what the code does so you can modify it for your own needs. When setting up your conditions, use == to test against one value and %in% to test against a series of values; more info is here. Also note that within case_when we use ~ rather than = to assign the category:
####OPTIONAL - create variables for the different criteria
#flag respondents who fall into the following categories
young = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children = d4 %in% c("Living with partner and children", "Living with children only")
####use dplyr::case_when to assign new categories
library(dplyr)
newcategories = case_when(
  young & single ~ "Young singles",
  !young & single ~ "Older singles",
  young & partner.only ~ "Young couples",
  !young & partner.only ~ "Older couples",
  young & children ~ "Young families",
  !young & children ~ "Older families",
  !children & !partner.only & !single ~ "Other"
)
#return final results
newcategories
- Click Calculate. You will now have a new variable for the different groups, which you can use in your Report.
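To see how the two methods compare, the sketch below runs both on five made-up respondents. The data are hypothetical: the labels "45 to 54", "55 to 64" and "Living with parents" are assumed for illustration and may not exist in the example file, and by.index and by.case are illustrative names. It assumes dplyr is installed and uses plain character vectors in place of Displayr variables.

```r
library(dplyr)

# Hypothetical toy respondents (not the article's dataset)
Age = c("18 to 24", "45 to 54", "30 to 34", "55 to 64", "25 to 29")
d4  = c("Living alone", "Living with partner only", "Living with children only",
        "Living with parents", "Sharing accommodation")

# Criteria flags, as in the article's code
young        = Age %in% c("18 to 24", "25 to 29", "30 to 34")
single       = d4 %in% c("Living alone", "Sharing accommodation")
partner.only = d4 == "Living with partner only"
children     = d4 %in% c("Living with partner and children", "Living with children only")

# Method 1: indexing
by.index = rep(NA_character_, length(Age))
by.index[young & single]        = "Young singles"
by.index[!young & single]       = "Older singles"
by.index[young & partner.only]  = "Young couples"
by.index[!young & partner.only] = "Older couples"
by.index[young & children]      = "Young families"
by.index[!young & children]     = "Older families"

# Method 2: case_when, including the final "Other" condition
by.case = case_when(
  young & single        ~ "Young singles",
  !young & single       ~ "Older singles",
  young & partner.only  ~ "Young couples",
  !young & partner.only ~ "Older couples",
  young & children      ~ "Young families",
  !young & children     ~ "Older families",
  !children & !partner.only & !single ~ "Other"
)

# Cross-tabulate the two results, keeping missing values visible
table(by.index, by.case, useNA = "ifany")
```

With these toy data the two results agree for every respondent except the fourth, who meets none of the living-arrangement criteria: the indexing version leaves that respondent as NA, while the case_when version labels them "Other" because of its final condition. Adding a matching assignment to the indexing code would bring the two into line.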
Next
How to Work with Conditional R Formulas
How to Recode Data Based on a Lookup Using R
How to Work with Date Ranges Using R
How to Filter Raw Data Using R
How to Use R Code to Create a Filter Based on Single-Response Questions