String splitting is the process of breaking up a text string in a systematic way so that the individual parts of the text can be processed. When you want to split a delimited string to get a list of items (such as brands), the quickest method is to use the List of Items automation. This is typically used when coding comma-delimited spontaneous awareness data.
In other scenarios where you want to do something more custom, you can use R to automatically break the string apart into new variables. This article describes how to go from a single variable where commas separate the 1st, 2nd, and 3rd mentions (and so on) to storing each mention in a separate variable.
A view of the final result in the Data Sets tree includes the original text variable with comma-delimited responses and a variable for each mention.
Requirements
- A data set loaded into Displayr that contains a text variable where multiple responses are stored in a single variable as comma-separated text values.
Method - Using strsplit()
The steps below use the strsplit() function. Although the tstrsplit() method described later in this article needs less code, this approach is useful if you need to do further custom manipulations before saving the results into the new variables.
OPTIONAL: If your goal is not to add variables into the data set, you can instead use the Anything icon > Calculation > Custom Code (previously known as R Output). This is also a good option to use when prototyping your code before you create new variables.
To create new mention variables, follow these steps:
- Select the Anything icon > Data > Variables > New > Custom Code > R - Text. A new variable will be created in the Data Sources tree.
- With this new variable selected, go to the object inspector on the right of your screen and under General > GENERAL give the variable a Label and, if you'd like, edit the Name. The Structure will be set to Text by default.
- Next, copy and paste the code below into the R CODE box:
### Split each response by the commas
x <- strsplit(`Q1 Spontaneous Awareness`, ",")
### Get the max number of mentions to automatically create headers
# find the max number of mentions
n <- max(sapply(x, length))
# set the length of each response to the max (this pads short responses with NAs)
for (j in seq_along(x))
    length(x[[j]]) <- n
# combine the list of responses into a matrix (one row per response)
z <- do.call(rbind, x)
# replace NAs with blanks
z[is.na(z)] <- ""
# name the columns with the number of mentions
colnames(z) <- paste0("Mention: ", 1:ncol(z))
### If saving as a variable, edit the column that is returned below
z[, 1] # Show only the first column of results
- Click Calculate.
This code does the following things:
- Uses the strsplit() function to split the text. This function splits the elements of a text (character) vector x into substrings according to the matches to the substring split within them (in this example, a comma).
- Resets the length of each vector so they are all equal. This is done so that the data can be coerced into a matrix.
- Uses do.call() as a convenient way to rbind() (combine as rows) all of the split elements.
- Ensures any NA values introduced are converted to blank strings.
- Extracts the first column of the tabulated data.
Variables for the 2nd mention, 3rd mention, and so on can be added by repeating the process above and modifying the last line of code to refer to columns 2, 3, etc., of the table of split elements.
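To see what each step produces, the same approach can be tried on a small made-up vector. This is only a sketch: the `responses` values are hypothetical stand-ins for the text variable, and the trimws() call is an added assumption (it removes the space left after each comma, which the code above does not do).

```r
# Hypothetical responses standing in for the text variable
responses <- c("Coke, Pepsi, Fanta", "Pepsi", "Sprite, Coke")

# Split each response at the commas
x <- strsplit(responses, ",")

# Assumption: strip the whitespace left around each mention
x <- lapply(x, trimws)

# Pad every response to the longest length (this fills with NAs)
n <- max(lengths(x))
x <- lapply(x, function(v) { length(v) <- n; v })

# Bind the responses into a matrix, one row per response
z <- do.call(rbind, x)
z[is.na(z)] <- ""
colnames(z) <- paste0("Mention: ", seq_len(ncol(z)))

# Value to return for a second-mention variable
second_mention <- z[, 2]
```

Returning `z` rather than a single column in a Calculation shows the whole table, which can help when checking the split before creating variables.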
Method - Using tstrsplit()
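The following is a more concise way of creating the new mention variables. The tstrsplit() function from the data.table package splits each response and transposes the result, returning a list with one element per mention and filling responses that have fewer mentions with NAs.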
OPTIONAL: If your goal is not to add variables into the data set, you can instead use the Anything icon > Calculation > Custom Code (previously known as R Output) and replace the last line of code with data.frame(splits).
To create new mention variables, follow these steps:
- Select the Anything icon > Data > Variables > New > Custom Code > R - Text. A new variable will be created in the Data Sources tree.
- With this new variable selected, go to the object inspector on the right of your screen and under General > GENERAL give the variable a Label and, if you'd like, edit the Name. The Structure will be set to Text by default.
- Next, copy and paste the code below into the R CODE box:
# your text variable with comma-delimited brands
x <- awareness
# split each response and make a list with one element per mention
splits <- data.table::tstrsplit(x, ",", fill = NA, type.convert = FALSE, names = FALSE)
# pull out a particular mention for each variable; the line below returns the first mention
splits[[1]]
- Click Calculate.
Variables for the 2nd mention, 3rd mention, and so on can be added by repeating the process above and modifying the last line of code to refer to elements 2, 3, etc., of the list of split mentions.
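As a rough illustration of what tstrsplit() returns, the sketch below runs it on a small made-up vector. The `awareness` values here are hypothetical, the data.table package is assumed to be installed, and the trimws() step is an added assumption to remove the space left after each comma.

```r
# Hypothetical responses standing in for the text variable
awareness <- c("Coke, Pepsi, Fanta", "Pepsi", "Sprite, Coke")

# Split and transpose: the result is a list with one element per mention
splits <- data.table::tstrsplit(awareness, ",", fill = NA,
                                type.convert = FALSE, names = FALSE)

# Assumption: strip the whitespace left around each mention (NAs stay NA)
splits <- lapply(splits, trimws)

# Values to return for the first- and second-mention variables
first_mention <- splits[[1]]
second_mention <- splits[[2]]
```

Unlike the strsplit() method, missing mentions stay as NA here. If blanks are preferred, something like `second_mention[is.na(second_mention)] <- ""` could be applied before returning the result.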
See Also
Finding the Best Text Analysis for your Data
How to Work with R in Displayr