How to Split Text Strings in Displayr Using R – Displayr Help

String splitting is the process of breaking up a text string in a systematic way so that the individual parts of the text can be processed. To split a delimited string into a list of items (such as brands), the quickest method is the List of Items automation, which is typically used when coding comma-delimited spontaneous awareness data.

In other scenarios where you want something more custom, you can use R to break the string apart into new variables automatically. This article describes how to go from a single variable in which commas separate the 1st, 2nd, and 3rd mentions (and so on) to a set of variables that each store one mention.

In the final result, the Data Sources tree contains the original text variable with comma-delimited responses plus one variable for each mention.

Requirements

A data set loaded into Displayr that contains a text variable where multiple responses are stored in a single variable as comma-separated text values.

Method - Using strsplit()

You can perform the series of steps outlined below using the strsplit() function. While there is a way to do this with less code, you may find this way useful if you need to do some further custom manipulations before saving into the new variables.

OPTIONAL: If your goal is not to add variables to the data set, you can use Calculation > Custom Code from the toolbar. This is also a good option to use when prototyping your code before you create new variables.

To create new mention variables, follow these steps:

In the Data Sources tree, hover anywhere and select + > Custom Code > R > Text. A new variable will be created in the Data Sources tree.

Next, copy and paste the code below into the R Code editor. Replace "Q1 Spontaneous Awareness" in the code with the Label of your text variable.

### Split each response at the commas
x <- strsplit(`Q1 Spontaneous Awareness`, ",")

### Pad the responses to a common length so they can be combined
# find the maximum number of mentions
n <- max(sapply(x, length))
# set the length of each response to the max (this fills the gaps with NAs)
for (j in seq_along(x)) {
  length(x[[j]]) <- n
}
# combine the list of responses into a matrix, one row per response
z <- do.call(rbind, x)
# replace NAs with blanks
z[is.na(z)] <- ""
# name the columns by mention number
colnames(z) <- paste0("Mention: ", seq_len(ncol(z)))

### If saving as a variable, edit the column that is returned below
z[, 1] # return only the first column (the first mention)

Click Calculate from the object inspector.
With this new variable selected, go to the object inspector and, under General > General, give the variable a Label and, if you'd like, edit the Name.

This code does the following things:

Uses the strsplit() function to split the text. This function splits each element of a text (character) vector x into substrings at every match of its split argument, which in this example is a comma.
Resets the length of each vector so they are all equal. This is done so that the data can be coerced into a matrix.
Uses do.call() as a convenient way to rbind() (combine as rows) all of the split elements.
Ensures any NA values introduced are converted to blank strings.
Extracts the first column of the tabulated data.
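
The steps above can be traced on a small example outside of Displayr. This is a minimal sketch: the responses vector is made-up data standing in for the text variable, and lapply() is used in place of the for loop as an equivalent way to pad the lengths.

```r
# Made-up responses standing in for a comma-delimited text variable
responses <- c("Coke,Pepsi,Fanta", "Pepsi", "Sprite,Coke")

# 1. Split each response at the commas; the result is a list of character vectors
x <- strsplit(responses, ",")

# 2. Pad every vector to the length of the longest one (the padding is NA)
n <- max(lengths(x))
x <- lapply(x, `length<-`, n)

# 3. Stack the padded vectors as rows of a character matrix
z <- do.call(rbind, x)

# 4. Replace the NA padding with blank strings and label the columns
z[is.na(z)] <- ""
colnames(z) <- paste0("Mention: ", seq_len(ncol(z)))

z[, 1]  # first mention of each response
z[, 2]  # second mention; blank where a response had only one
```

Printing z on its own shows the full table of split elements, which can help when checking the split before saving any column as a variable.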

Variables for the 2nd brand mentioned, the 3rd mention, and so on, can be added by repeating the process above and modifying the last line of code to refer to columns 2, 3, etc. of the table of split elements.
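
When the same code is repeated for several mentions, it may be convenient to wrap the split in a small function and change only the mention number each time. The sketch below is one possible approach, not part of Displayr: get_mention() is a hypothetical helper name, the responses vector is made-up data, and trimming the spaces that follow each comma is an added assumption about how the responses were typed.

```r
# Hypothetical helper: return the k-th comma-separated mention of each response,
# with surrounding spaces trimmed, and "" where there are fewer than k mentions
get_mention <- function(text, k) {
  parts <- strsplit(text, ",", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) >= k && !is.na(p[k])) trimws(p[k]) else ""
  }, character(1))
}

# Made-up responses with a space after each comma
responses <- c("Coke, Pepsi, Fanta", "Pepsi", "Sprite, Coke")

get_mention(responses, 2)  # second mention of each response
get_mention(responses, 3)  # third mention of each response
```

In an R variable, the last line would instead pass your own text variable as the first argument, with the mention number set to the column you want to save.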

Method - Using tstrsplit()

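The following is the most efficient way of creating the new mention variables. The tstrsplit() function from the data.table package splits each response and transposes the result, so you get one vector per mention position, filled with NAs wherever a response has fewer mentions.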

OPTIONAL: If your goal is not to add variables to the data set, you can instead use Calculation > Custom Code and replace the last line of code with data.frame(splits).

To create new mention variables, follow these steps:

Hover anywhere in the Data Sources tree and select + > Custom Code > R > Text. A new variable will be created in the Data Sources tree.

Next, copy and paste the code below into the R Code editor. Replace awareness in the code with the Name of your text variable.

# your text variable with comma-delimited brands
x <- awareness
# split each response and return a list with one element per mention position
splits <- data.table::tstrsplit(x, ",", fill = NA, type.convert = FALSE, names = FALSE)
# pull out one mention for this variable; the line below returns the first mention
splits[[1]]

Click Calculate from the object inspector.
With this new variable selected, go to the object inspector and, under General > General, give the variable a Label and, if you'd like, edit the Name.
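
To see what tstrsplit() returns, it can help to run it on a few made-up responses. The sketch below assumes the data.table package is available; the responses vector and the Mention1, Mention2, ... column names are illustrative only.

```r
# Made-up responses standing in for a comma-delimited text variable
responses <- c("Coke,Pepsi,Fanta", "Pepsi", "Sprite,Coke")

# One list element per mention position, padded with NA where a response is shorter
splits <- data.table::tstrsplit(responses, ",", fill = NA,
                                type.convert = FALSE, names = FALSE)

length(splits)  # number of mention positions found
splits[[1]]     # character vector of first mentions
splits[[3]]     # third mentions; NA where a response had fewer than three

# View all mentions side by side as a table
mentions <- data.frame(splits, stringsAsFactors = FALSE)
names(mentions) <- paste0("Mention", seq_along(splits))
mentions
```

Double brackets, as in splits[[1]], return the character vector itself, whereas single brackets return a list of length one that contains it.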

Finding the Best Text Analysis for Your Data

How to Use R in Displayr
