String splitting is the process of breaking up a text string in a systematic way so that the individual parts of the text can be processed. This article describes how to go from a single variable where commas separate the 1st, 2nd, and 3rd mentions, and so on, to storing each mention in a separate variable.
- You will need a data set loaded into Displayr that contains a text variable in which multiple responses are stored as comma-separated text values.
In Displayr, splitting the text variable requires adding new variables, which will appear alongside the original text variable in the Data Sets tree.
To do so, follow these steps:
- Select Anything > Data > Variables > New > Custom Code > R - Text. A new variable will be created in the Data Sets tree.
- With this new variable selected, go to the object inspector on the right of your screen and, under Properties > GENERAL, give the variable a Label and a Name. The Structure will be set to Text by default.
- Next, copy and paste the code below into the R CODE box:
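# Split each response at the commas
x <- strsplit(awareness, ",")
# Get max length
n <- max(sapply(x, length))
# Pad every response to the same length with NAs
for (j in seq_along(x))
    length(x[[j]]) <- n
z <- do.call(rbind, x) # Combine into a matrix, one row per response
z[is.na(z)] <- "" # Replace NAs with blanks
colnames(z) <- paste0("Mention: ", 1:ncol(z))
z[, 1] # Show only first column of results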
- Click Calculate.
This code does the following things:
- Uses the strsplit() function to split the text. This function splits the elements of a text (character) vector x into substrings according to the matches to the substring split within them; in this example, a comma.
- Resets the length of each vector so they are all equal. This is done so that the data can be coerced into a matrix.
- Uses do.call() as a convenient way to rbind() (combine as rows) all of the split elements.
- Ensures any NA values introduced are converted to blank strings.
- Extracts the first column of the tabulated data.
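To make the steps listed above concrete, here is a minimal sketch that runs the same logic in plain R on a small made-up vector. The sample values and the name first.mention are assumptions for illustration only; in Displayr, awareness would be your own text variable.

```r
# Hypothetical sample data standing in for the Displayr variable `awareness`
awareness <- c("Coke,Pepsi,Fanta", "Pepsi", "Coke,Sprite")

x <- strsplit(awareness, ",")   # list with one character vector per response
n <- max(sapply(x, length))     # the longest response decides the column count
for (j in seq_along(x))
    length(x[[j]]) <- n         # shorter responses are padded with NA
z <- do.call(rbind, x)          # character matrix, one row per response
z[is.na(z)] <- ""               # NAs become blank strings
colnames(z) <- paste0("Mention: ", seq_len(ncol(z)))

first.mention <- z[, 1]         # illustrative name for the first column
print(z)
```

Printing z before extracting a column is a quick way to confirm that every response ended up in the row and column you expect.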
Variables for the 2nd brand mentioned, the 3rd mention, and so on can be added by repeating the process above and changing the last line of code to refer to column 2, 3, etc. of the table of split elements.
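If you are creating several of these variables, one possible alternative is to wrap the logic in a small helper so that only the mention number changes between variables. This is a sketch, not Displayr's own method: nthMention is a hypothetical function name, the sample data is made up, and the trimws() call assumes that responses may contain spaces after the commas.

```r
# Hypothetical helper: returns the k-th comma-separated mention for each response,
# or a blank string when a response has fewer than k mentions
nthMention <- function(text, k) {
    parts <- strsplit(as.character(text), ",")
    vapply(parts,
           function(p) if (length(p) >= k) trimws(p[k]) else "",
           character(1))
}

# Hypothetical sample data standing in for the Displayr variable `awareness`
awareness <- c("Coke, Pepsi, Fanta", "Pepsi", "Coke,Sprite")

second.mention <- nthMention(awareness, 2)
third.mention <- nthMention(awareness, 3)
```

Unlike indexing a column of the matrix directly, this version returns blanks instead of raising an error when no response contains a k-th mention.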
OPTIONAL: If your goal is not to add variables into the data set, you can instead use Anything > Calculation > Custom Code (previously known as R Output). This is also a good option to use when prototyping your code before you create new variables.
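When prototyping, it can help to check how many mentions each response contains before deciding how many new variables to create. The sketch below uses made-up sample data and illustrative names (mentions.per.case, max.mentions); inside Displayr it assumes that awareness refers to your text variable.

```r
# Hypothetical sample data standing in for the Displayr variable `awareness`
awareness <- c("Coke,Pepsi,Fanta", "Pepsi", "Coke,Sprite")

# Number of comma-separated mentions in each response
mentions.per.case <- lengths(strsplit(awareness, ","))

# The largest count is the number of split variables you would need
max.mentions <- max(mentions.per.case)

# Frequency of each mention count across responses
table(mentions.per.case)
```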