Sometimes it's helpful to look not just at the most frequent words or phrases in text but also at their sentiment, to see how positive or negative the responses are as a whole. This article describes how to go from a standard Word Cloud:
to one that is color-coded by the sentiment of each word, with significantly positive words in green, significantly negative words in red, and all other words in grey:
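Concretely, the R code in the Method section treats the responses that mention a top word as a small sample. Using notation introduced here (not from Displayr), let R_w be the set of responses that mention word w, n_w their number, and s_i the sentiment score of response i after it has been recoded to -1, 0 or 1. The word is then colored by a one-sample z-score of its mean sentiment against 0, with 1.96 (the two-sided 5% critical value of the standard normal distribution) as the cut-off:

```latex
\bar{s}_w = \frac{1}{n_w} \sum_{i \in R_w} s_i, \qquad
z_w = \frac{\bar{s}_w}{\mathrm{sd}_w / \sqrt{n_w}}, \qquad
\text{color}(w) =
\begin{cases}
\text{green} & \text{if } z_w > 1.96 \\
\text{red}   & \text{if } z_w < -1.96 \\
\text{grey}  & \text{otherwise}
\end{cases}
```

Here sd_w is the sample standard deviation as computed by R's sd() (with an n_w - 1 denominator). Words with n_w of 3 or fewer are given z_w = 0, so they always stay grey.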
Requirements
A data file that contains a text variable with the phrases you want to use to create the Word Cloud. Text variables are shown with a small "a" icon next to the variable name in the Data Sets tree:
Method
- First, tidy the text so that the sentiment analysis runs faster. From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Advanced > Setup Text Analysis.
- Select your text variable from the Text variable drop-down.
- [Optional] Update additional settings as described in step 3 of How to Set Up Text for Analysis.
- In the object inspector, click SAVE VARIABLE(S) > Sentiment to calculate a sentiment score for each response from the text analysis and save the scores as a new variable in the data set.
- Go to Calculation > Custom Code and draw a box on the page.
- Paste the code below into the R CODE box and edit the sentiment variable label and the text analysis setup name as needed:
#### Flag each response as positive or negative
#identify the variable with the sentiment scores to use: put its label in backticks below
phrase.sentiment = `Sentiment scores from text.analysis.setup`
#recode all positive sentiments (1 or more) to 1 and all negative sentiments (-1 or less) to -1
phrase.sentiment[phrase.sentiment >= 1] = 1
phrase.sentiment[phrase.sentiment <= -1] = -1
#### Get data about the top words found in each response
#identify the specific text analysis setup that you want to use
#(replace the right-hand side with the name of your Setup Text Analysis output if it differs)
text.analysis.setup = text.analysis.setup
#pull off the top words from the text
final.tokens = text.analysis.setup$final.tokens
#pull off the counts of the top words
counts = text.analysis.setup$final.counts
#create a binary table (one row per response, one column per top word) flagging whether each top word appears in each response
td = t(vapply(flipTextAnalysis:::decodeNumericText(text.analysis.setup$transformed.tokenized),
              function(x) { as.integer(final.tokens %in% x) },
              integer(length(final.tokens))))
#### Translate response sentiment to word-level
#use the sentiment score from the response to give each top word in the response
#the sentiment score of the overall response
phrase.word.sentiment = sweep(td, 1, phrase.sentiment, "*")
#if top word is not in the response, make the sentiment for that word missing
phrase.word.sentiment[td == 0] = NA
#### Test whether each word's sentiment is significantly positive or negative
#for each top word, calculate statistics
word.mean = apply(phrase.word.sentiment, 2, FUN = mean, na.rm = TRUE) #average sentiment
word.sd = apply(phrase.word.sentiment, 2, FUN = sd, na.rm = TRUE) #standard deviation
word.n = apply(!is.na(phrase.word.sentiment), 2, FUN = sum, na.rm = TRUE) #number of responses mentioning the word
word.se = word.sd / sqrt(word.n) #calculate standard error
word.z = word.mean / word.se #calculate z-score
word.z[word.n <= 3 | is.na(word.se)] = 0 #if a word has 3 or fewer mentions (or no standard error), set its z-score to 0
#### Create a final table of all the top words with their frequency, sentiment and z-score
#(data.frame() renames the "Z-Score" column to Z.Score, which is the name used below)
words = text.analysis.setup$final.tokens
x = data.frame(word = words,
               freq = counts,
               "Sentiment" = word.mean,
               "Z-Score" = word.z,
               Length = nchar(words))
#sort the table by word frequency, in descending order
word.data = x[order(counts, decreasing = TRUE), ]
#### Calculate the color
#get the number of words to show in the cloud
n = nrow(word.data)
#start by coloring every word grey
colors = rep("grey", n)
#color a word red or green if its z-score is significant at the 5% level (|z| > 1.96)
colors[word.data$Z.Score < -1.96] = "Red"
colors[word.data$Z.Score > 1.96] = "Green"
#### Create the word cloud
#load the R package with the wordcloud function
library(wordcloud2)
#create the word cloud from the word and freq columns, with one color per word
wordcloud2(data = word.data[, c("word", "freq")], color = colors, size = 0.4)
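The code above depends on Displayr objects (the text analysis setup, the saved sentiment variable and the internal flipTextAnalysis:::decodeNumericText helper), so it cannot be run on its own. To check what each step does, here is a minimal, self-contained sketch in base R that runs the same recode, word-level sentiment, z-score and coloring steps on a few made-up responses. The top words, tokenized responses and scores are hypothetical; the plain list `tokenized` stands in for the decoded `transformed.tokenized`, and the number of mentions (`word.n`) stands in for `final.counts`. wordcloud2 is left out so that no extra package is needed.

```r
# Toy version of the word-level sentiment test. All data below are made up
# (hypothetical); in Displayr they come from the text analysis setup and the
# saved sentiment variable.

# hypothetical top words (stand-in for text.analysis.setup$final.tokens)
final.tokens <- c("friendly", "slow", "price", "service")

# hypothetical tokenized responses (stand-in for the decoded tokens)
tokenized <- list(
  c("friendly", "service"), c("friendly", "staff"),
  c("friendly", "service"), c("friendly", "price"),
  c("friendly", "slow"),    c("slow", "service"),
  c("slow", "price"),       c("slow", "checkout"),
  c("slow", "service")
)

# hypothetical response-level sentiment scores, recoded to -1, 0 or 1
phrase.sentiment <- c(2, 1, 3, 1, 0, -1, -2, -1, -3)
phrase.sentiment[phrase.sentiment >= 1] <- 1
phrase.sentiment[phrase.sentiment <= -1] <- -1

# binary table: one row per response, one column per top word
td <- t(vapply(tokenized,
               function(x) as.integer(final.tokens %in% x),
               integer(length(final.tokens))))
colnames(td) <- final.tokens

# each mentioned word inherits its response's sentiment; NA if not mentioned
phrase.word.sentiment <- sweep(td, 1, phrase.sentiment, "*")
phrase.word.sentiment[td == 0] <- NA

# one-sample z-score of each word's mean sentiment against 0
word.mean <- apply(phrase.word.sentiment, 2, mean, na.rm = TRUE)
word.sd <- apply(phrase.word.sentiment, 2, sd, na.rm = TRUE)
word.n <- colSums(!is.na(phrase.word.sentiment))
word.se <- word.sd / sqrt(word.n)
word.z <- word.mean / word.se
word.z[word.n <= 3 | is.na(word.se)] <- 0

# data.frame() turns "Z-Score" into Z.Score (check.names = TRUE by default);
# word.n stands in for final.counts here
word.data <- data.frame(word = final.tokens, freq = word.n,
                        "Z-Score" = word.z, stringsAsFactors = FALSE)
word.data <- word.data[order(word.data$freq, decreasing = TRUE), ]

# grey by default, red or green when the z-score passes +/-1.96
colors <- rep("grey", nrow(word.data))
colors[word.data$Z.Score < -1.96] <- "red"
colors[word.data$Z.Score > 1.96] <- "green"
names(colors) <- word.data$word
print(colors)
```

With these made-up data, "friendly" (five mentions, mostly positive) comes out green with a z-score of 4, "slow" comes out red with a z-score of -4, "price" stays grey because it has only two mentions, and "service" stays grey because its mentions are evenly split. In Displayr, the resulting `colors` vector is what gets passed to wordcloud2() in the last line of the code above.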