Sentiment analysis is a way to quantify the feeling or tone of written text. In a survey context, this is a useful technique for gauging the overall attitude towards a brand (or whatever you like). In sentiment analysis, each case receives a numeric sentiment score (on a negative to positive scale).
This article describes how to use raw text data to calculate an overall sentiment score, or to use sentiment scores in further analysis, such as crosstabs.
You will need a Text variable to perform text analysis and word cloud creation. Text variables are represented by a small “a” next to the variable in the Data Sets tree.
- Select your text variable from the Data Sets tree.
- Go to Anything > Advanced Analysis > Text Analysis > Sentiment.
- A new variable with a Numeric structure will appear in your Data Sets tree that represents the sentiment score.
You can use this new variable in a variety of ways:
- In cross-tabulations with other questions to see how the sentiment score may vary for different groups within the sample.
- In correlations with other numeric variables (e.g., using a Correlation Matrix).
- As a nominal variable: recode the numeric sentiment score to divide your sample into those who are positive (score of 1 or higher), neutral (score of 0), and negative (score of -1 or lower) on the topic.
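The positive/neutral/negative split in the last point can be sketched outside Displayr as follows. This is a minimal illustration in Python, not Displayr's own recoding; the function name `sentiment_category` and the sample scores are hypothetical.

```python
def sentiment_category(score):
    """Map a numeric sentiment score to a nominal label."""
    if score >= 1:
        return "Positive"   # score of 1 or higher
    if score <= -1:
        return "Negative"   # score of -1 or lower
    return "Neutral"        # score of 0

# Hypothetical sentiment scores for five respondents.
scores = [2, 0, -2, 1, -1]
labels = [sentiment_category(s) for s in scores]
print(labels)
# ['Positive', 'Neutral', 'Negative', 'Positive', 'Negative']
```

The resulting labels can then be used like any other categorical variable, for example as the column variable in a crosstab.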
In some cases, you might like to “clean” your raw text variable before computing the sentiment scores. This is where the Text Analysis Setup feature can help. In Displayr, it is found under Anything > Advanced Analysis > Text Analysis > Advanced > Setup Text Analysis. This creates an R output on the page in which the raw text is processed: spelling is checked, words are stemmed, unwanted words are removed, specific words are replaced, and words are combined into phrases. To calculate sentiment scores from the Text Analysis Setup, select the Text Analysis Setup output on the page, and then choose Anything > Advanced Analysis > Text Analysis > Sentiment from the toolbar.
Displayr compares the contents of each text response to English-language dictionaries of positive words and negative words. Each positive word scores +1, and each negative word scores -1. The final sentiment score for each response is the sum of these scores. The process also tries to identify when a sentiment has been negated. For example, “not good” would generate a score of -1 instead of +1.
To illustrate, consider these cases from a hypothetical text variable. The first case receives a sentiment score of +2, and the second case a score of -2. The words contributing +/-1 towards the total score in each case are marked in brackets:
I really enjoyed (+1) the webinar – it was fun! (+1): Score = +2
I didn’t like (-1) the webinar – because I hate (-1) the speaker: Score = -2
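The scoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Displayr's implementation: the word lists are tiny hypothetical stand-ins for the real dictionaries, and negation is assumed to apply only when a negating word comes immediately before a sentiment word.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# the real dictionaries are far larger.
POSITIVE = {"enjoyed", "fun", "good", "like"}
NEGATIVE = {"hate", "bad", "boring"}
NEGATORS = {"not", "no", "never", "didn't", "don't", "wasn't", "isn't"}

def sentiment_score(text):
    """Sum +1/-1 word scores, flipping a word preceded by a negator."""
    # Normalise curly apostrophes so "didn’t" matches "didn't".
    tokens = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    if not tokens:
        return None  # blank response: treated as missing
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Assumed negation rule: flip if the previous word is a negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

print(sentiment_score("I really enjoyed the webinar – it was fun!"))  # 2
print(sentiment_score("I didn’t like the webinar – because I hate the speaker"))  # -2
print(sentiment_score("not good"))  # -1
```

With these toy lists, the two example responses reproduce the scores of +2 and -2 shown above. A real lexicon and a more careful negation rule would behave differently on less tidy text.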
A sentiment score is generated for every respondent in the survey and saved as a numeric variable. Text entries that were originally blank get a missing value for the sentiment score, so they are excluded from the base. If you would rather include them as neutral responses, you can recode them in the usual way: select the variable with your sentiment scores, select Values in the object inspector, and change the value from NaN to 0.
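The choice between leaving blanks as missing and recoding them to 0 changes the base, and therefore the average score. The following Python sketch shows the effect on a small hypothetical set of scores; it illustrates the arithmetic only and is not how Displayr stores or recodes values.

```python
import math

# Hypothetical scores for five respondents; NaN marks a blank response.
scores = [2.0, math.nan, -2.0, math.nan, 1.0]

# Default behaviour: blanks are missing and excluded from the base (n = 3).
valid = [s for s in scores if not math.isnan(s)]
mean_excluding_blanks = sum(valid) / len(valid)

# After recoding NaN to 0: blanks count as neutral (n = 5).
recoded = [0.0 if math.isnan(s) else s for s in scores]
mean_including_blanks = sum(recoded) / len(recoded)

print(len(valid), round(mean_excluding_blanks, 3))    # 3 0.333
print(len(recoded), round(mean_including_blanks, 3))  # 5 0.2
```

Recoding pulls the average towards 0 whenever there are blank responses, so it is worth deciding up front whether a blank should count as a neutral opinion or as no opinion at all.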
Positive and negative words are based on modified lists from here: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon