Sentiment analysis is a way to quantify the feeling or tone of written text. In a survey context, this is a useful technique for gauging the overall attitude towards a brand (or whatever you like). In sentiment analysis, each case receives a numeric sentiment score (on a negative to positive scale). Displayr has two features for this:
- A numeric variable generated by AI from an input text variable. You can determine how the sentiment is coded, and AI will be able to evaluate responses in a "smarter" way.
- A numeric variable that is not generated by AI, where the sentiment score is determined by word matching. This analysis is less robust than our AI-based sentiment tool: it simply counts positive and negative words in a response. See the Notes below for more information.
This article describes the traditional word-matching method and walks through how to use raw text data to calculate an overall sentiment score, or to use sentiment scores in further analysis, such as crosstabs.
Requirements
- You will need a Text variable in order to perform text analysis and word cloud creation. Text variables are represented by an A next to the variable in the Data Sources tree.
- If you want to use the AI-based sentiment calculation, you will need to have AI enabled. See Opting In and Out of Displayr AI.
Method
- Select your text variable from the Data Sources tree.
- Hover and click + > Numeric Transformations > Sentiment. Note:
- If you have AI enabled, you can modify the Data > Sentiment Prompt in Properties to further refine the results. See Getting Started with AI Prompting in Displayr for some tips.
- If you have AI disabled, you may see an alert. You can click through the alert, and the numeric variable will be created.
- A new variable with a Numeric structure will appear in your Data Sources tree that represents the sentiment score.
You can use this new variable in a variety of ways:
- In cross-tabulations with other questions to see how the sentiment score may vary for different groups within the sample.
- Looking at correlations of sentiment scores with other numeric variables (e.g., use Correlation Matrix).
- You could also turn the numeric sentiment score variable into a nominal variable to divide your sample into those who are positive (score of 1 or higher), neutral (score of 0), and negative (score of -1 or less) on the topic.
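The grouping and cross-tabulation ideas above can be sketched outside Displayr as follows. This is only an illustration in Python, not Displayr's own implementation: the function name `sentiment_category`, the segment labels, and the respondent data are all hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_category(score):
    """Map a numeric sentiment score to the three groups described above."""
    if score >= 1:
        return "Positive"
    if score <= -1:
        return "Negative"
    return "Neutral"


# Hypothetical respondents: (segment, sentiment score).
respondents = [
    ("Under 35", 2),
    ("Under 35", 0),
    ("Under 35", 1),
    ("35 and over", -1),
    ("35 and over", -2),
]

# Count sentiment categories within each segment (a simple crosstab).
crosstab = defaultdict(Counter)
for segment, score in respondents:
    crosstab[segment][sentiment_category(score)] += 1

for segment, counts in crosstab.items():
    print(segment, dict(counts))
```

In Displayr itself, the same result comes from changing the variable's structure and creating a table, so no code is required.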
OPTIONAL:
In some cases, you might like to “clean” your raw text variable before computing the sentiment scores. This is where the Text Analysis Setup feature can help. In Displayr, it is found in the Report tree or toolbar with + > Advanced Analysis > Text Analysis > Advanced > Setup Text Analysis. This creates an R output on the page where the raw text is processed for spell-checking, stemming, removal of words, replacement of specific words, and combination of words into phrases. To calculate the sentiment scores from the Text Analysis Setup:
- Select the Text Analysis Setup output on the page.
- From Properties, click Data > Save Variable(s) > Categories (to save all created categories) or First Category (to save only the first category if multiple categories are mentioned). This will create a new categorical variable in the Data Sources tree.
- Select the new categorical variable in the Data Sources tree.
- Hover and click + > Numeric Transformation > Sentiment.
Notes
If you are using the non-AI-based sentiment variable creator, Displayr compares the contents of each text response to English-language dictionaries of positive and negative words. Each positive word scores +1 and each negative word scores -1. The final sentiment score for a response is the sum of these scores. The process also tries to identify when a sentiment has been negated. For example, “not good” would generate a score of -1 instead of +1.
To illustrate, consider these cases from a hypothetical text variable. The first case receives a sentiment score of +2, and the second case a score of -2. The words contributing +/-1 towards the total score in each case are in brackets:
I really enjoyed (+1) the webinar – it was fun! (+1): Score = +2
I didn’t like (-1) the webinar – because I hate (-1) the speaker: Score = -2
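The scoring logic described above can be approximated with a short Python sketch. It is not Displayr's actual code: the word lists here are tiny hypothetical stand-ins for the real dictionaries, and the negation rule (flip the sign of a word that directly follows a negating word) is an assumption made for illustration.

```python
import re

# Hypothetical miniature lexicons; the real dictionaries are far larger.
POSITIVE = {"enjoyed", "fun", "good", "like", "love"}
NEGATIVE = {"hate", "bad", "boring"}
# Assumed negation cues; Displayr's actual negation handling may differ.
NEGATORS = {"not", "no", "never", "didn't", "don't", "doesn't", "isn't", "wasn't"}


def sentiment_score(text):
    """Sum +1/-1 word scores, flipping a word's sign when it follows a negator."""
    # Normalise curly apostrophes so "didn’t" matches "didn't".
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    score = 0
    for i, token in enumerate(tokens):
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


print(sentiment_score("I really enjoyed the webinar – it was fun!"))
print(sentiment_score("I didn’t like the webinar – because I hate the speaker"))
```

With these stand-in word lists, the two example responses score +2 and -2, matching the cases above. A production lexicon would also need to handle negators that sit more than one word away from the term they modify.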
A sentiment score is generated for every respondent in the survey and saved as a numeric variable. Text entries that were originally blank get a missing value for the sentiment score and so are excluded from the base. To include them, recode them in the usual way: select the variable with your sentiment scores and, from Properties, select Data > Values, then change the value from NaN to 0.
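To see what recoding blanks to 0 does to the results, consider this small Python illustration. The scores are hypothetical, and `None` stands in for a missing value. Treating blanks as neutral enlarges the base and pulls the average towards 0.

```python
from statistics import mean

# Hypothetical scores; None represents a blank response (missing value).
scores = [2, -2, None, 1, None]

# Default behaviour: blanks are excluded from the base.
answered = [s for s in scores if s is not None]

# After recoding: blanks are counted as neutral (0).
recoded = [0 if s is None else s for s in scores]

print(len(answered), mean(answered))
print(len(recoded), mean(recoded))
```

Here the base grows from 3 to 5 respondents and the average drops from about 0.33 to 0.2, so recode only if a blank response genuinely should count as neutral.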
Background
Positive and negative words are based on modified lists from here: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
Next
How to Show Sentiment in Word Clouds
Using Customizable Displayr AI Features
How to Calculate Sentiment Scores for Open-Ended Responses Using R