Sentiment analysis is a way to quantify the feeling or tone of written text. In a survey context, it is a useful technique for gauging the overall attitude towards a brand or any other topic of interest. Each case receives a numeric sentiment score on a negative-to-positive scale.
In Displayr, sentiment scores are calculated by:
- Counting the number of positive and negative terms in a response. These terms are identified from internal positive and negative dictionaries.
- Assigning each positive term a value of +1 and each negative term a value of -1.
- Taking the net sum of these values as the sentiment score.
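The scoring steps above can be sketched in base R. This is only an illustration of the counting logic, not Displayr's implementation: the two small dictionaries and the function name sentiment.counts are hypothetical, and the tokenization (lower-casing and splitting on non-letters) is an assumption.

```r
# Hypothetical mini-dictionaries for illustration only; Displayr's internal
# dictionaries are not reproduced here.
positive.terms <- c("good", "great", "love", "excellent")
negative.terms <- c("bad", "poor", "hate", "terrible")

# Count dictionary terms in each response and return the net score.
sentiment.counts <- function(responses, pos = positive.terms, neg = negative.terms) {
    # Lower-case, then split on anything that is not a letter or apostrophe
    tokens <- strsplit(tolower(responses), "[^a-z']+")
    n.pos <- vapply(tokens, function(tk) sum(tk %in% pos), integer(1))
    n.neg <- vapply(tokens, function(tk) sum(tk %in% neg), integer(1))
    # +1 per positive term, -1 per negative term
    data.frame(positive = n.pos, negative = n.neg, score = n.pos - n.neg)
}

result <- sentiment.counts(c("Great taste, love it",
                             "Bad price but good flavor",
                             "No opinion"))
print(result)
```

Returning the two counts alongside the net score keeps the information that the score alone hides.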
While the sentiment score is useful for assessing the overall tone of the text, it can also be useful to see how the score was calculated. For example, if the sentiment score is +3, does that mean the response was completely positive, with 3 positive terms and no negative terms, or was there a mixture of positive and negative terms, such as 4 positive terms and 1 negative term?
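As a small worked example of this ambiguity, the sketch below uses two made-up responses whose term counts differ but whose net scores are identical. The data frame and its column names are hypothetical.

```r
# Two hypothetical responses; the term counts are made up for illustration.
counts <- data.frame(
    response = c("A", "B"),
    positive = c(3, 4),
    negative = c(0, 1)
)

# Net score: +1 per positive term, -1 per negative term
counts$score <- counts$positive - counts$negative
print(counts)
```

Both responses score +3, so only the separate counts show that response B contained a negative term.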
This article explains how to get counts of positive and negative terms alongside the sentiment score.
Requirements
A document with one or more text responses
Please note these steps require a Displayr license.
Method
In this example, we will be creating our new R variables from a variable called "text" containing Donald Trump tweets from 2016.
- Click the plus sign below the variable text.
- From the Insert Variable(s) menu, select Custom Code > Multiple R Variables > Numeric to create a Numeric - Multi variable set or select Custom Code > Multiple R Variables > Text to create a Text - Multi variable set. The example below creates Numeric variables.
- Type the number of variables you want to create in the How Many Variables Do You Want to Create? box. In this example, we will create 3 new variables.
- Click OK.
The results are as follows:
Each of the variables in this variable set shares the same R template code, which is used to create empty variables.
The R template code is found under Data > R Code - Data > Edit Code. It creates an empty data.frame (table) with as many rows as there are cases in your data set and as many columns as specified in step 3. You can then add your custom code and modify the empty table, new.data, as needed to get your desired result.
Or, if you are familiar with R, you can replace the template code entirely, provided your final output has the required number of rows (cases) and columns (variables).
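To make the shape requirement concrete, here is a rough base R sketch of an empty table being created and then filled. It is not Displayr's actual template code, which is not reproduced here. The values of n.cases and n.variables, and the counts used to fill the table, are assumptions for illustration. Only the name new.data comes from the description above.

```r
# Assumed values for illustration; in Displayr the number of cases comes from
# the data set and the number of columns from the dialog box.
n.cases <- 5
n.variables <- 3

# An empty numeric table: one row per case, one column per new variable
new.data <- data.frame(matrix(NA_real_, nrow = n.cases, ncol = n.variables))
names(new.data) <- c("Positive words", "Negative words", "Sentiment score")

# Fill the table with custom results, here using hypothetical counts
new.data[["Positive words"]] <- c(3, 4, 0, 1, 2)
new.data[["Negative words"]] <- c(0, 1, 0, 2, 2)
new.data[["Sentiment score"]] <-
    new.data[["Positive words"]] - new.data[["Negative words"]]

print(dim(new.data))
```

Whatever code replaces the template, the final object should keep this shape: one row per case and one column per variable in the set.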
In this example, we will replace the template with our own code from another document.
- The next step is to update the template code with R code of your own. In the Data Sources tree, click on Variable 1 in the New R Variable Set.
The current code will be found under Data > R Code - Data > Edit Code.
- Delete the code in the open R code window and replace it with the following code:
library(flipTextAnalysis)
AsSentimentMatrix(Colas.sav$Variables$q6,
                  remove.stopwords = FALSE,
                  operations = '',
                  check.simple.suffixes = TRUE,
                  simple.suffixes = c("s", "es", "ed", "d", "ing"),
                  pos.words = get("ftaPositiveWords"),
                  neg.words = get("ftaNegativeWords"),
                  blanks.as.missing = TRUE)
- Edit the second line to point to the data file and question containing your text data.
- Click Calculate.
The results are as follows. The variable names were updated with the column names from the R code, and the variable set was renamed to Trump Tweets.
At this point, you may want to change properties of the variable set, such as Structure, as needed.
- Select Table > Raw Data > Variable(s).
- Drag Trump Tweets to the Variables box.
At this point, you may want to do some further analysis on the three new variables in the table. For example, it may be of interest to get the average number of negative words Trump included in his tweets.
- Right-click on the variable set name and select Split.
- Drag the variables Positive words, Negative words and Sentiment score onto the page.
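For readers who prefer to do this kind of follow-up analysis in R code, the average number of negative words can be sketched as below. The data frame trump.tweets and its values are hypothetical stand-ins for the three new variables. In a real document you would reference the variables created above instead.

```r
# Hypothetical stand-in for the new variables; the values are made up.
trump.tweets <- data.frame(
    `Positive words` = c(2, 0, 1, NA),
    `Negative words` = c(1, 3, 0, NA),
    check.names = FALSE  # keep the spaces in the column names
)
trump.tweets$`Sentiment score` <-
    trump.tweets$`Positive words` - trump.tweets$`Negative words`

# Average number of negative words per response, skipping missing cases
mean.negative <- mean(trump.tweets$`Negative words`, na.rm = TRUE)
print(mean.negative)
```

Setting na.rm = TRUE matters here because blank responses are treated as missing in the code above (blanks.as.missing = TRUE), and a single missing value would otherwise make the mean NA.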