Researchers have long known that the words of a text carry more information than appears on the surface. As such, texts have been studied for subtexts and other latent or hidden information. One approach has involved the machine-enabled analysis of human sentiment, usually mapped out on a positive-negative polarity. NVivo 11 Plus (a qualitative research tool released in late 2015) enables the automated sentiment analysis of texts (coded research, formal articles, text corpora, Tweetstream datasets, Facebook wall posts, websites, and other sources) based on four categories: very positive, moderately positive, moderately negative, and very negative. The feature compares the target text set against a sentiment dictionary and enables coding at different units of analysis: sentence, paragraph, or cell. Further, the sentiment capability extracts the coded text into respective text sets, which may be further analyzed using text frequency counts, text searches, automated theme and sub-theme extractions (topic modeling), and data visualizations.
1. Sentiment Analysis with
NVivo 11 Plus
Summer Institute on Distance Learning and Instructional Technology (SIDLIT 2016)
August 4 - 5, 2016
4. Sentiment and its Public / Private Expression
• Sentiment may be ephemeral in some cases but may harden into
a stance (and an orientation and then a disposition) depending on
the strength of the sentiment and whether the sentiment is
reinforced or contradicted (by those in a social network and the
larger community)
• Expression of sentiment may be reinforcing (strengthening that sentiment)
or cathartic (dissipating that sentiment)
• Expression in particular venues may have particular effects
• Expressed sentiment may affect recipients (of the messages)
differently based on their receptivity / susceptibility to the
message
5. Sentiment and Action
• An assumed relationship between sentiment (+ or -) and behavior
/ action (in the aggregate)
• Not a simple cause-and-effect
• Not simple predictivity
• Positive and negative sentiments exist; both can inspire action,
but of different types
• Positive sentiment is not always desirable; negative sentiment is not always
undesirable
• A positive view may lead to complacency on an issue about which one should
not be complacent
6. Sentiment and Action (cont.)
• High emotional intensity, sympathy, and anger as sparks to
(individual or mass) kinetic action (sometimes unheeding,
sometimes not formally considered)
• Communications on the Social Web are often “calls to action”
• Fund-raising
• Boycotting
• Taking part in events
• Voting
• Taking on or maintaining a certain attitude
• Taking precautions
• Co-messaging, and others
7. Reasons for Measuring Sentiment
• Public sentiment metrics (from sentiment analysis or opinion
mining) as indicators of success / failure (and degrees in
between) for media professionals, publicists, and advertisers
• Public sentiment metrics as early indicators of potential individual
or mass action
• Predictivity
• Public sentiment metrics as a research tool to surface latent
information
8. Sentiment and Social Media / Opinion Data
• All expressions on social media may feel ephemeral and invisible,
but are actually permanent and highly visible and findable
• In social media, sentiment is studied to understand
• Strategic messaging and trends moving through social networks [through
friend of a friend (FOAF) networks and word of mouth (WOM)]
• People’s reputations and how they are trending
• Evolving people-related events and various “potential futures”
9. Identification of Language-Based Markers
• The idea is to find language-based indicators (markers) that may serve
as shorthand for particular insights about the state of the world
• Classic: What is “a|b”? Or, what is the state of “a” given the observation of
“b”?
• The next step is how these observations of the world may be used
to inform decision-making and actions
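The “a|b” question above can be approximated from labeled examples as a conditional relative frequency. The following is a minimal sketch under stated assumptions: the observation records, the state names, and the marker words are all invented for illustration and do not come from the presentation's datasets.

```python
from typing import Iterable, Optional, Set, Tuple


def conditional_frequency(
    observations: Iterable[Tuple[str, Set[str]]],
    state: str,
    marker: str,
) -> Optional[float]:
    """Estimate the share of observations in `state` among those showing `marker`.

    Each observation is a (state, markers) pair. Returns None when the
    marker never appears, since "a given b" is undefined without any b.
    """
    states_with_marker = [s for s, markers in observations if marker in markers]
    if not states_with_marker:
        return None
    hits = sum(1 for s in states_with_marker if s == state)
    return hits / len(states_with_marker)


# Hypothetical hand-labeled records: (state of the writer, markers in the text)
observations = [
    ("dissatisfied", {"refund"}),
    ("satisfied", {"thanks"}),
    ("dissatisfied", {"refund", "late"}),
    ("satisfied", {"refund"}),
]

print(conditional_frequency(observations, "dissatisfied", "refund"))
```

A relative frequency from a small labeled sample is only a rough estimate; it says nothing about cause and effect.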
13. Computational Sentiment Analysis
• Conceptualized as a positive-negative polarity
• Binary conceptualization as positive, negative, or neutral
• Continuum conceptualization as degrees of sentiment
• In NVivo 11 Plus: Classifications of text as
• Very negative, moderately negative, moderately positive, very positive (and
“neutral” implied by non-inclusion in the coded sentiment set)
• Understandings of general tendencies in a text set
• Access to the autocoded text set for each of the categories
• Understandings of granular features of the extracted text sets
• Spinoff research from extracted text sets possible, such as text
counts, word searches, and others
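The relationship between a continuum score and the four categories can be sketched as a simple threshold mapping. This is an illustration only: the score range of -1 to 1 and both cut-off values are assumptions chosen for the example, not NVivo 11 Plus's actual settings, which are not exposed to the user.

```python
def classify(score: float, neutral_band: float = 0.1, strong: float = 0.6) -> str:
    """Map a polarity score in [-1, 1] to a sentiment category.

    `neutral_band` and `strong` are hypothetical thresholds.
    """
    if abs(score) < neutral_band:
        return "neutral"  # left uncoded in a four-category scheme
    if score <= -strong:
        return "very negative"
    if score < 0:
        return "moderately negative"
    if score >= strong:
        return "very positive"
    return "moderately positive"


def to_binary(label: str) -> str:
    """Collapse a four-category label to a negative / positive polarity."""
    if label == "neutral":
        return "neutral"
    return "negative" if label.endswith("negative") else "positive"


for s in (-0.8, -0.2, 0.0, 0.3, 0.9):
    label = classify(s)
    print(s, label, to_binary(label))
```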
17. Various Methods
• Pre-coding a word set from a target language -> comparing text
sets against that sentiment set or sentiment dictionary
• Usually focused around semantic-bearing terms (and not so much function
or syntax terms)
• Using a customized sentiment dictionary for specialized text sets
(such as Tweets or posts or microblogging messages on social
media)
• Translating text from one language into another and then using
the second language’s sentiment dictionary to code
the text
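The dictionary-comparison method above can be sketched in a few lines. This is a toy example: the six-word dictionary and its weights are invented for illustration, and real sentiment dictionaries are far larger.

```python
import re
from typing import Dict, List

# Hypothetical pre-coded sentiment dictionary: word -> signed weight.
# Function words ("the", "was", "and") are simply absent, so they score 0.
SENTIMENT_DICT: Dict[str, int] = {
    "excellent": 2,
    "good": 1,
    "helpful": 1,
    "bad": -1,
    "confusing": -1,
    "terrible": -2,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text: str, lexicon: Dict[str, int] = SENTIMENT_DICT) -> int:
    """Sum the dictionary weights of every token found in the text."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


print(score_text("The workshop was excellent and the handouts were helpful"))
print(score_text("A terrible, confusing interface"))
print(score_text("The session starts at noon"))
```

A customized dictionary for a specialized text set would be passed in through the `lexicon` argument.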
18. Various Methods (cont.)
• More sophisticated consideration of negation, irony, humor,
sarcasm, and longer phrases (n-gram sequences, thought units vs.
single words) for nuanced sentiment labeling
• Manual labeling with XML tagging and running queries based on the
manual labeling
• Going with bag-of-words vs. structure-preserving sentiment
approaches
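To make the negation point concrete, the sketch below contrasts single-word (unigram) scoring with a scorer that looks one word back for a negator. The word lists are assumptions made up for this example; irony, humor, and sarcasm are well beyond a rule this simple.

```python
import re
from typing import List

# Hypothetical word lists for illustration only.
LEXICON = {"good": 1, "happy": 1, "bad": -1, "unhappy": -1}
NEGATORS = {"not", "no", "never"}


def tokens_of(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def unigram_score(text: str) -> int:
    """Score each word on its own, ignoring its neighbors."""
    return sum(LEXICON.get(t, 0) for t in tokens_of(text))


def score_with_negation(text: str) -> int:
    """Flip a sentiment word's weight when the previous word is a negator."""
    tokens = tokens_of(text)
    total = 0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0)
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total


for phrase in ("good", "not good", "not bad at all", "never unhappy"):
    print(phrase, unigram_score(phrase), score_with_negation(phrase))
```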
19. NVivo 11 Plus
• A qualitative and mixed methods data analysis tool
• Enables the curation of multimedia data in an unstructured /
semi-structured dataset
• Enables manual coding of multimedia data
• Enables the running of data queries against text versions of all data in a
project
• Enables autocoding of text for sentiment, theme and sub-theme extraction,
and unique human coding (“autocoding by existing pattern”)
• Enables drawing of various types of data visualizations related to the data
handled: word clouds, word trees, treemaps, dendrograms, cluster
diagrams (2D and 3D), sociograms, geographical maps, and others
20. A Walk-through of the Sentiment Analysis Tool
Use
• Collection of target texts
• Data pre-processing or data cleaning
• Ingestion into an NVivo 11 Plus project
• Single or combined text corpus (different results depending on treatment of
the text)
• Preferable to have both versions, single texts and a combined corpus of
those texts, for different types of questions and different types of
processing
21. A Walk-through of the Sentiment Analysis Tool
Use (cont.)
• May code at the level of sentences, paragraphs, or cells (level of
granularity), depending also in part on how the textual data is
structured (Tweets are not sentences and are coded as cells in the
extracted data tables, for example; unpunctuated sentences will not be
read as sentences by the software tool, etc.); other sentiment analysis
approaches code at various levels of n-grams
• Documents coded at sentence level will result in more codes than those coded
at paragraph level because of the finer granularity
• Documents coded at paragraph level will result in coarser coding
• Coding at cell level may only be applied to table data (which is how
microblogging and social network post data is collected and ingested; also,
online survey data may often be output as table data)
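The effect of the unit of analysis can be illustrated by splitting the same text two ways. This is a rough sketch with naive regular-expression splitters written for this example; it is not how NVivo 11 Plus segments text, but it shows why sentence-level coding yields more units than paragraph-level coding and why unpunctuated text is not read as separate sentences.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Treat blank lines as paragraph boundaries."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


document = "I liked it. It was fun!\n\nThe ending dragged."
print(len(split_paragraphs(document)), "paragraph units")
print(len(split_sentences(document)), "sentence units")

# Without punctuation, the whole string remains a single unit.
print(split_sentences("no punctuation here just words"))

# Table data (e.g. one Tweet per row) is already divided into cells,
# so each cell is one coding unit.
cells = ["Loved the keynote #SIDLIT", "wifi down again"]
print(len(cells), "cell units")
```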
22. A Walk-through of the Sentiment Analysis Tool
Use (cont.)
• Autocoding by sentiment classifier
• Coding of the text into four categories: very negative, moderately
negative, moderately positive, and very positive; dividable into negative or
positive categories (and neutral, which is left out)
• Core words may appear in all four sentiment categories (and even
in the neutral category) but usually at differing frequencies and
sometimes with different word senses
• Words are not coded in any sort of mutually exclusive way, so this can
capture some of the complexity in the text (remember that the coding is at
different levels: sentences, paragraphs, or cells)
23. A Walk-through of the Sentiment Analysis Tool
Use (cont.)
• Data visualizations from the coded data outcomes: intensity
matrices, bar charts, tree maps, and sunbursts
• May recode or un-code text from the labeled sentiment text in the
respective nodes
24. A Walk-through of the Sentiment Analysis Tool
Use (cont.)
• Analysis of the respective autocoded text sets
• Machine-enhanced approaches:
• Text frequency counts, text searches, matrix coding queries (as data queries);
• Theme and sub-theme extraction, sentiment analysis of extracted sentiment
subsets (as autocoding);
• Exploration (as data visualizations), and others
• Human-enhanced approaches: Manual analysis through “close reading”
(vs. machine-based “distant reading”) of the labeled texts
• Cross-comparisons
• External validations
25. Some Sentiment Tool Capabilities
• Comparison of documents and text corpora against a built-in pre-coded
sentiment dictionary
• Inherency of intrinsic attractiveness (positive valence) or aversiveness (negative
valence) embodied in language
• Dictionary words weighted based on degree and direction of sentiment
• Apparently focused on single words (unigrams) only
• Other more complex sentiment classifiers built on bigrams and some trigrams
• Not able to consider double-negatives (e.g. “not unheard of”)
• No confidence measure: p(y|x) or the “probability of y given x” where
y is the sentiment classification and x is the input sentence, paragraph,
or cell text
• An inferred confidence based on human oversight of the autocoded sentiments
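One simple way to arrive at an inferred confidence from human oversight is to hand-check a sample of the autocoded units and report the share on which the human reviewer agrees with the machine. The sketch below assumes such a hand-checked sample exists; the labels shown are invented, and this agreement rate is a stand-in, not a true p(y|x) from the classifier.

```python
from typing import Sequence


def agreement_rate(auto_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Share of reviewed units where the human label matches the autocoded one."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("Both label lists must cover the same units.")
    if not auto_labels:
        raise ValueError("At least one reviewed unit is required.")
    matches = sum(1 for a, h in zip(auto_labels, human_labels) if a == h)
    return matches / len(auto_labels)


# Hypothetical review of four autocoded sentences.
auto = ["positive", "negative", "positive", "negative"]
human = ["positive", "negative", "negative", "negative"]
print(agreement_rate(auto, human))
```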
26. Some Sentiment Tool Capabilities (cont.)
• Can set the base content language to one of the following: Chinese
(simplified), UK English, US English, French, German, Japanese,
Portuguese, and Spanish
• Sentiment analyses in other languages may be based on translating those
languages into English and scoring against the English sentiment dictionary,
or they may be run directly on the native languages (but the first is more
likely and more common in the field).
• Interface language is separate from the base content language.
• NVivo 11 Plus projects may include any range of languages expressible in
Unicode (the UTF-8 character set), but only the base language is used for
text-based and automated analytics (like sentiment); non-base language
text will need to be translated ahead of time in order to
ensure that all languages’ sentiments are analyzed.
27. Some Sentiment Tool Capabilities (cont.)
• Does not capture the finer points of humor, sarcasm, idioms, slang, or irony; no
accommodations for social media-speak (#hashtags, FOMO / “fear of
missing out,” #TBT / “throwback Thursday,” etc.)
• Also does not capture the nuances of polysemy (multi-meaninged words), denotative
vs. connotative meanings (and vice versa), cultural references, and
word-use context
• Quantitative counts of sentiment in four categories (coded to nodes);
qualitative information of text coded to the respective four categories
• Extracted text sets available for further analyses
• Need to assess the actual coded text sets
• Ability to manually uncode and recode textual data
28. Some Sentiment Tool Capabilities (cont.)
• Can treat sentiment coding as a binary (negative or positive) or as
a four-category set (very negative, moderately negative,
moderately positive, and very positive)
• Inability to see or modify pre-coded sentiment dictionary against
which a text set is compared (currently)
• Also inability to create a customized dictionary for sentiment analysis at
this point
• Not treating the text sets in a structured sequential way but more
bag-of-words (without the original order)
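What “bag-of-words (without the original order)” gives up can be seen in a short sketch. The two sentences are invented for illustration; they say opposite things but contain exactly the same words, so an order-free representation cannot tell them apart.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count each lowercased word, discarding word order."""
    return Counter(text.lower().split())


first = "the service was good not bad"
second = "the service was bad not good"

print(bag_of_words(first))
print(bag_of_words(first) == bag_of_words(second))
```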
29. Some Sources of Texts and Text Sets
Formal
• Processed data
• Edited articles and books
• Human and machine-created codes
• Raw data
• Data tables
• Survey data
Informal
• Social media platforms as sources of
opinion-rich data
• Tweetstream datasets
• Facebook wall posts
• Crowd-sourced encyclopedia articles
(from Wikipedia)
• Websites
47. If too many “https,” … a work-around
• Many social media data captures will result in a lot of “http” references
because the site refers to many other Web pages
• Automated theme extraction will result in one or two high-level
categories, with one of them being “http”
• This masks what the actual themes are…so it’s important to clean the
data of “http” and output a different text set for theme extraction.
• At this point, there is no direct way to change the level of theme
extraction (to enable an automated bypass of “http” at the top level and
to go right to the more substantive contents). (Please see next three
slides.)
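The cleaning step can be done outside NVivo before the text set is re-imported. The following is a minimal sketch, assuming the captured posts are available as plain strings; the pattern only targets links that start with http:// or https://, and the sample post is invented.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text: str) -> str:
    """Remove web links and tidy the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", without_urls).strip()


post = "Great talk today http://t.co/abc123 #SIDLIT"
print(strip_urls(post))
```

Keeping the uncleaned text set alongside the cleaned one preserves the links in case they are needed for later analysis.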
69. Post-Sentiment Capture Analytics (with
Related Data Visualizations)
• Analysis of the subsetted data
• Text frequency count
• Text search
• Matrix coding query
• More sentiment analysis
• Theme and sub-theme extraction
• Word relatedness clustering
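A text frequency count over an extracted sentiment subset can be sketched as follows. This assumes the subset has been exported as a list of strings; the sample texts and the short stopword list are invented for illustration and are not the stopword list NVivo uses.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "is", "it"}


def word_frequencies(texts: Iterable[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the `top_n` most frequent non-stopword terms across the texts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical texts autocoded as "very negative".
very_negative = ["The support was slow", "Slow and unhelpful support", "slow"]
print(word_frequencies(very_negative, top_n=2))
```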
75. For Consideration
• What informs whether you have positive or negative sentiment
about something?
• Is this based on your values? Your expectations of the world? Your culture?
Your upbringing?
• Is this based on experiences (whether pain or pleasure)?
• Is this a process that is a fully conscious one or one that may occur in a
subconscious or even unconscious way?
• Once you have formed a sentiment about something (or even a
“pre-sentiment”), how committed are you to it?
• How hard is it for you to change your mind? Why?
76. For Consideration (cont.)
• Between positive and negative sentiment, which one is more likely
to lead you to take action? What sort of action(s)?
• What emotions lead to a sensation of pleasure? Why?
• What emotions lead to a sensation of displeasure? Why?
• Or is it a matter of intensity of emotion that moves you to action?
Or surprise? (Please share a story or two from direct experience.)
• Conversely, what sort of sentiment tends to make you
passive? To dissuade you from action? Why?
77. For Consideration (cont.)
• When you express sentiment on social media (share), does it tend
to have a reinforcement effect (make you more committed to
your sentiment) or a cathartic effect (make you less committed to
your sentiment)? Does public expression strengthen the
sentiment or weaken it? Or neither? Why?
• Does reinforcement, catharsis, or no-effect occur from expression of
sentiment on social media based on the particular issue and context?
• Is there a difference (in terms of action taken) if you express the sentiment
to a friend face-to-face (privately)? To a family member? Online? To a
larger community? To strangers? Why?
78. Emotions
• Study of sentiment evolved to the study of emotions, which are
• higher dimensional…
• somewhat linked to personality…
• linked to various psychological models…
• measured using various psychometrics…and
• observable in various ways (in lab settings)
79. Robert Plutchik’s
Wheel of Emotions
Eight Primary Emotions:
• Anger
• Fear
• Sadness
• Disgust
• Surprise
• Anticipation
• Trust
• Joy
By Machine Elf 1735
80. For Consideration (cont.)
• How attuned are you to your emotions?
• Emotion as motivator: What sort of emotion drives you to action? How can
you manipulate that emotion in order to motivate yourself to desirable
action (and to demotivate yourself from undesirable action)?
• Emotion as de-motivator: What sort of emotion drives you to passivity and
inertia (even for good behaviors)? How can you motivate yourself to get
past such de-motivating emotions?
81. Some Exercise Ideas
• Think of text sets that you see often. Identify one set. If you
were to guess (hypothesize), how would this text set rank in terms
of sentiment in the four categories: very negative, moderately
negative, moderately positive, and very positive (or more simply
in a polarity of negative-positive)?
• Why would you assume this particular distribution? (If you have a chance, run
a sentiment analysis, and see what you get.)
• Much of the world’s information is spoken and shared aloud.
Identify a set of spoken data. Transcribe it into written text.
What sentiment distributions do you expect to see? Why? Run the
sentiment analysis. What do you find?
82. Research Applications? Problem-solving
Applications?
• What are some practical research applications of sentiment
analysis in your respective research domain(s)?
• What are some practical problem-solving applications of sentiment
analysis in your respective research domain(s)?
83. Hidden States
• Given these sentiment-based observations from natural language,
what is / are the hidden state(s)?
• Hidden state-of-the-person?
• Hidden state of people or groups or populations (collectively)?
• Hidden state of the issue? The context?
84. Assertability?
Enablements / Affordances
• Standalone assertions (descriptive
data): This text set uses language that
falls on this particular sentiment
distribution (whether polarity or
category).
• The respective text sets (in each
sentiment category) have the following
topical focuses.
• Based on the expressed sentiments,
the following actions may be predicted
(with a certain level of confidence).
Limitations
• There are aspects of the text sets
that are not addressed based on
limits of the sentiment analysis tool.
• The text sets only contain a certain
amount of information. The sets are
not an N of all.
• The sentiment coding is / was not
overseen by humans for correction
and re-coding.
85. Assertability? (cont.)
Enablements / Affordances
• Remote profiling (inferential): This
individual or group tends to go
negative and / or positive on these
particular topics.
• Based on the expressed
communications, this person may be
assumed to be of a certain
psychological make-up.
• Based on the expressed
communications, this person may
take the following actions.
Limitations
• The sentiment analysis only
addresses sentiment and not the
wider research into emotion and
valence.
• Sentiment analysis is often studied in
isolation, without the benefit of
other information streams.
86. Assertability? (cont.)
Enablements / Affordances
• Comparative assertions (analytical):
These two or more text sets (text
corpora) differ in terms of sentiment
in these ways…and around these
particular topics / concepts. Here
are some reasons why.
• Based on these differences, the
following observations may be
made.
Limitations
• The sentiment analysis is only run
over textual data, not image, audio,
video, or other such data. For
multimedia, there should be
informational and sentiment
equivalencies of the multimedia data
in text form.
87. Ways to Strengthen Sentiment Analysis
• Select the text sets strategically.
• Capture a sufficient amount of text examples.
• Pre-process the data effectively, without losing information.
• Do not over-assert beyond where the data will go.
• Bring in contextual and cultural insights to add color to the data.
88. Conclusion and Contact
• Dr. Shalin Hai-Jew
• iTAC, Kansas State University
• 212 Hale / Farrell Library
• shalin@k-state.edu
• 785-532-5262
• All the collected datasets and visualizations were created by the
presenter.
• The visualizations were created inside NVivo 11 Plus.
• The presenter has no tie to QSR International.