Visualizing Text: Seth Redmore at the 2015 Smart Data Conference

© 2015 Lexalytics Inc. All rights reserved
Visualizing Text
Smart Data Week
Seth Redmore; CMO, Lexalytics, Inc.
@sredmore

Agenda
• The Word Cloud
• Vectors to Visualize
• Ways to group/count
• Manipulating the words (stemming, lemmatization, etc.)
• Line/Bubble/Pie
• Treemaps
• Heatmaps
• Clusters
• Graphs

The Word Cloud

Which word is gone?

How about now?
• stem 86
• word 53
• algorithm 49
• rule 36
• suffix 27
• strip 23
• approach 21
• form 21
• language 20
• edit 20
• example 18
• root 18
• apply 14
• search 13
• inflect 12
• English 10

• stem 86
• word 53
• algorithm 49
• rule 36
• strip 23
• approach 21
• form 21
• language 20
• edit 20
• example 18
• root 18
• apply 14
• search 13
• inflect 12
• English 10
• part 10
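The missing word is much easier to find from the counts than from the cloud. A minimal Python sketch, assuming the two lists above are held as plain dictionaries (the names `first`, `second`, and `diff_terms` are illustrative and not from the talk):

```python
# Counts transcribed from the two lists above (lowercased for comparison).
first = {
    "stem": 86, "word": 53, "algorithm": 49, "rule": 36, "suffix": 27,
    "strip": 23, "approach": 21, "form": 21, "language": 20, "edit": 20,
    "example": 18, "root": 18, "apply": 14, "search": 13, "inflect": 12,
    "english": 10,
}
second = {
    "stem": 86, "word": 53, "algorithm": 49, "rule": 36,
    "strip": 23, "approach": 21, "form": 21, "language": 20, "edit": 20,
    "example": 18, "root": 18, "apply": 14, "search": 13, "inflect": 12,
    "english": 10, "part": 10,
}


def diff_terms(before, after):
    """Return (terms that disappeared, terms that appeared), each sorted."""
    gone = sorted(set(before) - set(after))
    new = sorted(set(after) - set(before))
    return gone, new


gone, new = diff_terms(first, second)
print("gone:", gone)  # gone: ['suffix']
print("new:", new)    # new: ['part']
```

A set difference answers in one line what the eye struggles to find in two word clouds.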

Visualization vectors
Content Derived
• Stemmed Words/Words/Phrases
• Part-of-Speech
• Extracted Features
– Entities
– Themes
– Topics
– Intentions
• Sentiment/Emotions
Associated Metadata
• Language
• Geography
• Time
• Publication/Author/@handle
• Socioeconomic
• Social associations

Ways to group or count
• Weighting Factors
– Counts
– “Importance”
• Similarity
• Co-occurrence
– Categories
– Other words
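Co-occurrence with other words can be counted directly. A rough Python sketch, assuming whitespace tokenization and treating each document as one co-occurrence window (the sample documents and the `cooccurrence` helper are made up for illustration):

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered pair of distinct words shares a document."""
    pairs = Counter()
    for doc in docs:
        # Sort so each pair always appears in the same (alphabetical) order.
        words = sorted(set(doc.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


# Made-up example documents.
docs = ["cracked screen phone", "phone screen great", "great dinner"]
counts = cooccurrence(docs)
print(counts[("phone", "screen")])  # 2
print(counts[("dinner", "great")])  # 1
```

The resulting pair counts are the kind of input a force-directed graph or heatmap can use as edge weights or cell values.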

Pies (one axis)
Positive: 28.65%
Negative: 9.16%
Neutral: 62.20%
With more than three data points, pie charts become increasingly hard to read.
If you have three or fewer data points, why not just use a table?

What is the “true” Sentiment?
-0.1 to +0.1 is neutral
• Positive: 29.77%
• Negative: 9.99%
• Neutral: 60.24%
-0.2 to +0.2 is neutral
• Positive: 28.65%
• Negative: 9.16%
• Neutral: 62.20%
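The split changes because the neutral band is a choice, not a property of the data. A small Python sketch of that effect, assuming document scores in the range -1 to +1 (the scores below are made up, and `bucket` and `distribution` are illustrative names):

```python
from collections import Counter


def bucket(score, neutral_band):
    """Label a score; anything within +/- neutral_band counts as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def distribution(scores, neutral_band):
    """Percentage of scores falling into each sentiment label."""
    counts = Counter(bucket(s, neutral_band) for s in scores)
    total = len(scores)
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "negative", "neutral")}


# Made-up scores, not data from the talk.
scores = [0.45, 0.15, 0.05, -0.02, -0.15, -0.30, 0.25, 0.0, 0.12, -0.08]
print(distribution(scores, 0.1))
print(distribution(scores, 0.2))
```

Widening the band moves borderline documents from positive and negative into neutral, so the same corpus yields a different chart.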

Lines (2 axes)

Bars

Bubbles (4 axes)
Courtesy of Provalis Research

Stemmed Words vs. Words vs. Word Phrases vs. Relationships
• I was greatly satisfied with my dinner.
• Greatly satisfied
• Greatly
• Great
• I hate the cracked screen on my phone.
• Cracked screen
• Crack
Satisfied (x1.5) → dinner
Cracked Screen → phone

Stemming vs. Lemmatization
Stemming
• Walking → Walk
• Better → Better
• I am meeting him tomorrow
– Meeting → Meet
• In our last meeting, we…
– Meeting → Meet
Lemmatization
• Walking → Walk
• Better → Good
• I am meeting him tomorrow
– Meeting → Meet
• In our last meeting, we…
– Meeting → Meeting
Examples from Wikipedia
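The difference can be sketched in a few lines of Python. This is a toy illustration only: `toy_stem` is a naive suffix stripper (not the Porter stemmer), and `toy_lemmatize` looks words up in a tiny hand-written lexicon keyed by part of speech, which is an assumption standing in for a real dictionary and tagger.

```python
# Toy suffix list; real stemmers use much richer rule sets.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, ignoring context and part of speech."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[:-len(suffix)]
    return w


# Hypothetical mini lexicon: (word, part of speech) -> dictionary form.
LEMMAS = {
    ("walking", "verb"): "walk",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def toy_lemmatize(word, pos):
    """Look up the dictionary form for a word used as a given part of speech."""
    return LEMMAS.get((word.lower(), pos), word.lower())


for word, pos in [("Walking", "verb"), ("Better", "adj"),
                  ("Meeting", "verb"), ("Meeting", "noun")]:
    print(word, pos, toy_stem(word), toy_lemmatize(word, pos))
```

The stemmer returns "meet" for both uses of "meeting" and leaves "better" alone, while the lemmatizer needs the part of speech to keep the noun "meeting" intact and map "better" to "good".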

Top themes from Samsung Galaxy® Announcement
Themes are contextually scored noun-phrases.

Top themes + relative occurrence

Plus Sentiment

+Time

+Sentiment

+Gender (too much!)

Gender
Theme
Sentiment
It is important to consider how you can combine the structured data with the unstructured data.
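One way to picture this is as a join followed by a group-by. A minimal Python sketch, assuming each record already carries a structured field (gender) next to values extracted from the text (theme, sentiment); the records and the `mean_sentiment_by` helper are hypothetical:

```python
from collections import defaultdict

# Hypothetical records: "gender" is structured metadata, while "theme" and
# "sentiment" stand in for values extracted from the unstructured text.
records = [
    {"gender": "F", "theme": "battery life", "sentiment": 0.5},
    {"gender": "M", "theme": "battery life", "sentiment": -0.25},
    {"gender": "F", "theme": "battery life", "sentiment": 0.25},
    {"gender": "M", "theme": "screen size", "sentiment": 0.5},
]


def mean_sentiment_by(rows, *keys):
    """Average sentiment for every combination of the given keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["sentiment"])
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


table = mean_sentiment_by(records, "gender", "theme")
for (gender, theme), mean in sorted(table.items()):
    print(gender, theme, mean)
```

Each key added to the grouping is one more visual axis, which is how a chart ends up showing "too much".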

Word Cloud

Treemap

Treemap Comparison

Usenet Treemap
Treemaps work well for data that has a hierarchy.
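A treemap needs a size for every node in the hierarchy, not just the leaves. A short Python sketch of that roll-up, assuming dotted Usenet-style group names (the post counts are made up and `rollup` is an illustrative helper):

```python
from collections import defaultdict


def rollup(leaf_sizes, sep="."):
    """Add each leaf's size to the leaf itself and to every ancestor prefix."""
    totals = defaultdict(int)
    for name, size in leaf_sizes.items():
        parts = name.split(sep)
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += size
    return dict(totals)


# Made-up post counts per newsgroup.
posts = {
    "comp.lang.python": 120,
    "comp.lang.c": 80,
    "comp.os.linux": 50,
    "rec.music": 30,
}
sizes = rollup(posts)
print(sizes["comp"])       # 250
print(sizes["comp.lang"])  # 200
print(sizes["rec"])        # 30
```

The rolled-up totals give the area of each nested rectangle; flat data with no such prefixes gains little from a treemap.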

Force-directed Graphs
Courtesy of Bottlenose
http://www.d3noob.org/2013/03/d3js-force-directed-graph-examples.html

Clustering
Courtesy of Quid

Clustering Zoom
Courtesy of Quid

Heatmaps
Courtesy of Provalis Research

Open Source/Free Tools and Toolsets
No-Code
• Datawrapper
– Built for news orgs, better with structured data
• Charted
– Input CSV or Google spreadsheet
• Tableau Public
• Google Charts
Code
• D3
– Hugely powerful, many relevant chart types for text
– https://github.com/mbostock/d3/wiki/Gallery
• R
– Full-blown stats + visualization

Commercial Toolkits
Graphing/Charting
• Tableau
• Jreport
• Domo
• Qlik
• Tibco Spotfire
• Wordstat/Simstat
• SAS
• SPSS
Full Analytics Systems (with content)
Many of them. We work with lots of them, so I can’t list them all here.

Summary
• Don’t use pie charts, use tables instead.
• Don’t use word clouds if you can avoid them.
• Really don’t use word clouds for any sort of comparison over time.
• If you’re going to use word clouds:
– use intelligent colors
– use them either as a user interface
– or after you’ve already done a bunch of filtering
• Many other chart types have the visual appeal of word clouds while providing more information.
– Time-series charts
– Treemaps
– Force Directed Graphs
– Clusters
– Heatmaps
And check this out…
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en
