© 2015 Lexalytics Inc. All rights reserved
Visualizing Text
Smart Data Week
Seth Redmore; CMO, Lexalytics, Inc.
@sredmore
Agenda
• The Word Cloud
• Vectors to Visualize
• Ways to group/count
• Manipulating the words (stemming, lemmatization, etc.)
• Line/Bubble/Pie
• Treemaps
• Heatmaps
• Clusters
• Graphs
The Word Cloud
Which word is gone?
How about now?

First list:
• stem 86
• word 53
• algorithm 49
• rule 36
• suffix 27
• strip 23
• approach 21
• form 21
• language 20
• edit 20
• example 18
• root 18
• apply 14
• search 13
• inflect 12
• English 10

Second list:
• stem 86
• word 53
• algorithm 49
• rule 36
• strip 23
• approach 21
• form 21
• language 20
• edit 20
• example 18
• root 18
• apply 14
• search 13
• inflect 12
• English 10
• part 10
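
The comparison between the two lists can also be done mechanically. The sketch below is only an illustration, not how the slide was produced: it assumes each list is available as a mapping from word to count, and the names `diff_counts`, `shared`, `first`, and `second` are made up for this example.

```python
def diff_counts(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of (word, count)."""
    # Lowercase the keys so "english" and "English" are treated as the same word.
    a = {word.lower(): count for word, count in first.items()}
    b = {word.lower(): count for word, count in second.items()}
    only_first = sorted((w, c) for w, c in a.items() if w not in b)
    only_second = sorted((w, c) for w, c in b.items() if w not in a)
    return only_first, only_second


# Counts that appear in both lists on the slide.
shared = {
    "stem": 86, "word": 53, "algorithm": 49, "rule": 36, "strip": 23,
    "approach": 21, "form": 21, "language": 20, "edit": 20, "example": 18,
    "root": 18, "apply": 14, "search": 13, "inflect": 12,
}
first = dict(shared, suffix=27, english=10)
second = dict(shared, English=10, part=10)

removed, added = diff_counts(first, second)
print("only in first list:", removed)   # [('suffix', 27)]
print("only in second list:", added)    # [('part', 10)]
```

A ranked list with counts makes this kind of comparison trivial, which is exactly what a repacked word cloud does not allow.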
Visualization vectors

Content Derived
• Stemmed Words/Words/Phrases
• Part-of-Speech
• Extracted Features
– Entities
– Themes
– Topics
– Intentions
• Sentiment/Emotions
• Language

Associated Metadata
• Geography
• Time
• Publication/Author/@handle
• Socioeconomic
• Social associations
Ways to group or count
• Weighting Factors
– Counts
– “Importance”
• Similarity
• Co-occurrence
– Categories
– Other words
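
Two of the options above, plain counts and co-occurrence with other words, can be sketched in a few lines. This is a minimal illustration under simple assumptions: documents are whitespace-tokenized strings, a co-occurrence means two distinct words appearing in the same document, and the function name `cooccurrence` and the sample documents are made up for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered pair of distinct words shares a document."""
    pairs = Counter()
    for doc in docs:
        # Sort the unique words so every pair has one canonical order.
        words = sorted(set(doc.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


# Made-up documents, for illustration only.
docs = ["cracked screen phone", "phone screen great", "great dinner"]

word_counts = Counter(word for doc in docs for word in doc.lower().split())
pair_counts = cooccurrence(docs)

print(word_counts.most_common(3))
print(pair_counts[("phone", "screen")])  # 2
```

Pair counts like these are the raw material for the graph and heatmap views later in the deck; an "importance" weighting would replace the raw counts with a score of your choosing.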
Pies (one axis)
Positive: 28.65%
Negative: 9.16%
Neutral: 62.20%
For any more than 3 data points pie charts become increasingly hard to read.
If you have 3 or fewer data points, why not just use a table?
What is the “true” Sentiment?
-0.1 to +0.1 is neutral
-0.2 to +0.2 is neutral
Positive: 28.65%
Negative: 9.16%
Neutral: 62.20%
Positive: 29.77%
Negative: 9.99%
Neutral: 60.24%
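
How the positive/negative/neutral split depends on the neutral cutoff can be sketched as follows. This is an illustration only: the scores are made up, the assumption is that each document carries one sentiment score and that scores inside the band count as neutral, and the names `label` and `shares` are hypothetical rather than part of any Lexalytics API.

```python
from collections import Counter


def label(score, neutral_band):
    """Map a sentiment score to a label; scores within the band are neutral."""
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


def shares(scores, neutral_band):
    """Return the percentage of scores that fall under each label."""
    counts = Counter(label(s, neutral_band) for s in scores)
    return {k: 100.0 * counts[k] / len(scores)
            for k in ("Positive", "Negative", "Neutral")}


# Made-up scores, for illustration only.
scores = [0.5, 0.15, 0.05, 0.0, -0.05, -0.15, -0.3, 0.12, -0.02, 0.25]

# Print one table row per neutral band.
for band in (0.1, 0.2):
    row = shares(scores, band)
    cells = "  ".join(f"{name}: {pct:.1f}%" for name, pct in row.items())
    print(f"neutral band +/-{band}  {cells}")
```

Printing the result as rows of a small table, rather than as two pies, keeps the effect of the cutoff easy to read.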
Lines (2 axes)
Bars
Bubbles (4 axes)
Courtesy of Provalis Research
Stemmed Words vs. Words vs. Word Phrases vs. Relationships
• I was greatly satisfied with my dinner.
– Greatly satisfied
– Greatly
– Great
• I hate the cracked screen on my phone.
– Cracked screen
– Crack
• Relationships
– Satisfied (x1.5) → dinner
– Cracked screen → phone
Stemming vs. Lemmatization
Examples from Wikipedia

Stemming
• Walking → Walk
• Better → Better
• “I am meeting him tomorrow”: Meeting → Meet
• “In our last meeting, we…”: Meeting → Meet

Lemmatization
• Walking → Walk
• Better → Good
• “I am meeting him tomorrow”: Meeting → Meet
• “In our last meeting, we…”: Meeting → Meeting
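
The difference between the two approaches can be sketched with toy code. Neither function below is a real stemmer or lemmatizer: `toy_stem` is a crude suffix stripper, and `toy_lemmatize` looks words up in a tiny hand-written table (`LEMMAS`) keyed by word and part of speech. Both names and the table are assumptions made for this illustration; a production system would use a proper algorithm and dictionary.

```python
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix; knows nothing about context or meaning."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical lemma table keyed by (word, part of speech).
LEMMAS = {
    ("walking", "verb"): "walk",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def toy_lemmatize(word, pos):
    """Look up the dictionary form for a word used as the given part of speech."""
    w = word.lower()
    return LEMMAS.get((w, pos), w)


examples = [("Walking", "verb"), ("Better", "adj"),
            ("Meeting", "verb"), ("Meeting", "noun")]
for word, pos in examples:
    print(f"{word} ({pos}): stem={toy_stem(word)}, lemma={toy_lemmatize(word, pos)}")
```

The stemmer gives the same answer for both uses of "meeting" because it never sees the part of speech; the lemmatizer needs that extra input, which is why it can tell the verb from the noun.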
Top themes from Samsung Galaxy® Announcement
Themes are contextually scored noun-phrases.
Top themes + relative occurrence
Plus Sentiment
+Time
+Sentiment
+Gender (too much!)
Gender
Theme
Sentiment
Important to consider how you can get the structured data in there with the unstructured data.
Word Cloud
Treemap
Treemap Comparison
Usenet Treemap
Treemaps are good for data that has hierarchy
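
Why hierarchy matters for a treemap can be sketched with a simple slice-and-dice layout, one of several possible treemap layouts and not necessarily the one used for the slide. The data is made up, a node is assumed to be either a number (a leaf) or a dict of children, and the names `total` and `layout` are hypothetical.

```python
def total(node):
    """Sum of the leaf values under a node (a number or a dict of children)."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def layout(node, x, y, w, h, depth=0, path=()):
    """Slice-and-dice layout: yield (path, x, y, w, h) for every leaf.

    Children split the parent's rectangle in proportion to their totals,
    cutting along the width at even depths and along the height at odd depths.
    """
    if not isinstance(node, dict):
        yield path, x, y, w, h
        return
    whole = total(node)
    offset = 0.0
    for name, child in node.items():
        frac = total(child) / whole
        if depth % 2 == 0:
            yield from layout(child, x + offset * w, y, frac * w, h,
                              depth + 1, path + (name,))
        else:
            yield from layout(child, x, y + offset * h, w, frac * h,
                              depth + 1, path + (name,))
        offset += frac


# Made-up post counts per newsgroup, for illustration only.
groups = {"comp": {"lang": 30, "sys": 10}, "rec": {"music": 40}, "sci": {"math": 20}}
rects = list(layout(groups, 0, 0, 100, 100))
for path, x, y, w, h in rects:
    print(".".join(path), f"x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f}")
```

Each leaf's area is proportional to its count, and leaves that share a parent stay inside the parent's rectangle, which is what makes the hierarchy visible.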
Force-directed Graphs
Courtesy of Bottlenose
http://www.d3noob.org/2013/03/d3js-force-directed-graph-examples.html
Clustering
Courtesy of Quid
Clustering Zoom
Courtesy of Quid
Heatmaps
Courtesy of Provalis Research
Open Source/Free Tools and Toolsets

No-Code
• Datawrapper
– Built for news orgs, better with structured data
• Charted
– Input CSV or Google spreadsheet
• Tableau Public

Code
• Google Charts
• D3
– Hugely powerful, many relevant chart types for text
– https://github.com/mbostock/d3/wiki/Gallery
• R
– Full-blown stats + visualization
Commercial Toolkits

Graphing/Charting
• Tableau
• Jreport
• Domo
• Qlik
• Tibco Spotfire
• Wordstat/Simstat
• SAS
• SPSS

Full Analytics Systems (with content)
Many of them. We work with lots of them, so I can’t list them all here.
Summary
• Don’t use pie charts; use tables instead.
• Don’t use word clouds if you can avoid them.
• Really don’t use word clouds for any sort of comparison over time.
• If you’re going to use word clouds:
– use intelligent colors
– use them either as a user interface or once you’ve already done a bunch of filtering
• Many other chart types have the visual appeal of word clouds while providing more information.
– Time-series charts
– Treemaps
– Force Directed Graphs
– Clusters
– Heatmaps
And check this out…
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en

Visualizing Text: Seth Redmore at the 2015 Smart Data Conference

  • 1. © 2015 Lexalytics Inc. All rights reserved Visualizing Text Smart Data Week Seth Redmore; CMO, Lexalytics, Inc. @sredmore
  • 2. © 2015 Lexalytics Inc. All rights reserved Agenda  The Word Cloud  Vectors to Visualize  Ways to group/count  Manipulating the words (stemming/lemmatization/etc)  Line/Bubble/Pie  Treemaps  Heatmaps  Clusters  Graphs 2
  • 3. © 2015 Lexalytics Inc. All rights reserved The Word Cloud 3
  • 4. © 2015 Lexalytics Inc. All rights reserved Which word is gone? 4
  • 5. © 2015 Lexalytics Inc. All rights reserved How about now? • stem 86 • word 53 • algorithm 49 • rule 36 • suffix 27 • strip 23 • approach 21 • form 21 • language 20 • edit 20 • example 18 • root 18 • apply 14 • search 13 • inflect 12 • english 10 • stem 86 • word 53 • algorithm 49 • rule 36 • strip 23 • approach 21 • form 21 • language 20 • edit 20 • example 18 • root 18 • apply 14 • search 13 • inflect 12 • English 10 • part 10 5
  • 6. © 2015 Lexalytics Inc. All rights reserved Visualization vectors Content Derived Associated Metadata • Stemmed Words/Words/Phrases • Part-of-Speech • Extracted Features – Entities – Themes – Topics – Intentions • Sentiment/Emotions • Language • Geography • Time • Publication/Author/@handle • Socioeconomic • Social associations 6
  • 7. © 2015 Lexalytics Inc. All rights reserved Ways to group or count • Weighting Factors – Counts – “Importance” • Similarity • Co-occurrence – Categories – Other words 7
  • 8. © 2015 Lexalytics Inc. All rights reserved Pies (one axis) Positive: 28.65% Negative: 9.16% Neutral: 62.20% For any more than 3 data points pie charts become increasingly hard to read. If you have 3 or fewer data points, why not just use a table? 8 28.65% 9.16% 62.20%
  • 9. © 2015 Lexalytics Inc. All rights reserved What is the “true” Sentiment? -0.1 to +0.1 is neutral-0.2 to +0.2 is neutral Positive: 28.65% Negative: 9.16% Neutral: 62.20% Positive: 29.77% Negative: 9.99% Neutral: 60.24% 9 28.65% 9.16% 62.20% 29.77% 9.99% 60.24%
  • 10. © 2015 Lexalytics Inc. All rights reserved Lines (2 axes) 10
  • 11. © 2015 Lexalytics Inc. All rights reserved Bars 11
  • 12. © 2015 Lexalytics Inc. All rights reserved Bubbles (4 axes) Courtesy of Provalis Research 12
  • 13. © 2015 Lexalytics Inc. All rights reserved Stemmed Words vs. Words vs. Word Phrases vs. Relationships • I was greatly satisfied with my dinner. • Greatly satisfied • Greatly • Great • I hate the cracked screen on my phone. • Cracked screen • Crack Satisfied(x1.5)  dinner Cracked Screen phone 13
  • 14. © 2015 Lexalytics Inc. All rights reserved LemmatizationStemming Walking Walk Better Better I am meeting him tomorrow Meeting Meet In our last meeting, we… Meeting  Meet Walking Walk Better Good I am meeting him tomorrow Meeting  Meet In our last meeting, we… Meeting  Meeting Stemming vs. Lemmatization Examples from Wikipedia 14
  • 15. © 2015 Lexalytics Inc. All rights reserved Top themes from Samsung Galaxy® Announcement Themes are contextually scored noun-phrases. 15
  • 16. © 2015 Lexalytics Inc. All rights reserved Top themes + relative occurrence 16
  • 17. © 2015 Lexalytics Inc. All rights reserved Plus Sentiment 17
  • 18. © 2015 Lexalytics Inc. All rights reserved +Time 18
  • 19. © 2015 Lexalytics Inc. All rights reserved +Sentiment 19
  • 20. © 2015 Lexalytics Inc. All rights reserved +Gender (too much!) 20
  • 21. © 2015 Lexalytics Inc. All rights reserved Gender Theme Sentiment 21 Important to consider how you can get the structured data in there with the unstructured data.
  • 22. © 2015 Lexalytics Inc. All rights reserved Word Cloud 22
  • 23. © 2015 Lexalytics Inc. All rights reserved Treemap 23
  • 24. © 2015 Lexalytics Inc. All rights reserved Treemap Comparison 24
  • 25. © 2015 Lexalytics Inc. All rights reserved Usenet Treemap Treemaps are good for data that has hierarchy 25
  • 26. © 2015 Lexalytics Inc. All rights reserved Force-directed Graphs Courtesy of Bottlenose http://www.d3noob.org/2013/03/d3js-force-directed-graph-examples.html 26
  • 27. © 2015 Lexalytics Inc. All rights reserved Clustering Courtesy of Quid 27
  • 28. © 2015 Lexalytics Inc. All rights reserved Clustering Zoom Courtesy of Quid
  • 29. © 2015 Lexalytics Inc. All rights reserved Heatmaps Courtesy of Provalis Research 29
  • 30. © 2015 Lexalytics Inc. All rights reserved CodeNo-Code • Datawrapper – Built for news orgs, better with structured data • Charted – Input CSV or google spreadsheet • Tableau Public • Google Charts • D3 – Hugely powerful, many relevant chart types for text – https://github.com/mbostock/d3/wiki/ Gallery • R – Full blown stats + visualization Open Source/Free Tools and Toolsets 30
  • 31. © 2015 Lexalytics Inc. All rights reserved Full Analytics Systems (with content)Graphing/Charting • Tableau • Jreport • Domo • Qlik • Tibco Spotfire • Wordstat/Simstat • SAS • SPSS Many of them. We work with lots of them, so, I can’t list them all here. Commercial Toolkits 31
  • 32. Summary: Don’t use pie charts, use tables instead. Don’t use word clouds if you can avoid them. Really don’t use word clouds for any sort of comparison over time. If you’re going to use word clouds, use intelligent colors, and use them either as a user interface or when you’ve already done a bunch of filtering. Many other chart types have the visual appeal of word clouds while providing more information: time-series charts, treemaps, force-directed graphs, clusters, heatmaps. And check this out: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=en
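
Slide 25 says treemaps are good for data that has hierarchy. As a rough sketch of why, here is the simplest treemap layout ("slice-and-dice"), where each child gets a share of its parent's rectangle in proportion to its size. The nested dictionary of newsgroup sizes is made up for illustration, and this is not the layout used to draw the slides.

```python
def leaf_total(node):
    """Sum of leaf sizes under a node (a number, or a dict of children)."""
    if isinstance(node, dict):
        return sum(leaf_total(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, depth=0, path=()):
    """Return {path: (x, y, w, h)} with one rectangle per leaf.

    Children split the parent's rectangle in proportion to their size,
    cutting the width at even depths and the height at odd depths.
    """
    if not isinstance(node, dict):
        return {path: (x, y, w, h)}
    rects = {}
    total = leaf_total(node)
    offset = 0.0
    # Largest child first, so sizes stay ordered and easy to compare.
    children = sorted(node.items(), key=lambda kv: -leaf_total(kv[1]))
    for name, child in children:
        share = leaf_total(child) / total
        if depth % 2 == 0:  # cut the width
            box = (x + offset * w, y, share * w, h)
        else:               # cut the height
            box = (x, y + offset * h, w, share * h)
        rects.update(slice_and_dice(child, *box, depth + 1, path + (name,)))
        offset += share
    return rects


# Hypothetical newsgroup sizes, for illustration only.
groups = {"comp": {"lang": 30, "sys": 10}, "rec": {"music": 40}, "sci": 20}
layout = slice_and_dice(groups, 0, 0, 100, 100)
for path, (rx, ry, rw, rh) in layout.items():
    print("/".join(path), round(rw * rh))
```

Each leaf's area ends up proportional to its size, and every sub-group stays inside its parent's rectangle, which is what makes the hierarchy readable at a glance.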

Editor's notes

  1. The (in)famous word cloud. A lot of our customers ask for word clouds. We think that there are much better ways to visualize data, but if you insist on a word cloud, there are good ways to use it and poor ways to use it.
  2. Word clouds are produced by packing algorithms. Part of the problem is that, because the words have to be packed, any change in the word list means the whole cloud gets redrawn. This makes it really hard to compare one word cloud to another, which makes them a poor fit for comparative analysis. The answer to the question is on the next slide. Ponder this for like 30 seconds, and then go to the next slide.
  3. The missing word was “suffix.” What’s the word missing on this slide? See how much easier that was? And we’re actually giving you three pieces of information here: the word itself, the actual number of occurrences, and the rank order of the words. The problem is that this list just doesn’t look very sexy. Hold that thought.
  4. Let’s step back for a second. The “content-derived” set are some of the most important things that you extract from unstructured data. On the right are common associated structured data that you will often want to associate with the unstructured data. There are, of course, other things like sales, customer ID, etc. It is important to think about how you’re going to show these two things together.
  5. One piece of information that you can give visually is the “relationships” between things. You can do so spatially, through clustering or graphs. You can do side-by-side bars, and there are a lot of other ways you can group things together. Text is interesting in that one piece of text may mention concept A and concept B, and another piece of text may mention B and C. There’s a relationship there, but not a complete overlap. This can sometimes be important for analytical purposes. Grouping is an important concept, and is one that we’ll examine more later on in this presentation, particularly when we’re talking about graphing and clustering.
  6. Pie charts are another default visualization. Think about them for a second. Do you really need to show people how to do percentages? They don’t work very well for anything more than 3-4 things. They’re also really easy to lie with. Ever see a 3D pie chart where it’s kinda on an angle? That makes the relative sizes really hard to judge. Tables are better. Use tables. They’re not going to take up graphical space, but they’re really easy to read.
  7. Since we’re talking about pie charts, let’s talk about another easy area for manipulation. You need to ask “what are the bounds of neutral?” If you have a narrow bound, then you’ll get more content in positive and negative. We’ve actually found with our software that the results agree most with humans when the bounds are a little lopsided (-0.1 for negative, +0.15 for positive). Scales and such will differ, but for any sentiment tool that is reporting numbers, it’s important to understand these bounds.
  8. Lines are great. Now you start getting into more information packed onto the graph. This chart is the result of taking 150,000 songs, running them line-by-line through our sentiment engine, and then bucketing them into songs that follow different topologies. For example, “positive-positive” would be a song that is positive in the first half and then stays positive for the last half. This graph has a flaw, namely the sharp dropoff at the end. That drop comes from a lack of data in the later years, so it’s true to the data but looks misleading.
  9. This is the same information as was on the last graph, but shown with side-by-side bars. What is interesting about this graph is that you can see, by percentage, how the positive-positive songs decrease until the 2000s, then start to increase again, while the converse happens with negative-negative songs. That’s more immediately informative than the previous line chart, where you have to dig a little harder to get that information.
  10. Bubble charts allow you to present up to 4 axes of information, two for x&y, a third for the size of the bubble, and a fourth for color of the bubble. I’m a big fan of bubble charts, and you’ll be seeing at least one more of these in the presentation in a real world example. This chart is courtesy of Provalis Research, who make cool statistical and text analytics packages for desktop use. http://www.provalisresearch.com
  11. Let’s talk a little bit about words and relationships. When you stem a word, you’re trying to find the root word. We’ll discuss the difference between stems and lemmas on the next slide. The point of this slide is that if you ignore phrases and just pull out the stemmed forms, you’re missing part of the deal. In the top example, “satisfied” is modified by “greatly”, and should be associated with dinner. If you just stemmed the words down, you’d end up at “great”, which would be cool, but knowing “great” and its relationship to “satisfied” and eventually “dinner” is really important. Similarly with the bottom example. “Cracked screen” is a whole thing in and of itself. You can probably infer if you just see “crack” that there was a, well, crack. But you don’t know what was cracked and you don’t know what thing it belonged to, which is why it’s important to expose “cracked screen” and associate it with “phone.”
  12. Stemming is trying to find the root word without taking the part-of-speech type into account. Lemmatization (also okay if spelled with an “s” – lemmatisation) takes the part-of-speech into account. Meeting can be a verb or a noun. If a verb, the root form is “meet” – if a noun, then the root form is actually “meeting.” You can generally get away with just stems, but lemmas provide a richer experience.
  13. On the next few slides, we’re going to work through content that was gathered around the time of the Samsung Galaxy S5 announcement. We’re going to focus on themes, which is a way to extract noun phrases that are contextually important. This is the simplest possible “word cloud” of these themes. No size, no color, just a list of the terms.
  14. Here we have a word cloud, where size is dependent on occurrence, but color isn’t used.
  15. And now we add color for sentiment. You can immediately see that “Gangnam Style” and “Android Source Code” are negative themes. You should be wondering why at this point.
  16. Here’s the exact same set of themes, but arranged in a bubble chart. Now you have a timeline, so you can actually make some inferences about when things happened and what themes co-occurred in time. For example, the Android Source Code thing happened later, while the Gangnam Style thing happened around the time of the launch.
  17. Now we add back in the sentiment. Here you see all of the information that was in the word cloud, but arranged in time so that you can see stuff that was associated with the launch, vs. content that occurred later in time. The Gangnam Style negativity happened around launch time. Digging into the content (not shown) you just see that the song was overplayed and people were like “really?” The Android Source Code bit happens later on, and is associated with the ongoing legal battles between Apple and Samsung.
  18. It’s really nice to be able to start integrating structured data with the unstructured visualization. In this case, I have demographic information based on the names associated with the tweets. You could do location, or any of a number of other things. This slide becomes too complex, and so we need a different visualization to show the demographic interaction. We already know when and how much (and sentiment) for each of the themes. We just need to know something about the demographics.
  19. And so we can do this sort of visualization, where the gender is represented by the female/male symbol. You can see that men were more positive on the announcement than women were, and you can see that it was men who were most negative about the Gangnam Style tie-in.
  20. The next few slides are going to compare word clouds to treemaps. Here’s a word cloud from content surrounding one of the recent Olympics.
  21. Here’s the same content as a treemap. Treemaps are really best for content that is easily divided into “subsets” – I’ll show an example of this in a few slides.
  22. Let’s do the same exercise we did before with the two word clouds early on. Which word is missing? The difference in packing algorithms between a treemap and a word cloud means that these differences are really much easier to pick out. The ordering of the sizes helps tremendously.
  23. Here’s a treemap of the Usenet hierarchy. (Remember Usenet? I do. If you don’t remember it, it was basically a decentralized set of newsgroups. It’s still around in one form or another.) Note the hierarchy/subset nature, where you can see different parts of Usenet, and see the relative sizes. Treemaps make for nice navigational interfaces for highly complex, but inherently ordered content.
  24. Force-directed graphs use simulated physics (attraction along links, repulsion between nodes) to lay out information in a pleasing, relationship-retaining way. There are a number of packages that will lay out force-directed graphs for you (see the packages at the end of the presentation). It’s important to note the inherent connections: you can see the words that are literally and figuratively attached to other words or concepts. Some graphs have directionality indicated by arrows, others simply indicate connectivity. Graphs like this can get really messy if you have too much content and haven’t pruned the connections down, but careful zooming and filtering can really help. Bottlenose provides a platform for analysis of streaming data, of which text content is a part. (http://www.bottlenose.com)
  25. Clustering is both a text analytics and a visual concept. The two feed into each other very nicely. There are a lot of different ways to do clustering of text, but they all have to do with similarities. You could cluster based on all content that mentioned a particular entity, which is what you see on the Bottlenose graphic from the previous slide. The content is inherently clustered, but is laid out in a simple layout because it is clustering based on the occurrence of a particular phrase or entity. Other clustering algorithms take all of the content into account. This is what Quid is doing. (http://www.quid.com) They are using both force-directed graphs (which is layout technology) and clustering to add information to the topology of the graph itself. Different shapes of clusters mean different things. If a cluster is nice and round and compact, that means all the articles are saying roughly the same thing. Spread-out clusters contain highly differentiated stories. Clusters in the center of the graph indicate central topics or bridging ideas. Distance between clusters shows how inter-related the stories are: closer means more inter-relationships.
  26. This slide shows one cluster around bitcoin regulation. You can see that some of the articles take a different slant than others, but that they’re all related to the same core topic. Individual articles further away from each other take a different slant on the core topic.
  27. Dendrograms are another way to show clusters. You can see the politicians on the right, and how they relate naturally to each other via their communications. Different concepts are along the top, again clustered by similarity through the dendrogram. The frequency of each concept as uttered by each politician is shown in the heatmap. This allows for a nice visual grasp of some of the differences in the stances of the politicians. This is also a really common type of visualization for gene expression, showing how different genes relate to each other.
  28. This is a list of some of the easiest and most sophisticated free visualization tools out there. No-code systems allow you to upload something like a CSV and graph from there. The coding systems require you to do some coding, but are far more sophisticated in how they allow you to graph. If you have some free time, spend some time in the D3 gallery – the breadth of visualizations is quite amazing.
  29. There are two other ways to approach these visualizations. One is to get a commercial toolkit. The list on the right is roughly ordered from top to bottom in terms of how “full” a package they are. For example, Tableau and Jreport are all about visualization, whereas SAS and SPSS are full-blown statistics packages. The other way is to get all your text from an off-the-shelf system that includes all the content, like a Social Marketing System, or Customer Experience Management System.
  30. The TED talk is a really great example of using animations to tell a story. Telling a story is, at the end of the day, what we’re all trying to accomplish. The goal of that story is to help make a point, to drive a change in behavior. If you take anything away from this presentation, it should be that word clouds and pie charts aren’t the best way to tell that story and that there are myriad other ways to accomplish the storytelling.
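
Notes 2 and 3 argue that an ordered list makes a missing word easy to spot where a word cloud doesn't. A small sketch of that comparison, using a few of the counts from the talk's stemming word list; the helper names are made up for this example.

```python
from collections import Counter

# A few counts from the talk's stemming word list, before and after
# "suffix" is removed.
before = Counter({"stem": 86, "word": 53, "algorithm": 49,
                  "rule": 36, "suffix": 27, "strip": 23})
after = Counter({"stem": 86, "word": 53, "algorithm": 49,
                 "rule": 36, "strip": 23})


def ranked_lines(counts):
    """Format counts as an ordered list, highest count first."""
    return [f"{word} {count}" for word, count in counts.most_common()]


def changed_words(old, new):
    """Map each word whose count changed to its (old, new) counts."""
    words = sorted(set(old) | set(new))
    return {w: (old[w], new[w]) for w in words if old[w] != new[w]}


print("\n".join(ranked_lines(before)))
print(changed_words(before, after))  # {'suffix': (27, 0)}
```

Because both lists are keyed by word and ordered by count, the difference falls out of a simple comparison, which is exactly what a re-packed word cloud hides.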
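
Note 5 describes text where one piece mentions concepts A and B and another mentions B and C. One simple way to turn that into something you can graph is to count co-occurring pairs. This is only a sketch: the documents below are hypothetical and are assumed to be already reduced to sets of concepts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents, each reduced to the set of concepts it mentions.
docs = [{"A", "B"}, {"B", "C"}, {"A", "B"}]


def cooccurrence(concept_sets):
    """Count how many documents mention each pair of concepts together."""
    pairs = Counter()
    for concepts in concept_sets:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(concepts), 2):
            pairs[pair] += 1
    return pairs


pairs = cooccurrence(docs)
print(pairs[("A", "B")], pairs[("B", "C")], pairs[("A", "C")])  # 2 1 0
```

A and C never appear together, yet both are linked through B. Those pair counts are the kind of edge weights a graph or cluster view can draw.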
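
Note 7 says the positive/negative/neutral split depends on where you put the bounds of neutral. A minimal sketch of that effect follows. The scores are invented, and treating a score exactly on a bound as neutral is an assumption of this example, not something the talk specifies.

```python
from collections import Counter


def label(score, neg_bound, pos_bound):
    """Label one score; anything between the two bounds counts as neutral."""
    if score < neg_bound:
        return "negative"
    if score > pos_bound:
        return "positive"
    return "neutral"


def breakdown(scores, neg_bound, pos_bound):
    """Percentage of scores that fall in each bucket."""
    counts = Counter(label(s, neg_bound, pos_bound) for s in scores)
    return {name: 100.0 * counts[name] / len(scores)
            for name in ("positive", "negative", "neutral")}


# Hypothetical document scores, for illustration only.
scores = [-0.4, -0.15, -0.05, 0.0, 0.05, 0.12, 0.18, 0.3]

narrow = breakdown(scores, -0.1, 0.1)      # -0.1 to +0.1 is neutral
wide = breakdown(scores, -0.2, 0.2)        # -0.2 to +0.2 is neutral
lopsided = breakdown(scores, -0.1, 0.15)   # the lopsided bounds from note 7

print(narrow)
print(wide)
print(lopsided)
```

Same scores, three different pie charts. That is why any reported sentiment split should come with its bounds.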
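
Note 12 distinguishes stemming (no part-of-speech) from lemmatization (part-of-speech aware). The toy sketch below shows only that difference. The suffix list and the lookup table are made up for this example; real stemmers and lemmatizers use far larger rule sets and dictionaries.

```python
def toy_stem(word):
    """Strip a few common suffixes without looking at part of speech."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table keyed by (word, part of speech).
LEMMAS = {
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def toy_lemma(word, pos):
    """Look up the root form for a word given its part of speech."""
    return LEMMAS.get((word, pos), word)


print(toy_stem("meeting"))           # meet
print(toy_lemma("meeting", "VERB"))  # meet
print(toy_lemma("meeting", "NOUN"))  # meeting
```

The stemmer always collapses "meeting" to "meet", while the lemma lookup keeps the noun intact. That is the richer experience the note refers to.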
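
Note 25 says that all text clustering comes down to similarity. As one possible illustration, here is a small sketch that groups texts whose bag-of-words cosine similarity passes a threshold. The texts, the 0.5 threshold, and the single-link grouping rule are all assumptions of this example, not a description of what Quid or Bottlenose do.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Single-link grouping: a text joins every cluster that contains
    at least one sufficiently similar member, merging them if needed."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        matches = [c for c in clusters
                   if any(cosine(vec, vectors[j]) >= threshold for j in c)]
        merged = [i]
        for c in matches:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


# Hypothetical headlines, for illustration only.
texts = [
    "bitcoin regulation bill proposed",
    "senate debates bitcoin regulation bill",
    "new phone screen cracked",
    "cracked phone screen repair",
]
print(cluster(texts))  # [[0, 1], [2, 3]]
```

The output is a list of index groups. A layout step such as a force-directed graph would then decide where each group sits on the page.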