Mining Social Web Data Like a Pro: Four Steps to Success
Presented by Matthew A. Russell
"Data Journalism and Interactivity" - GDA Seminar
Quito, Ecuador - 20 September 2013
Hola
Trained as a Computer Scientist
CTO @ Digital Reasoning Systems
Data Mining, Machine Learning
Principal @ Zaffra
Boutique Consulting
Author @ O'Reilly Media
5 published books on technology
Transform Curiosity Into Insight
An open source project
http://bit.ly/MiningTheSocialWeb2E
Inherently accessible
Virtual machine & IPython Notebook UX
Turn-key code templates for bootstrapping data science experiments
Think of the book as "premium" support for the OSS project
¿Por qué no Español? (Why not Spanish?)
Investigative Journalist
"A person whose profession it is to discover the truth and to identify lapses from it in whatever media may be available."
Data Science
Data => Actionable Information
Highly interdisciplinary
Nascent
Necessary
http://wikipedia.org/wiki/Data_science
Digital Signal Explosion
A model for the world: signal and sinks
Growth in data exhaust is accelerating
Digital fingerprints
Software is eating the world
Data mining opportunities galore...
Digital Data Stats
100 terabytes of data uploaded daily to Facebook.
Brands and organizations on Facebook receive 34,722 Likes every minute of the day.
According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day.
30 billion pieces of content shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009.
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.
See http://wikibon.org/blog/big-data-statistics
Social Media Is All the Rage
World population: ~7B people
Facebook: 1.15B users
Twitter: 500M users
Google+: 343M users
LinkedIn: 238M users
~200M+ blogs (conservative estimate)
But Why Is It All the Rage?
It satisfies fundamental human desires
We want to be heard
We want to satisfy our curiosity
We want it easy
We want it now
Social Network Mechanics
[Figure: graph with person nodes Roberto, Mercedes, Jorge, Ana, Nina]
Interest Graph Mechanics
[Figure: graph with person nodes Roberto, Mercedes, Jorge, Ana, Nina and interest nodes U2, Juan Luis Guerra]
A (Social) Interest Graph
[Figure: graph with person nodes Roberto, Mercedes, Jorge, Ana, Nina and interest nodes U2, Juan Luis Guerra]
A (Political) Interest Graph
[Figure: graph with person nodes Roberto, Mercedes, Jorge, Ana, Nina and candidate nodes Johnny Araya, Rodolfo Hernández]
Social Media Dimensions
Facebook
Account Types: People & Pages
Mutual Connections
"Likes"
"Shares"
"Comments"
Extensive Privacy Controls
Twitter
Account Types: "Anything"
"Following" Relationships
Favorites
Retweets
Replies
(Almost) No Privacy Controls
Why Does This Matter?
"If you can measure it, you can improve it"
Modeling Behavior
Predictive Analysis
Recommending Content
Swaying political situations might just be the ultimate value proposition for social media
Social Media Analysis Framework
Four Steps To Success
Aspire
Acquire
Analyze
Summarize
Let's step through a trivial example...
(1) Aspire
Let's frame a trivial hypothesis to illustrate the four steps...
Frame a hypothesis about some real world phenomenon
For example: "Johnny Araya is a more popular candidate than Rodolfo Hernández"
Let's use social media as a basis of investigation
(2) Acquire
Collect the data that you need to test the hypothesis
How?
Use Facebook and Twitter APIs to harvest data about each candidate
Go after low hanging fruit before something more complex
You don't even need to write code to do this (yet)
They're both on Facebook
http://facebook.com/ElDoctor2014
http://facebook.com/JohnnyArayaMonge
They're both on Twitter
@Johnny_Araya
@ElDoctor2014
(3) Analyze
Count, Filter, and Rank the Data
Johnny Araya:
~50k Facebook likes
~14k Twitter followers
Rodolfo Hernández:
~37k Facebook likes
745 Twitter followers
Johnny Araya is indeed more popular in social media
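The count-and-rank step above can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's code: the dictionary layout and the `rank_by` helper are assumptions, and the numbers are the approximate figures quoted on the slide.

```python
# Approximate counts from the slide; the dict layout is an assumption.
popularity = {
    "Johnny Araya": {"facebook_likes": 50000, "twitter_followers": 14000},
    "Rodolfo Hernández": {"facebook_likes": 37000, "twitter_followers": 745},
}


def rank_by(metric, data):
    """Return candidate names sorted from highest to lowest on one metric."""
    return sorted(data, key=lambda name: data[name][metric], reverse=True)


for metric in ("facebook_likes", "twitter_followers"):
    ranking = rank_by(metric, popularity)
    print(metric, "->", ", ".join(ranking))
```

Ranking each metric separately keeps the comparison honest when the two networks disagree, which is not the case here.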
(4) Summarize
Present the data in a concise and easily understood manner
Charts
Tables
Simple visualizations
Some examples...
[Figure: "Social Media Popularity: Araya vs Hernández", pie charts of Facebook Popularity and Twitter Popularity]
[Figure: "Social Media Popularity: Araya vs Hernández", bar chart of Twitter followers and Facebook fans, linear axis from 0 to 60,000]
[Figure: "Social Media Popularity: Araya vs Hernández", bar chart of Twitter followers and Facebook fans, logarithmic axis from 1 to 100,000]
[Figure: "Twitter Popularity"]
[Figure: "Facebook Popularity", pie chart of Facebook Likes for Costa Rican Presidential Candidates: JohnnyArayaMonge 35%, ElDoctor2014 26%, villaltaJM 19%, o0oguevaraguth 17%, luisguillermosolisr 3%]
Recall the previous hypothesis:
"Johnny Araya is a more popular candidate than Rodolfo Hernández"
What do we know now that we didn't before?
The current state of each candidate's Twitter and Facebook popularity
Let's explore a slightly more complex hypothesis...
Reflect and Refine...
(1) Aspire
Redefine the hypothesis:
For example: "Johnny Araya has a more effective social media strategy than Rodolfo Hernández"
Presumably because of his superior social media status at the moment
(2) Acquire
Collect the data that you need to test the hypothesis
How? Use APIs to harvest data about each candidate
Let's consider any Facebook posts for 2013
Python Source Code
import json
import requests

for candidate in ['JohnnyArayaMonge', 'ElDoctor2014']:
    # Get the data: up to 500 posts from the candidate's Facebook page
    url = ('https://graph.facebook.com/{0}?'
           'fields=posts.limit(500)&access_token=XXX').format(candidate)
    content = requests.get(url).json()
    # Save the data as <candidate>.json
    with open(candidate + ".json", "w") as f:
        f.write(json.dumps(content))
(3) Analyze
Count, Filter, and Rank the Data
Some more Python source code to crunch the numbers
Extract Facebook likes and shares this year
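The number crunching is not shown on the slides, so here is a rough sketch of what it might look like. It assumes each post has already been flattened into a dict with the hypothetical keys `created_time`, `like_count`, `share_count`, and `type`; the real Graph API response is nested differently and would need to be mapped into this shape first. The `facebook_vitals` name is also an assumption.

```python
from collections import Counter


def facebook_vitals(posts, since="2013-01-01"):
    """Summarize posts created on or after `since`.

    Timestamps are compared as strings, which works only if every
    `created_time` is ISO 8601 with the same UTC offset (an assumption).
    """
    recent = [p for p in posts if p["created_time"] >= since]
    total_likes = sum(p["like_count"] for p in recent)
    total_shares = sum(p["share_count"] for p in recent)
    n = len(recent)
    return {
        "num_posts": n,
        "num_prior_posts": len(posts) - n,
        "total_post_likes": total_likes,
        "total_post_shares": total_shares,
        "avg_likes_per_post": total_likes / n if n else 0.0,
        "avg_shares_per_post": total_shares / n if n else 0.0,
        "post_types": Counter(p["type"] for p in recent).most_common(),
    }


# Made-up records, only to show the expected input shape.
sample = [
    {"created_time": "2012-12-30T10:00:00+0000", "like_count": 5,
     "share_count": 1, "type": "status"},
    {"created_time": "2013-03-15T00:40:21+0000", "like_count": 300,
     "share_count": 20, "type": "photo"},
    {"created_time": "2013-04-01T12:00:00+0000", "like_count": 100,
     "share_count": 10, "type": "photo"},
]
vitals = facebook_vitals(sample)
print(vitals)
```

Counting, filtering by date, and ranking post types all happen in one pass over a small list, so nothing more elaborate than the standard library is needed at this scale.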
Facebook Vitals
ElDoctor2014
Total Likes 37495
Num Posts since Jan 1, 2013 (of 500 possible) 436
Total Post Likes 155473
Total Post Shares 9684
Oldest Post in Batch 2013-03-15T00:40:21+0000
Num posts prior to Jan 1, 2013 0
Avg likes/post 356.589449541 (0.951032003044%)
Avg shares/post 22.2110091743 (0.059237256099%)
Post Types [(u'photo', 286), (u'link', 77), (u'status', 40), (u'video', 32), (u'swf', 1)]
JohnnyArayaMonge	
Total Likes 50301	
Num Posts since Jan 1, 2013 (of 500 possible) 205	
Total Post Likes 176161	
Total Post Shares 7542	
Oldest Post in Batch 2013-01-01T07:18:43+0000	
Num posts prior to Jan 1, 2013 190	
Avg likes/post 859.32195122 (1.70835957778%)	
Avg shares/post 36.7902439024 (0.0731401838978%)	
Post Types [(u'photo', 149), (u'status', 38), (u'link', 13), (u'video', 5)]
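The averages in the vitals follow directly from the totals, and the percentage in parentheses appears to be the per-post average expressed relative to the page's total likes. The sketch below reproduces that arithmetic; the `per_post_averages` helper is a hypothetical name introduced for illustration.

```python
def per_post_averages(page_likes, num_posts, post_likes, post_shares):
    """Average likes/shares per post, plus each as a percentage of page likes."""
    avg_likes = post_likes / num_posts
    avg_shares = post_shares / num_posts
    return {
        "avg_likes": avg_likes,
        "avg_likes_pct": 100.0 * avg_likes / page_likes,
        "avg_shares": avg_shares,
        "avg_shares_pct": 100.0 * avg_shares / page_likes,
    }


# Totals copied from the vitals listing above.
hernandez = per_post_averages(37495, 436, 155473, 9684)
araya = per_post_averages(50301, 205, 176161, 7542)

for name, stats in (("ElDoctor2014", hernandez), ("JohnnyArayaMonge", araya)):
    print(name, round(stats["avg_likes"], 2), round(stats["avg_likes_pct"], 3))
```

Normalizing by page likes is one way to compare engagement across pages of different sizes; other denominators are equally defensible.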
(4) Summarize
Present the data in a concise and easily understood manner
Like a table...
Metric                       Araya        Hernández
Total Likes                  50,301       37,495
Posts since 1 Jan 13         205          436
Num Prior Posts              190+         0
Earliest Post                1 Jan 2013   15 March 2013
Post Likes since 1 Jan 13    176,161      155,473
Post Shares since 1 Jan 13   7,542        9,684
Avg Likes per Post           859          356
Avg Shares per Post          36           22
Recall the hypothesis:
"Johnny Araya has a more effective social media strategy than Rodolfo Hernández because he has more Facebook and Twitter popularity"
What do we know now?
Hernández has Facebook vitals that are quite competitive with Araya
However, Hernández only joined Facebook ~6 months ago!
It would appear that Hernández has the more effective strategy
What is he doing to rise in popularity so quickly?
Reflect and Refine...
Comparison of Facebook Content
Other Candidates
Johnny Araya FB Posts
Rodolfo Hernández FB Posts
Past ~2 Months on Facebook
Candidate                            Aug 2013 FB Likes   Sept 2013 FB Likes   % Change
Johnny Araya                         50,301              53,809               6.97%
Otto Guevara Guth                    24,146              27,675               14.62%
José María Villalta Florez-Estrada   27,262              35,169               29.00%
Dr. Rodolfo Hernández                37,495              38,298               2.14%
Luis Guillermo Solís Rivera          5,334               6,763                26.79%
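The % Change column is the difference between the two monthly counts divided by the earlier count. A minimal sketch of that calculation, using the Facebook like counts from the slide (the `pct_change` helper and the dictionary layout are assumptions made for illustration):

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# (Aug 2013, Sept 2013) Facebook like counts from the slide.
facebook_likes = {
    "Johnny Araya": (50301, 53809),
    "Otto Guevara Guth": (24146, 27675),
    "José María Villalta Florez-Estrada": (27262, 35169),
    "Dr. Rodolfo Hernández": (37495, 38298),
    "Luis Guillermo Solís Rivera": (5334, 6763),
}

# Rank candidates by growth rate, fastest first.
by_growth = sorted(
    facebook_likes,
    key=lambda name: pct_change(*facebook_likes[name]),
    reverse=True,
)
for name in by_growth:
    aug, sept = facebook_likes[name]
    print("{0}: {1:.2f}%".format(name, pct_change(aug, sept)))
```

Sorting by growth rather than by absolute likes surfaces the candidates who are gaining ground, which is a different question from who is currently ahead.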
Past ~3 Months on Twitter
Candidate                            Aug 2013   Sept 2013   % Change
Johnny Araya                         14,573     15,506      6.40%
Otto Guevara Guth                    114        159         39.47%
José María Villalta Florez-Estrada   8,160      8,990       10.17%
Dr. Rodolfo Hernández                745        858         15.17%
Luis Guillermo Solís Rivera          1,192      1,487       24.75%
Facebook and Twitter Compared
Candidate                            % FB Change   % Twitter Change
Johnny Araya                         6.97%         6.40%
Otto Guevara Guth                    14.62%        39.47%
José María Villalta Florez-Estrada   29.00%        10.17%
Dr. Rodolfo Hernández                2.14%         15.17%
Luis Guillermo Solís Rivera          26.79%        24.75%
Your Imagination Is the Only Limit
Analyze the comments that people are leaving on Facebook pages
Try to ascertain common Facebook fans or Twitter followers amongst candidates
Deduce demographics from social media by synthesizing public data
Theorize about potential "reach" or "influence" using social media
Analyze data in realtime
Thinking about Reach
Think about "liking" and "following" as opt-ins to feeds
Remember: Interest Graphs
Arriving at effective metrics is trickier than it initially seems
Potential Twitter Influence
Metric                Araya   Hernández
Followers             ~14k    ~750
Theoretical Reach     ~40M    ~550k
Reach (10)            490     673
Reach (100)           289     702
Reach (1000)          2782    X
Reach (10,000)        2832    X
"Suspect" Followers   3,246   94
See also http://wp.me/p3QiJd-2a
Potential Influence
Who are Candidates Following?
What are Candidates Tweeting?
Realtime Analysis
Monitor Twitter's firehose for realtime data using filters such as #Syria
Keep in mind the sheer volume of data can be considerable
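The filtering idea can be sketched without a live connection. The example below assumes tweets arrive as dicts shaped like Twitter's JSON, with hashtags listed under `entities` -> `hashtags` -> `text`; the `filter_by_hashtag` helper and the sample tweets are made up for illustration, and any iterable (a saved file, a streaming client) could stand in for the list.

```python
def filter_by_hashtag(tweets, hashtag):
    """Yield tweets whose hashtag entities include `hashtag` (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    for tweet in tweets:
        tags = tweet.get("entities", {}).get("hashtags", [])
        if any(tag.get("text", "").lower() == wanted for tag in tags):
            yield tweet


# Made-up tweets standing in for a live stream.
sample_stream = [
    {"text": "Latest on #Syria", "entities": {"hashtags": [{"text": "Syria"}]}},
    {"text": "Lunch time", "entities": {"hashtags": []}},
    {"text": "#syria update", "entities": {"hashtags": [{"text": "syria"}]}},
]

matches = list(filter_by_hashtag(sample_stream, "#Syria"))
print(len(matches), "matching tweets")
```

Writing the filter as a generator means tweets are examined one at a time, which matters when the volume is too large to hold in memory.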
Analysis at MiningTheSocialWeb.com
Mapping #Syria Tweets
See http://wp.me/p3QiJd-1t
Temporal Analysis on #Syria
Analyzing #Syria Tweet Entities
Closing Remarks
Software is the gift that keeps on giving
Code it up once, run it ad infinitum...
Code designed for one account will work for other accounts
Analysis is all about knowing what to count
Coding it up is just the dirty work
Start somewhere and then iteratively explore...then exploit
Aspire to Do Great Things
Predicting demographic data such as age or gender is possible for some languages
Time and space are fundamentals for grounding online discussions in reality
Twitter is about as good as it gets for realtime topical analysis
Think of the world as signal producers and signal collectors
Monitoring breaking news events like #Syria
The Tip of the Iceberg
Stay in Touch
Website: http://MiningTheSocialWeb.com
Twitter: @ptwobrussell
FB: http://facebook.com/MiningTheSocialWeb
LinkedIn: http://linkedin.com/in/ptwobrussell
Email: ptwobrussell@gmail.com

Mining Social Media Data for Insights on Costa Rican Presidential Candidates

  • 1. Mining Social Web Data Like a Pro: Four Steps to Success Presented by Matthew A. Russell "Data Journalism and Interactivity" - GDA Seminar Quito, Ecuador - 20 September 2013 1
  • 2. Hola 2 Trained as a Computer Scientist CTO @ Digital Reasoning Systems Data Mining, Machine Learning Principal @ Zaffra Boutique Consulting Author @ O'Reilly Media 5 published books on technology
  • 3. 3
  • 4. Transform Curiosity Into Insight 4 An open source project http://bit.ly/MiningTheSocialWeb2E Inherently accessible Virtual machine & IPython Notebook UX Turn-key code templates for bootstrapping data science experiments Think of the book as "premium" support for the OSS project
  • 5. ¿Por qué no Español? 5
  • 6. Investigative Journalist 6 "A person whose profession it is to discover the truth and to identify lapses from it in whatever media may be available."
  • 7. Data Science 7 Data => Actionable Information Highly interdisciplinary Nascent Necessary http://wikipedia.org/wiki/Data_science
  • 8. Digital Signal Explosion A model for the world: signal and sinks Growth in data exhaust is accelerating Digital fingerprints Software is eating the world Data mining opportunities galore... 8
  • 9. Digital Data Stats 100 terabytes of data uploaded daily to Facebook. Brands and organizations on Facebook receive 34,722 Likes every minute of the day. According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day 30 Billion pieces of content shared on Facebook every month. Data production will be 44 times greater in 2020 than it was in 2009 According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years. 9 See http://wikibon.org/blog/big-data-statistics
  • 10. Social Media Is All the Rage World population: ~7B people Facebook: 1.15B users Twitter: 500M users Google+ 343M users LinkedIn: 238M users ~200M+ blogs (conservative estimate) 10
  • 11. But Why Is It All the Rage? It satisfies fundamental human desires We want to be heard We want to satisfy our curiosity We want it easy We want it now 11
  • 13. Interest Graph Mechanics 13 Roberto Mercedes Jorge Ana Nina U2 Juan Luis Guerra Juan Luís Guerra
  • 14. A (Social) Interest Graph 14 Roberto Mercedes Jorge Ana Nina U2 Juan Luis Guerra Juan Luís Guerra
  • 15. A (Political) Interest Graph 15 Roberto Mercedes Jorge Ana Nina Johnny Araya Rodolfo Hernández
  • 16. Social Media Dimensions
      Facebook: Account Types: People & Pages; Mutual Connections; "Likes"; "Shares"; "Comments"; Extensive Privacy Controls
      Twitter: Account Types: "Anything"; "Following" Relationships; Favorites; Retweets; Replies; (Almost) No Privacy Controls
  • 17. Why Does This Matter? "If you can measure it, you can improve it" Modeling Behavior Predictive Analysis Recommending Content Swaying political situations might just be the ultimate value proposition for social media 17
  • 18. Social Media Analysis Framework Four Steps To Success Aspire Acquire Analyze Summarize Let's step through a trivial example... 18
  • 19. (1) Aspire Let's frame a trivial hypothesis to illustrate the four steps... Frame a hypothesis about some real world phenomenon For example: "Johnny Araya is a more popular candidate than Rodolfo Hernández" Let's use social media as a basis of investigation 19
  • 20. (2) Acquire Collect the data that you need to test the hypothesis How? Use Facebook and Twitter APIs to harvest data about each candidate Go after low hanging fruit before something more complex You don't even need to write code to do this (yet) 20
  • 21. They're both on Facebook 21 http://facebook.com/ElDoctor2014 http://facebook.com/JohnnyArayaMonge
  • 22. They're both on Twitter: @Johnny_Araya and @ElDoctor2014
  • 23. (3) Analyze Count, Filter, and Rank the Data Johnny Araya: ~50k Facebook likes; ~14k Twitter followers Rodolfo Hernández: ~37k Facebook likes; 745 Twitter followers Johnny Araya is indeed more popular in social media
  • 24. (4) Summarize Present the data in a concise and easily understood manner Charts Tables Simple visualizations Some examples... 24
  • 25. Social Media Popularity: Araya vs Hernández (two charts, "Facebook Popularity" and "Twitter Popularity", each comparing Araya % with Hernández %)
  • 30. Reflect and Refine... Recall the previous hypothesis: "Johnny Araya is a more popular candidate than Rodolfo Hernández" What do we know now that we didn't before? The current state of each candidate's Twitter and Facebook popularity. Let's explore a slightly more complex hypothesis...
  • 31. (1) Aspire Redefine the hypothesis: For example: "Johnny Araya has a more effective social media strategy than Rodolfo Hernández" Presumably because of his superior social media status at the moment 31
  • 32. (2) Acquire Collect the data that you need to test the hypothesis How? Use APIs to harvest data about each candidate Let's consider any Facebook posts for 2013 32
  • 33. Python Source Code

        import json
        import requests

        for candidate in ['JohnnyArayaMonge', 'ElDoctor2014']:
            # Get the data (XXX stands in for a real access token)
            url = ('https://graph.facebook.com/{0}'
                   '?fields=posts.limit(500)&access_token=XXX').format(candidate)
            content = requests.get(url).json()

            # Save the data
            with open(candidate + '.json', 'w') as f:
                f.write(json.dumps(content))
  • 34. (3) Analyze 34 Count, Filter, and Rank the Data Some more Python source code to crunch the numbers Extract Facebook likes and shares this year
  • 35. Facebook Vitals

      ElDoctor2014
        Total Likes: 37495
        Num Posts since Jan 1, 2013 (of 500 possible): 436
        Total Post Likes: 155473
        Total Post Shares: 9684
        Oldest Post in Batch: 2013-03-15T00:40:21+0000
        Num posts prior to Jan 1, 2013: 0
        Avg likes/post: 356.589449541 (0.951032003044%)
        Avg shares/post: 22.2110091743 (0.059237256099%)
        Post Types: [(u'photo', 286), (u'link', 77), (u'status', 40), (u'video', 32), (u'swf', 1)]

      JohnnyArayaMonge
        Total Likes: 50301
        Num Posts since Jan 1, 2013 (of 500 possible): 205
        Total Post Likes: 176161
        Total Post Shares: 7542
        Oldest Post in Batch: 2013-01-01T07:18:43+0000
        Num posts prior to Jan 1, 2013: 190
        Avg likes/post: 859.32195122 (1.70835957778%)
        Avg shares/post: 36.7902439024 (0.0731401838978%)
        Post Types: [(u'photo', 149), (u'status', 38), (u'link', 13), (u'video', 5)]
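
The slides do not show the number-crunching code itself, so the following is only a minimal sketch of how vitals like those on slide 35 might be derived from one of the saved JSON files. The function name `summarize_posts` is hypothetical, and the field names (`posts.data`, `created_time`, `type`, `likes.count`, `shares.count`) are assumptions about the shape of the Graph API response at the time, not something the deck confirms.

```python
from collections import Counter


def summarize_posts(page, since="2013-01-01"):
    """Summarize post activity for one page dict (assumed response shape)."""
    posts = page.get("posts", {}).get("data", [])
    # ISO 8601 timestamps with the same UTC offset sort lexicographically,
    # so a plain string comparison is enough to filter by date.
    recent = [p for p in posts if p.get("created_time", "") >= since]
    likes = sum(p.get("likes", {}).get("count", 0) for p in recent)
    shares = sum(p.get("shares", {}).get("count", 0) for p in recent)
    n = len(recent)
    return {
        "num_posts": n,
        "num_prior_posts": len(posts) - n,
        "total_post_likes": likes,
        "total_post_shares": shares,
        "avg_likes_per_post": likes / n if n else 0.0,
        "avg_shares_per_post": shares / n if n else 0.0,
        "post_types": Counter(p.get("type", "unknown") for p in recent).most_common(),
    }
```

Under these assumptions, loading a saved file with `json.load` and passing the result to `summarize_posts` would yield the counts and averages that feed the summary table. Missing `likes` or `shares` fields are treated as zero rather than raising an error, since not every post necessarily carries both.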
  • 36. (4) Summarize Present the data in a concise and easily understood manner Like a table... 36
  • 37. Summary table

      Metric                        Araya         Hernández
      Total Likes                   50,301        37,495
      Posts since 1 Jan 13          205           436
      Num Prior Posts               190+          0
      Earliest Post                 1 Jan 2013    15 March 2013
      Post Likes since 1 Jan 13     176,161       155,473
      Post Shares since 1 Jan 13    7,542         9,684
      Avg Likes per Post            859           356
      Avg Shares per Post           36            22
  • 39. Reflect and Refine... Recall the hypothesis: "Johnny Araya has a more effective social media strategy than Rodolfo Hernández because he has more Facebook and Twitter popularity" What do we know now? Hernández has Facebook vitals that are quite competitive with Araya's. However, Hernández only joined Facebook ~6 months ago! It would appear that Hernández has the more effective strategy. What is he doing to rise in popularity so quickly?
  • 42. Johnny Araya FB Posts 42
  • 44. 44
  • 45. Past ~2 Months on Facebook

      Candidate                            Aug 2013 FB Likes    Sept 2013 FB Likes    % Change
      Johnny Araya                         50,301               53,809                6.97%
      Otto Guevara Guth                    24,146               27,675                14.62%
      José María Villalta Florez-Estrada   27,262               35,169                29.00%
      Dr. Rodolfo Hernández                37,495               38,298                2.14%
      Luis Guillermo Solís Rivera          5,334                6,763                 26.79%

  • 46. Past ~3 Months on Twitter

      Candidate                            Aug 2013    Sept 2013    % Change
      Johnny Araya                         14,573      15,506       6.40%
      Otto Guevara Guth                    114         159          39.47%
      José María Villalta Florez-Estrada   8,160       8,990        10.17%
      Dr. Rodolfo Hernández                745         858          15.17%
      Luis Guillermo Solís Rivera          1,192       1,487        24.75%

  • 47. Facebook and Twitter Compared

      Candidate                            % FB Change    % Twitter Change
      Johnny Araya                         6.97%          6.40%
      Otto Guevara Guth                    14.62%         39.47%
      José María Villalta Florez-Estrada   29.00%         10.17%
      Dr. Rodolfo Hernández                2.14%          15.17%
      Luis Guillermo Solís Rivera          26.79%         24.75%
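
The "% Change" columns on slides 45 to 47 are consistent with a plain relative change between the August and September counts. As a small sketch (the helper name `pct_change` is an illustration, not taken from the deck), the Facebook column can be reproduced from the figures on slide 45:

```python
def pct_change(old, new):
    """Return the percentage change from old to new, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


# (Aug 2013, Sept 2013) Facebook like counts as reported on slide 45
facebook_likes = {
    "Johnny Araya": (50301, 53809),
    "Otto Guevara Guth": (24146, 27675),
    "José María Villalta Florez-Estrada": (27262, 35169),
    "Dr. Rodolfo Hernández": (37495, 38298),
    "Luis Guillermo Solís Rivera": (5334, 6763),
}

for name, (aug, sept) in facebook_likes.items():
    print("{0}: {1:.2f}%".format(name, pct_change(aug, sept)))
```

Relative change rather than absolute growth is what lets a small account such as Otto Guevara Guth's Twitter profile (114 to 159 followers) be compared with a much larger one.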
  • 48. Your Imagination Is the Only Limit Analyze the comments that people are leaving on Facebook pages Try to ascertain common Facebook fans or Twitter followers amongst candidates Deduce demographics from social media by synthesizing public data Theorize about potential "reach" or "influence" using social media Analyze data in realtime
  • 49. Thinking about Reach Think about "liking" and "following" as opt-ins to feeds Remember: Interest Graphs Arriving at effective metrics is trickier than it initially seems
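
One of the ideas on slide 48, finding fans or followers that candidates have in common, reduces to a set intersection once follower IDs have been collected. The sketch below assumes the IDs are already in hand as plain lists; the function name `follower_overlap` and the sample IDs are hypothetical, and the Jaccard index is added here as one possible way to normalize the overlap, not something the deck prescribes.

```python
def follower_overlap(followers_a, followers_b):
    """Return the shared follower IDs and the Jaccard similarity of two audiences."""
    a, b = set(followers_a), set(followers_b)
    common = a & b
    union = a | b
    # Jaccard index: shared followers as a fraction of all distinct followers
    jaccard = len(common) / len(union) if union else 0.0
    return common, jaccard


# Hypothetical follower IDs for two accounts
common, similarity = follower_overlap([101, 102, 103, 104], [103, 104, 105])
print(sorted(common), similarity)
```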
  • 50. Potential Twitter Influence

      Metric                 Araya     Hernández
      Followers              ~14k      ~750
      Theoretical Reach      ~40M      ~550k
      Reach (10)             490       673
      Reach (100)            289       702
      Reach (1000)           2782      X
      Reach (10,000)         2832      X
      "Suspect" Followers    3,246     94

      See also http://wp.me/p3QiJd-2a
  • 52. Who are Candidates Following? 52
  • 53. What are Candidates Tweeting? 53
  • 54. Realtime Analysis 54 Monitor Twitter's firehose for realtime data using filters such as #Syria Keep in mind the sheer volume of data can be considerable Analysis at MiningTheSocialWeb.com
  • 55. Mapping #Syria Tweets See http://wp.me/p3QiJd-1t
  • 56. Temporal Analysis on #Syria 56
  • 57. Analyzing #Syria Tweet Entities 57
  • 58. Closing Remarks Software is the gift that keeps on giving Code it up once, run it ad infinitum... Code designed for one account will work for other accounts Analysis is all about knowing what to count Coding it up is just the dirty work Start somewhere and then iteratively explore...then exploit 58
  • 59. Aspire to Do Great Things Predicting demographic data such as age or gender is possible for some languages Time and space are fundamentals for grounding online discussions in reality. Twitter is about as good as it gets for realtime topical analysis Think of the world as signal producers and signal collectors Monitoring breaking news events like #Syria 59
  • 60. The Tip of the Iceberg 60
  • 61. Stay in Touch Website: http://MiningTheSocialWeb.com Twitter: @ptwobrussell FB: http://facebook.com/MiningTheSocialWeb LinkedIn: http://linkedin.com/in/ptwobrussell Email: ptwobrussell@gmail.com 61