SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
Twitter: The Tweet, the Whole Tweet,
    and Nothing but the Tweet #2



                    charsyam@naver.com
Data Mining
Discover New Knowledge
Discover New Knowledge
from Existing Information
What do #TeaParty and
    #JustinBieber
   have in common
Tools: Pymongo, MongoDB
apt-get install python-dev
pip install pymongo
Get Tweets
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
tweets = api.search('#JustinBieber', rpp=100)
for tweet in tweets:
   db.foo.save(tweet.__getstate__())
Insert TO MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
for num in range(1,16):
    tweets = api.search('#JustinBieber', rpp=100, page=num)
    for tweet in tweets:
       db.foo.save(tweet.__getstate__())
Count Frequency in mongo

   MAP
map = function(){
   words = this.text.split(' ');
   for ( i in words ){
      emit({ key: words[i] }, {count: 1});
   }
};
Count Frequency in mongo

   REDUCE
reduce = function (key, values) {
   var count = 0;
   values.forEach(function (v) {count += v.count;});
   return {count:count};
}
Count Frequency in mongo

   EXECUTE
res = db.foo.mapReduce( map, reduce, {out: "mystring"});
Count Frequency in mongo

   RESULT
{ "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
Get From MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

cursor = db.mystring.find()
for d in cursor:
   print d
What Entities Co-Occur
  Most Often with
  #JustinBieber and
 #TeaParty Tweets?
intersection
import sys
from sets import Set

if __name__=='__main__':
   r1 = open( sys.argv[1] )
   r2 = open( sys.argv[2] )

   s1 = Set()
   s2 = Set()
   for line in r1.readlines():
       key = line.split()
       if( len(key) > 0 ):
           s1.add(key[0])

   for line in r2.readlines():
      key = line.split()
      if( len(key) > 0 ):
          s2.add(key[0])

   s3 = s1.intersection(s2)
   print len(s1)
   print len(s2)
   print len(s3)
On Average, Do #JustinBieber
 or #TeaParty Tweets Have
      More Hashtags?
Which Get Retweeted More
 Often: #JustinBieber or
       #TeaParty?
How Much Overlap Exists
Between the Entities of
    #TeaParty and
 #JustinBieber Tweet?
Thank You!

Weitere ähnliche Inhalte

Ähnlich wie Twitter6

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Artur Rodrigues
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)MongoSF
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTKFrancesco Bruni
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFTimur Safin
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful weddingStéphane Wirtel
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairMark
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joinedennui2342
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMontreal Python
 
Frsa
FrsaFrsa
Frsa_111
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015Fernando Daciuk
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdfsimenehanmut
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Christian Aichinger
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdfpicscamshoppe
 

Ähnlich wie Twitter6 (20)

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook

 
python codes
python codespython codes
python codes
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
 
Procesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con pythonProcesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con python
 
ECE-PYTHON.docx
ECE-PYTHON.docxECE-PYTHON.docx
ECE-PYTHON.docx
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTK
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoF
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful wedding
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love Affair
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook game
 
Frsa
FrsaFrsa
Frsa
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015
 
PyLecture2 -NetworkX-
PyLecture2 -NetworkX-PyLecture2 -NetworkX-
PyLecture2 -NetworkX-
 
Python slide
Python slidePython slide
Python slide
 
Progr2
Progr2Progr2
Progr2
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdf
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdf
 
Python basic
Python basicPython basic
Python basic
 

Mehr von DaeMyung Kang

How to use redis well
How to use redis wellHow to use redis well
How to use redis wellDaeMyung Kang
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashingDaeMyung Kang
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache keyDaeMyung Kang
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash DaeMyung Kang
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advanceDaeMyung Kang
 
Massive service basic
Massive service basicMassive service basic
Massive service basicDaeMyung Kang
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101DaeMyung Kang
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better EngineerDaeMyung Kang
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_finalDaeMyung Kang
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offsetDaeMyung Kang
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lakeDaeMyung Kang
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbieDaeMyung Kang
 

Mehr von DaeMyung Kang (20)

Count min sketch
Count min sketchCount min sketch
Count min sketch
 
Redis
RedisRedis
Redis
 
Ansible
AnsibleAnsible
Ansible
 
Why GUID is needed
Why GUID is neededWhy GUID is needed
Why GUID is needed
 
How to use redis well
How to use redis wellHow to use redis well
How to use redis well
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashing
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache key
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
Massive service basic
Massive service basicMassive service basic
Massive service basic
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better Engineer
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_final
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offset
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lake
 
Redis acl
Redis aclRedis acl
Redis acl
 
Coffee store
Coffee storeCoffee store
Coffee store
 
Scalable webservice
Scalable webserviceScalable webservice
Scalable webservice
 
Number system
Number systemNumber system
Number system
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbie
 

Kürzlich hochgeladen

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Kürzlich hochgeladen (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Twitter6

  • 1. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2 charsyam@naver.com
  • 4. Discover New Knowledge from Existing Information
  • 5. What do #TeaParty and #JustinBieber have in common
  • 6. Tools: Pymongo, MongoDB apt-get install python-dev pip install pymongo
  • 7. Get Tweets from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() tweets = api.search('#JustinBieber', rpp=100) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 8. Insert TO MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() for num in range(1,16): tweets = api.search('#JustinBieber', rpp=100, page=num) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 9. Count Frequency in mongo MAP map = function(){ words = this.text.split(' '); for ( i in words ){ emit({ key: words[i] }, {count: 1}); } };
  • 10. Count Frequency in mongo REDUCE reduce = function (key, values) { var count = 0; values.forEach(function (v) {count += v.count;}); return {count:count}; }
  • 11. Count Frequency in mongo EXECUTE res = db.foo.mapReduce( map, reduce, {out: "mystring"});
  • 12. Count Frequency in mongo RESULT { "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
  • 13. Get From MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo cursor = db.mystring.find() for d in cursor: print d
  • 14. What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
  • 15. intersection import sys from sets import Set if __name__=='__main__': r1 = open( sys.argv[1] ) r2 = open( sys.argv[2] ) s1 = Set() s2 = Set() for line in r1.readlines(): key = line.split() if( len(key) > 0 ): s1.add(key[0]) for line in r2.readlines(): key = line.split() if( len(key) > 0 ): s2.add(key[0]) s3 = s1.intersection(s2) print len(s1) print len(s2) print len(s3)
  • 16. On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
  • 17. Which Get Retweeted More Often: #JustinBieber or #TeaParty?
  • 18. How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweet?