SlideShare a Scribd company logo
1 of 19
Download to read offline
Twitter: The Tweet, the Whole Tweet,
    and Nothing but the Tweet #2



                    charsyam@naver.com
Data Mining
Discover New Knowledge
Discover New Knowledge
from Existing Information
What do #TeaParty and
    #JustinBieber
   have in common
Tools: Pymongo, MongoDB
apt-get install python-dev
pip install pymongo
Get Tweets
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
tweets = api.search('#JustinBieber', rpp=100)
for tweet in tweets:
   db.foo.save(tweet.__getstate__())
Insert TO MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
for num in range(1,16):
    tweets = api.search('#JustinBieber', rpp=100, page=num)
    for tweet in tweets:
       db.foo.save(tweet.__getstate__())
Count Frequency in mongo

   MAP
map = function(){
   words = this.text.split(' ');
   for ( i in words ){
      emit({ key: words[i] }, {count: 1});
   }
};
Count Frequency in mongo

   REDUCE
reduce = function (key, values) {
   var count = 0;
   values.forEach(function (v) {count += v.count;});
   return {count:count};
}
Count Frequency in mongo

   EXECUTE
res = db.foo.mapReduce( map, reduce, {out: "mystring"});
Count Frequency in mongo

   RESULT
{ "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
Get From MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

cursor = db.mystring.find()
for d in cursor:
   print d
What Entities Co-Occur
  Most Often with
  #JustinBieber and
 #TeaParty Tweets?
intersection
import sys
from sets import Set

if __name__=='__main__':
   r1 = open( sys.argv[1] )
   r2 = open( sys.argv[2] )

   s1 = Set()
   s2 = Set()
   for line in r1.readlines():
       key = line.split()
       if( len(key) > 0 ):
           s1.add(key[0])

   for line in r2.readlines():
      key = line.split()
      if( len(key) > 0 ):
          s2.add(key[0])

   s3 = s1.intersection(s2)
   print len(s1)
   print len(s2)
   print len(s3)
On Average, Do #JustinBieber
 or #TeaParty Tweets Have
      More Hashtags?
Which Get Retweeted More
 Often: #JustinBieber or
       #TeaParty?
How Much Overlap Exists
Between the Entities of
    #TeaParty and
 #JustinBieber Tweet?
Thank You!

More Related Content

Viewers also liked

Viewers also liked (20)

Twemproxy flow
Twemproxy flowTwemproxy flow
Twemproxy flow
 
Opensource sw day
Opensource sw dayOpensource sw day
Opensource sw day
 
Redis sentinelinternals deview
Redis sentinelinternals deviewRedis sentinelinternals deview
Redis sentinelinternals deview
 
No sql preview
No sql previewNo sql preview
No sql preview
 
Guid python
Guid pythonGuid python
Guid python
 
Mongo db 잡학상식
Mongo db 잡학상식Mongo db 잡학상식
Mongo db 잡학상식
 
Libtcc and gwan
Libtcc and gwanLibtcc and gwan
Libtcc and gwan
 
Python andselenium
Python andseleniumPython andselenium
Python andselenium
 
To become Open Source Contributor
To become Open Source ContributorTo become Open Source Contributor
To become Open Source Contributor
 
Hide method
Hide methodHide method
Hide method
 
Facebook fql and tweepy
Facebook fql and tweepyFacebook fql and tweepy
Facebook fql and tweepy
 
Libcloud
LibcloudLibcloud
Libcloud
 
Elastic webservice
Elastic webserviceElastic webservice
Elastic webservice
 
Cell architecture
Cell architectureCell architecture
Cell architecture
 
OpenSource Contributor
OpenSource ContributorOpenSource Contributor
OpenSource Contributor
 
Superficial mongo db
Superficial mongo dbSuperficial mongo db
Superficial mongo db
 
Facebook f8 seoul
Facebook f8 seoulFacebook f8 seoul
Facebook f8 seoul
 
Twitter6
Twitter6Twitter6
Twitter6
 
50 tips & tricks for mongo db developers
50 tips & tricks for mongo db developers50 tips & tricks for mongo db developers
50 tips & tricks for mongo db developers
 
Libcloud and j clouds
Libcloud and j cloudsLibcloud and j clouds
Libcloud and j clouds
 

Similar to Twitter6

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Artur Rodrigues
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)MongoSF
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTKFrancesco Bruni
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFTimur Safin
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful weddingStéphane Wirtel
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairMark
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joinedennui2342
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMontreal Python
 
Frsa
FrsaFrsa
Frsa_111
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015Fernando Daciuk
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdfsimenehanmut
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Christian Aichinger
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdfpicscamshoppe
 

Similar to Twitter6 (20)

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook

 
python codes
python codespython codes
python codes
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
 
Procesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con pythonProcesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con python
 
ECE-PYTHON.docx
ECE-PYTHON.docxECE-PYTHON.docx
ECE-PYTHON.docx
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTK
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoF
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful wedding
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love Affair
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook game
 
Frsa
FrsaFrsa
Frsa
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015
 
PyLecture2 -NetworkX-
PyLecture2 -NetworkX-PyLecture2 -NetworkX-
PyLecture2 -NetworkX-
 
Python slide
Python slidePython slide
Python slide
 
Progr2
Progr2Progr2
Progr2
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdf
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdf
 
Python basic
Python basicPython basic
Python basic
 

More from DaeMyung Kang

How to use redis well
How to use redis wellHow to use redis well
How to use redis wellDaeMyung Kang
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashingDaeMyung Kang
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache keyDaeMyung Kang
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash DaeMyung Kang
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advanceDaeMyung Kang
 
Massive service basic
Massive service basicMassive service basic
Massive service basicDaeMyung Kang
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101DaeMyung Kang
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better EngineerDaeMyung Kang
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_finalDaeMyung Kang
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offsetDaeMyung Kang
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lakeDaeMyung Kang
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbieDaeMyung Kang
 

More from DaeMyung Kang (20)

Count min sketch
Count min sketchCount min sketch
Count min sketch
 
Redis
RedisRedis
Redis
 
Ansible
AnsibleAnsible
Ansible
 
Why GUID is needed
Why GUID is neededWhy GUID is needed
Why GUID is needed
 
How to use redis well
How to use redis wellHow to use redis well
How to use redis well
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashing
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache key
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
Massive service basic
Massive service basicMassive service basic
Massive service basic
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better Engineer
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_final
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offset
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lake
 
Redis acl
Redis aclRedis acl
Redis acl
 
Coffee store
Coffee storeCoffee store
Coffee store
 
Scalable webservice
Scalable webserviceScalable webservice
Scalable webservice
 
Number system
Number systemNumber system
Number system
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbie
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

Twitter6

  • 1. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2 charsyam@naver.com
  • 4. Discover New Knowledge from Existing Information
  • 5. What do #TeaParty and #JustinBieber have in common
  • 6. Tools: Pymongo, MongoDB apt-get install python-dev pip install pymongo
  • 7. Get Tweets from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() tweets = api.search('#JustinBieber', rpp=100) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 8. Insert TO MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() for num in range(1,16): tweets = api.search('#JustinBieber', rpp=100, page=num) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 9. Count Frequency in mongo MAP map = function(){ words = this.text.split(' '); for ( i in words ){ emit({ key: words[i] }, {count: 1}); } };
  • 10. Count Frequency in mongo REDUCE reduce = function (key, values) { var count = 0; values.forEach(function (v) {count += v.count;}); return {count:count}; }
  • 11. Count Frequency in mongo EXECUTE res = db.foo.mapReduce( map, reduce, {out: "mystring"});
  • 12. Count Frequency in mongo RESULT { "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
  • 13. Get From MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo cursor = db.mystring.find() for d in cursor: print d
  • 14. What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
  • 15. intersection import sys from sets import Set if __name__=='__main__': r1 = open( sys.argv[1] ) r2 = open( sys.argv[2] ) s1 = Set() s2 = Set() for line in r1.readlines(): key = line.split() if( len(key) > 0 ): s1.add(key[0]) for line in r2.readlines(): key = line.split() if( len(key) > 0 ): s2.add(key[0]) s3 = s1.intersection(s2) print len(s1) print len(s2) print len(s3)
  • 16. On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
  • 17. Which Get Retweeted More Often: #JustinBieber or #TeaParty?
  • 18. How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweet?