Ch1. Introduction: Hacking on
   Twitter Data
   chois79

   2011.10.15


Thursday, October 20, 2011
Installing Python Development
   Tools
   ✤   python
       ✤ http://www.python.org/download

   ✤   python package manager tools
       ✤ let you install Python packages effortlessly

       ✤ easy_install

          ✤ http://pypi.python.org/pypi/setuptools

       ✤ pip

          ✤ http://www.pip-installer.org/en/latest/installing.html

   ✤   networkx
       ✤ creating and manipulating graphs and networks

       ✤ e.g., easy_install networkx or pip install networkx




Collecting and Manipulating
   Twitter Data




Tinkering with Twitter’s API(1/2)

   ✤   Setup

        ✤   easy_install twitter

        ✤   However, Twitter’s API has since been updated

            ✤    http://github.com/sixohsix/twitter/issues/56

   ✤   The twitter package (the Minimalist Twitter API for Python) is a thin Python wrapper around Twitter’s API

        ✤   Equivalent REST query for the trends call

            ✤   http://search.twitter.com/trends.json
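
How a chained Python call can correspond to a REST URL may be easier to see with a toy class. The sketch below is not the twitter package's actual implementation; the names RestPath and url() are hypothetical and exist only for this illustration. It shows attribute access being turned into the path segments of a query URL.

```python
class RestPath(object):
    """Toy illustration only: turns attribute access into a REST URL path."""

    def __init__(self, base_url, parts=()):
        self.base_url = base_url
        self.parts = tuple(parts)

    def __getattr__(self, name):
        # Called only for attributes that do not exist on the object,
        # so each unknown attribute appends one path segment.
        return RestPath(self.base_url, self.parts + (name,))

    def url(self):
        # Join the collected segments and add the .json suffix.
        return "%s/%s.json" % (self.base_url, "/".join(self.parts))


api = RestPath("http://search.twitter.com")
print(api.trends.url())
```

Under these assumptions, api.trends.url() builds the same URL as the REST query listed on this slide.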

Tinkering with Twitter’s API(2/2)

  ✤   Retrieving Twitter search trends
       # ex.3
       import twitter
       twitter_api = twitter.Twitter()
       WORLD_WOE_ID = 1 # The Yahoo! Where On Earth ID for the entire world
       world_trends = twitter_api.trends._(WORLD_WOE_ID) # get back a callable
       #[ trend["name"] for trend in world_trends()[0]['trends'] ] # call the callable
       for trend in world_trends()[0]['trends']: # call the callable
           print trend["name"]




  ✤   Paging through Twitter search results
       # ex.4
       search_results = []
       for page in range(1,6):
           search_results.append(twitter_api.search(q="Dennis Ritchie", rpp=20, page=page))
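
The paging loop can be tried without network access by swapping in a stub. In this sketch fetch_page is a hypothetical stand-in for twitter_api.search that returns canned data; the "results" and "text" keys follow the response fields used later in this deck, and everything else is an assumption.

```python
def fetch_page(query, rpp, page):
    """Hypothetical stand-in for twitter_api.search(); returns canned data."""
    start = (page - 1) * rpp
    return {"results": [{"text": "%s tweet %d" % (query, i)}
                        for i in range(start, start + rpp)]}


# Collect five pages of 20 results each, as in ex.4.
search_results = [fetch_page("Dennis Ritchie", rpp=20, page=page)
                  for page in range(1, 6)]

# Flatten the pages into a single list of tweet texts.
tweets = [result["text"]
          for page in search_results
          for result in page["results"]]

print(len(tweets))
```

Flattening the pages early keeps the later analysis steps working on one plain list of strings.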




Frequency Analysis and Lexical
   Diversity(1/5)
   ✤   Lexical diversity
        ✤   One of the most intuitive measurements that can be applied to
            unstructured text
        ✤   Defined as the number of unique tokens in the text divided by
            the total number of tokens
        >>> words = []
        >>> for t in tweets:
        ...     words += [ w for w in t.split() ]
        >>> len(words) # total words
        7238
        >>> len(set(words)) # unique words
        1636
        >>> 1.0*len(set(words))/len(words) # lexical diversity
        0.22602928985907708
        >>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets) # avg words per tweet
        14.476000000000001


        ✤   A lexical diversity of about 0.23 suggests that each tweet carries roughly 20 percent unique information
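
The two measurements above can be wrapped in small functions. This is a minimal sketch that assumes a non-empty list of tweet strings and whitespace tokenization, as in the session above; the function names and sample tweets are made up for illustration.

```python
def lexical_diversity(tweets):
    """Unique tokens divided by total tokens, using whitespace splitting."""
    words = [w for t in tweets for w in t.split()]
    return len(set(words)) / float(len(words))


def average_words(tweets):
    """Average number of whitespace-separated tokens per tweet."""
    return sum(len(t.split()) for t in tweets) / float(len(tweets))


# Made-up sample data, not taken from the real search results.
sample_tweets = [
    "justin bieber is on snl tonight",
    "rt snl tonight is going to be great",
    "watching snl tonight",
]

print(lexical_diversity(sample_tweets))
print(average_words(sample_tweets))
```

The sample has 17 tokens, 12 of them distinct, so its diversity is far higher than that of the real data set, where the same few words repeat across hundreds of tweets.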

Frequency Analysis and Lexical
   Diversity(2/5)
   ✤   Frequency Analysis: Use NLTK or collections.Counter
        ✤    Very simple, powerful tool
       >>> import nltk
       >>> import cPickle
       >>> words = cPickle.load(open("myData.pickle"))
       >>> freq_dist = nltk.FreqDist(words)
       >>> freq_dist.keys()[:50] # 50 most frequent tokens
       [u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin', u'@justinbieber', u'be', u'the', u'tonight', u'gonna', u'at', u'in', u'bieber', u'and', u'you',
       u'watching', u'tina', u'for', u'a', u'wait', u'fey', u'of', u'@justinbieber:', u'if', u'with', u'so', u"can't", u'who', u'great', u'it', u'going',
       u'im', u':)', u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal', u'see', u'that', u'what', u'but', u'tonight!', u':d', u'2', u'will']

       >>> freq_dist.keys()[-50:] # 50 least frequent tokens
       [u'what?!', u'whens', u'where', u'while', u'white', u'whoever', u'whoooo!!!!', u'whose', u'wiating', u'wii', u'wiig', u'win...', u'wink.', u'wknd.',
       u'wohh', u'won', u'wonder', u'wondering', u'wootwoot!', u'worked', u'worth', u'xo.', u'xx', u'ya', u'ya<3miranda', u'yay', u'yay!',
       u'ya\u2665', u'yea', u'yea.', u'yeaa', u'yeah!', u'yeah.', u'yeahhh.', u'yes,', u'yes;)', u'yess', u'yess,', u'you!!!!!', u"you'll", u'you+snl=', u'you,',
       u'youll', u'youtube??', u'youu<3', u'youuuuu', u'yum', u'yumyum', u'~', u'\xac\xac']

              ✤    Frequent tokens refer to entities such as people, times, activities
              ✤    Infrequent terms amount to mostly noise
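
For readers without NLTK installed, the same kind of frequency count can be sketched with collections.Counter from the standard library. The word list here is a made-up sample, not the pickled data used above.

```python
from collections import Counter

# Made-up sample tokens for illustration.
words = "snl on rt snl is on snl tonight".split()

freq = Counter(words)

# Most frequent tokens first, as (token, count) pairs.
most_common = freq.most_common(2)

# most_common() with no argument returns every token sorted by count,
# so the tail of that list holds the least frequent tokens.
least_common = freq.most_common()[-2:]

print(most_common)
print(least_common)
```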

Frequency Analysis and Lexical
   Diversity(3/5)
   ✤   Extracting relationships from the tweets
        ✤   The social web is foremost about the linkages between people
        ✤   One highly convenient format for storing social web data is a graph
        ✤   Using regular expressions to find retweets
            ✤   RT followed by a username
            ✤   via followed by a username
                >>> import re
                >>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
                >>> example_tweets = ["RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?",
                ... "Justin Bieber is on SNL 2nite. w00t?!? (via @SocialWebMining)"]
                >>> for t in example_tweets:
                ...     rt_patterns.findall(t)
                [('RT', ' @SocialWebMining')]
                [('via', ' @SocialWebMining')]
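
The retweet pattern can be wrapped in a helper that returns only the credited usernames. This is a sketch rather than the deck's own helper: retweet_sources is a hypothetical name, and it pulls the @names out of the second capture group with a second regular expression.

```python
import re

# "RT" or "via", followed by one or more @username mentions.
RT_PATTERN = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)


def retweet_sources(text):
    """Return the usernames credited via 'RT' or 'via' in a tweet's text."""
    sources = []
    for _, mentions in RT_PATTERN.findall(text):
        # The second group holds the raw mention text, e.g. ' @a @b'.
        sources.extend(re.findall(r"@\w+", mentions))
    return sources


print(retweet_sources("RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?"))
print(retweet_sources("Justin Bieber is on SNL 2nite. w00t?!? (via @SocialWebMining)"))
```

Because the pattern is case-insensitive and not anchored to word starts, a word that merely ends in "rt" before a mention would also match, so results on real data are worth spot-checking.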




Frequency Analysis and Lexical
      Diversity(4/5)
        >>> import networkx as nx
        >>> import re
        >>> g = nx.DiGraph()
        >>>
        >>> all_tweets = [ tweet
        ...                for page in search_results
        ...                    for tweet in page["results"] ]
        >>> def get_rt_sources(tweet):
        ...     rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
        ...     return [ source.strip()
        ...              for tuple in rt_patterns.findall(tweet)
        ...                  for source in tuple
        ...                      if source not in ("RT", "via") ]
        >>> for tweet in all_tweets:
        ...     rt_sources = get_rt_sources(tweet["text"])
        ...     if not rt_sources: continue
        ...     for rt_source in rt_sources:
        ...         g.add_edge(rt_source, tweet["from_user"], {"tweet_id" : tweet["id"]})
        >>> g.number_of_nodes()
        160
        >>> g.number_of_edges()
        125
        >>> g.edges(data=True)[0]
        (u'@ericastolte', u'bonitasworld', {'tweet_id': 11965974697L})
        >>> len(nx.connected_components(g.to_undirected()))
        37
        >>> sorted(nx.degree(g))
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 9, 37]




Frequency Analysis and Lexical
   Diversity(5/5)
   ✤   Analysis
        ✤   500 tweets
            ✤   160 nodes
                 ✤   160 users were involved in retweet relationships with one another
            ✤   125 edges
                 ✤   Node-to-edge ratio of 1.28 (160/125): some nodes are connected to more than one
                     node
            ✤   37 connected components: the graph consists of 37 subgraphs and is not fully
                connected
            ✤   The output of degree
                 ✤   Most nodes have degree 1, a few have degree 2 to 9, and one outlier has degree 37
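
What these four numbers measure can be shown on a tiny graph using only the standard library. The sketch below does not use networkx; the edge list is made up, and connected_components is a hypothetical helper that runs a breadth-first search over the undirected view of the graph.

```python
from collections import defaultdict, deque

# Hypothetical (source, retweeter) pairs standing in for the real edges.
edges = [("@a", "b"), ("@a", "c"), ("@d", "e")]

# Undirected adjacency sets: every edge is recorded in both directions.
neighbors = defaultdict(set)
for source, retweeter in edges:
    neighbors[source].add(retweeter)
    neighbors[retweeter].add(source)


def connected_components(neighbors):
    """Group nodes into components with a breadth-first search."""
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


num_nodes = len(neighbors)
num_edges = len(edges)
components = connected_components(neighbors)
degrees = sorted(len(adjacent) for adjacent in neighbors.values())

print(num_nodes, num_edges, len(components), degrees)
```

In this toy graph "@a" is the only node with degree 2, which is the small-scale analogue of the degree-37 outlier in the real data.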

Visualizing Tweet Graphs(1/3)

   ✤   Dot language
        ✤   Text graph description language
        ✤   Supports a simple way of describing graphs that both humans and
            computer programs can use
   ✤   Graphviz
        ✤   install from source: http://www.graphviz.org/
        ✤   pygraphviz
            ✤   easy_install pygraphviz
                 ✤   if the build fails, you may need to set library_path and include_path in setup.py


Visualizing Tweet Graphs(2/3)

   ✤   Generating DOT language output
        OUT = "snl_search_results.dot"
        try:
           nx.drawing.write_dot(g, OUT)
        except ImportError, e:
           # Help for Windows users:
           # Not a general-purpose method, but representative of
           # the same output write_dot would provide for this graph
           # if installed and easy to implement
           dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id']) 
              for n1, n2 in g.edges()]
           f = open(OUT, 'w')
           f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
           f.close()

   ✤   Output
        strict digraph {
        "@ericastolte" -> "bonitasworld" [tweet_id=11965974697];
        "@mpcoelho" -> "Lil_Amaral" [tweet_id=11965954427];
        "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062];
        "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327];
        }
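
The fallback branch above amounts to string formatting, which can be sketched as a standalone function. Here to_dot is a hypothetical helper that takes plain (source, target, tweet_id) triples instead of a networkx graph, and it assumes the names contain no double quotes that would need escaping.

```python
def to_dot(edges):
    """Render (source, target, tweet_id) triples as a strict DOT digraph."""
    lines = ['"%s" -> "%s" [tweet_id=%s];' % edge for edge in edges]
    return "strict digraph {\n%s\n}" % "\n".join(lines)


# Two of the edges shown in the output above.
edges = [
    ("@ericastolte", "bonitasworld", 11965974697),
    ("@mpcoelho", "Lil_Amaral", 11965954427),
]

dot_text = to_dot(edges)
print(dot_text)
```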



Visualizing Tweet Graphs(3/3)

   ✤   Convert
        ✤   $ circo -Tpng -Osnl_search_results snl_search_results.dot
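
The conversion can also be driven from Python. This sketch assumes Python 3.3 or later (for shutil.which) and that Graphviz's circo is on the PATH. Unlike the command above, it names the output file explicitly with -o. The helper names circo_command and render are hypothetical.

```python
import shutil
import subprocess


def circo_command(dot_path, png_path):
    """Build the argument list: -Tpng selects PNG, -o names the output file."""
    return ["circo", "-Tpng", "-o", png_path, dot_path]


def render(dot_path, png_path):
    """Run circo if it is installed; return False when it is not available."""
    if shutil.which("circo") is None:
        return False
    subprocess.check_call(circo_command(dot_path, png_path))
    return True


print(circo_command("snl_search_results.dot", "snl_search_results.png"))
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.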








Closing Remarks


   ✤   This chapter illustrated how easy it is to use Python’s interactive interpreter to
       explore and visualize Twitter data
        ✤   Get comfortable with your Python development environment
        ✤   Spend some time with the Twitter APIs and Graphviz
            ✤   Canviz project
                 ✤   Draws Graphviz graphs on a web browser's <canvas> element




Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Mining the social web ch1

  • 1. Ch1. Introduction: Hacking on Twitter Data. chois79, 2011.10.15 (Thursday, October 20, 2011)
  • 2. Installing Python Development Tools
       ✤ python
           ✤ http://www.python.org/download
       ✤ Python package manager tools
           ✤ let you install Python packages effortlessly
           ✤ easy_install
               ✤ http://pypi.python.org/pypi/setuptools
           ✤ pip
               ✤ http://www.pip-installer.org/en/latest/installing.html
       ✤ networkx
           ✤ creates and manipulates graphs and networks
           ✤ e.g. easy_install networkx or pip install networkx
  • 3. Collecting and Manipulating Twitter Data
  • 4. Tinkering with Twitter’s API (1/2)
       ✤ Setup
           ✤ easy_install twitter
           ✤ but Twitter’s APIs have since been updated
               ✤ http://github.com/sixohsix/twitter/issues/56
       ✤ The Minimalist Twitter API for Python is a Python API for Twitter
           ✤ Equivalent REST query
               ✤ http://search.twitter.com/trends.json
  • 5. Tinkering with Twitter’s API (2/2)
       ✤ Retrieving Twitter search trends
           # ex.3
           import twitter
           twitter_api = twitter.Twitter()
           WORLD_WOE_ID = 1 # The Yahoo! Where On Earth ID for the entire world
           world_trends = twitter_api.trends._(WORLD_WOE_ID) # get back a callable
           #[ trend["name"] for trend in world_trends()[0]['trends'] ] # call the callable
           for trend in world_trends()[0]['trends']: # call the callable
               print trend["name"]
       ✤ Paging through Twitter search results
           # ex.4
           search_results = []
           for page in range(1,6):
               search_results.append(twitter_api.search(q="Dennis Ritchie", rpp=20, page=page))
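The paging loop in ex.4 can be tried without network access. This is a minimal sketch for Python 3 (the slides use Python 2); `collect_pages` and `fake_search` are hypothetical names, and `fake_search` is only a canned stand-in for `twitter_api.search`. The `{"results": [...]}` response shape is assumed from how slide 9 reads `page["results"]`.

```python
def collect_pages(fetch_page, query, pages=5, per_page=20):
    """Call fetch_page once per page number and return the list of responses."""
    return [fetch_page(q=query, rpp=per_page, page=page)
            for page in range(1, pages + 1)]


def fake_search(q, rpp, page):
    """Hypothetical stand-in for twitter_api.search; returns canned data."""
    return {"results": [{"text": "%s result %d-%d" % (q, page, i)}
                        for i in range(rpp)]}


search_results = collect_pages(fake_search, "Dennis Ritchie")

# Flatten the per-page responses into one list of tweets.
all_tweets = [tweet for page in search_results for tweet in page["results"]]
print(len(search_results), len(all_tweets))
```

Passing the fetch function in as an argument keeps the loop independent of any particular client library.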
  • 6. Frequency Analysis and Lexical Diversity (1/5)
       ✤ Lexical diversity
           ✤ One of the most intuitive measurements that can be applied to unstructured text
           ✤ The number of unique tokens in the text divided by the total number of tokens
           >>> words = []
           >>> for t in tweets:
           ...     words += [ w for w in t.split() ]
           >>> len(words) # total words
           7238
           >>> len(set(words)) # unique words
           1636
           >>> 1.0*len(set(words))/len(words) # lexical diversity
           0.22602928985907708
           >>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets) # avg words per tweet
           14.476000000000001
           ✤ Each tweet carries about 20 percent unique information
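The two ratios computed in the session above can be wrapped in small helpers. This is a minimal sketch for Python 3 (the slides use Python 2); the function names and the sample tweets are made up for illustration, and tokens are split on whitespace just as in the session.

```python
def lexical_diversity(tweets):
    """Return unique tokens / total tokens, splitting on whitespace."""
    words = [w for t in tweets for w in t.split()]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def average_words(tweets):
    """Return the mean number of whitespace-separated tokens per tweet."""
    if not tweets:
        return 0.0
    return sum(len(t.split()) for t in tweets) / len(tweets)


# Hypothetical sample data, not taken from the slides.
sample_tweets = ["snl is on tonight", "watching snl tonight"]
print(lexical_diversity(sample_tweets))  # 5 unique tokens out of 7
print(average_words(sample_tweets))      # 7 tokens over 2 tweets
```

Guarding against empty input avoids a division by zero when a search returns no tweets.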
  • 7. Frequency Analysis and Lexical Diversity (2/5)
       ✤ Frequency analysis: use NLTK or collections.Counter
           ✤ Very simple, powerful tool
           >>> import nltk
           >>> import cPickle
           >>> words = cPickle.load(open("myData.pickle"))
           >>> freq_dist = nltk.FreqDist(words)
           >>> freq_dist.keys()[:50] # 50 most frequent tokens
           [u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin', u'@justinbieber',
            u'be', u'the', u'tonight', u'gonna', u'at', u'in', u'bieber', u'and', u'you',
            u'watching', u'tina', u'for', u'a', u'wait', u'fey', u'of', u'@justinbieber:',
            u'if', u'with', u'so', u"can't", u'who', u'great', u'it', u'going', u'im', u':)',
            u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal', u'see', u'that',
            u'what', u'but', u'tonight!', u':d', u'2', u'will']
           >>> freq_dist.keys()[-50:] # 50 least frequent tokens
           [u'what?!', u'whens', u'where', u'while', u'white', u'whoever', u'whoooo!!!!',
            u'whose', u'wiating', u'wii', u'wiig', u'win...', u'wink.', u'wknd.', u'wohh',
            u'won', u'wonder', u'wondering', u'wootwoot!', u'worked', u'worth', u'xo.', u'xx',
            u'ya', u'ya&lt;3miranda', u'yay', u'yay!', u'ya\u2665', u'yea', u'yea.', u'yeaa',
            u'yeah!', u'yeah.', u'yeahhh.', u'yes,', u'yes;)', u'yess', u'yess,', u'you!!!!!',
            u"you'll", u'you+snl=', u'you,', u'youll', u'youtube??', u'youu&lt;3', u'youuuuu',
            u'yum', u'yumyum', u'~', u'\xac\xac']
       ✤ Frequent tokens refer to entities such as people, times, and activities
       ✤ Infrequent terms amount to mostly noise
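The same kind of frequency count can be done with the standard library's `collections.Counter` instead of NLTK. A minimal sketch for Python 3; the token list is hypothetical and merely stands in for the pickled word list loaded on the slide.

```python
from collections import Counter

# Hypothetical tokens standing in for the pickled word list on the slide.
words = ["snl", "on", "rt", "snl", "justin", "on", "snl", "yum"]

freq = Counter(words)

# most_common(n) returns (token, count) pairs, highest count first.
top_two = freq.most_common(2)
print(top_two)

# Tokens that occur exactly once are candidates for the noisy tail.
rare = sorted(token for token, count in freq.items() if count == 1)
print(rare)
```

Here `most_common` plays the role that slicing `freq_dist.keys()` plays in the NLTK session.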
  • 8. Frequency Analysis and Lexical Diversity (3/5)
       ✤ Extracting relationships from the tweets
           ✤ The social web is foremost about the linkages between people
           ✤ One highly convenient format for storing social web data is a graph
       ✤ Using regular expressions to find retweets
           ✤ RT followed by a username
           ✤ via followed by a username
           >>> import re
           >>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
           >>> example_tweets = ["RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?",
           ...     "Justin Bieber is on SNL 2nite. w00t?!? (via @SocialWebMining)"]
           >>> for t in example_tweets:
           ...     rt_patterns.findall(t)
           [('RT', ' @SocialWebMining')]
           [('via', ' @SocialWebMining')]
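The retweet pattern above also runs unchanged under Python 3. The sketch below reuses the slide's regular expression and example tweets; the constant name `RT_PATTERN` is an assumption, and comparing the marker in lower case is a small deviation from the slides so that a lowercase "rt" is filtered out too.

```python
import re

# Same pattern as on the slide: "RT" or "via", then one or more @usernames.
RT_PATTERN = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)


def get_rt_sources(text):
    """Return the @username text credited by RT/via markers in a tweet."""
    return [
        source.strip()
        for match in RT_PATTERN.findall(text)
        for source in match
        if source.lower() not in ("rt", "via")
    ]


example_tweets = [
    "RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?",
    "Justin Bieber is on SNL 2nite. w00t?!? (via @SocialWebMining)",
]
for t in example_tweets:
    print(get_rt_sources(t))
```

Each `findall` match is a (marker, usernames) tuple, so the comprehension drops the marker and keeps the stripped username text.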
  • 9. Frequency Analysis and Lexical Diversity (4/5)
           >>> import networkx as nx
           >>> import re
           >>> g = nx.DiGraph()
           >>>
           >>> all_tweets = [ tweet
           ...                for page in search_results
           ...                for tweet in page["results"] ]
           >>> def get_rt_sources(tweet):
           ...     rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)
           ...     return [ source.strip()
           ...              for tuple in rt_patterns.findall(tweet)
           ...              for source in tuple
           ...              if source not in ("RT", "via") ]
           >>> for tweet in all_tweets:
           ...     rt_sources = get_rt_sources(tweet["text"])
           ...     if not rt_sources: continue
           ...     for rt_source in rt_sources:
           ...         g.add_edge(rt_source, tweet["from_user"], {"tweet_id" : tweet["id"]})
           >>> g.number_of_nodes()
           160
           >>> g.number_of_edges()
           125
           >>> g.edges(data=True)[0]
           (u'@ericastolte', u'bonitasworld', {'tweet_id': 11965974697L})
           >>> len(nx.connected_components(g.to_undirected()))
           37
           >>> sorted(nx.degree(g))
           [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 9, 37]
  • 10. Frequency Analysis and Lexical Diversity (5/5)
       ✤ Analysis
           ✤ 500 tweets
           ✤ 160 nodes: 160 users involved in retweet relationships with one another
           ✤ 125 edges
               ✤ 1.28 (160/125): the ratio of nodes to edges; some nodes are connected to more than one node
           ✤ 37: the graph consists of 37 subgraphs and is not fully connected
           ✤ The output of degree
               ✤ shows how many other nodes each node is connected to
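The subgraph count and the node-to-edge ratio in this analysis can be reproduced without networkx. A minimal sketch for Python 3, assuming edges are (source, target) pairs like those built on the previous slide; `count_components` and the sample edges are hypothetical, and the union-find approach over an undirected view of the graph is a stand-in for `nx.connected_components`, not the method used in the slides.

```python
def count_components(edges):
    """Count connected components, ignoring edge direction."""
    parent = {}

    def find(node):
        # Walk up to the root, shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    return len({find(node) for node in list(parent)})


# Hypothetical retweet edges: (retweeted user, retweeting user).
edges = [("@a", "b"), ("@a", "c"), ("@d", "e")]
nodes = {node for edge in edges for node in edge}

print(len(nodes), len(edges), count_components(edges))
print(160 / 125)  # the node-to-edge ratio quoted on the slide
```

With the sample edges, "@a", "b", and "c" form one subgraph while "@d" and "e" form another, so the graph is not fully connected.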
  • 11. Visualizing Tweet Graphs (1/3)
       ✤ DOT language
           ✤ Text-based graph description language
           ✤ Supports a simple way of describing graphs that both humans and computer programs can use
       ✤ Graphviz
           ✤ install from source: http://www.graphviz.org/
       ✤ pygraphviz
           ✤ easy_install pygraphviz
           ✤ setup.py: library_path, include_path
  • 12. Visualizing Tweet Graphs (2/3)
       ✤ Generating DOT language output
           OUT = "snl_search_results.dot"
           try:
               nx.drawing.write_dot(g, OUT)
           except ImportError, e:
               # Help for Windows users:
               # Not a general-purpose method, but representative of
               # the same output write_dot would provide for this graph
               # if installed and easy to implement
               dot = ['"%s" -> "%s" [tweet_id=%s]' % (n1, n2, g[n1][n2]['tweet_id'])
                      for n1, n2 in g.edges()]
               f = open(OUT, 'w')
               f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
               f.close()
       ✤ Output
           strict digraph {
           "@ericastolte" -> "bonitasworld" [tweet_id=11965974697];
           "@mpcoelho" -> "Lil_Amaral" [tweet_id=11965954427];
           "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062];
           "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327];
           }
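The fallback branch above builds the DOT text by hand, and that part can be isolated from networkx entirely. A minimal sketch for Python 3; `to_dot` is a hypothetical helper that takes (source, target, tweet_id) triples and uses the same format strings as the slide.

```python
def to_dot(edges):
    """Render (source, target, tweet_id) triples as a strict DOT digraph."""
    lines = [
        '"%s" -> "%s" [tweet_id=%s]' % (src, dst, tweet_id)
        for src, dst, tweet_id in edges
    ]
    return "strict digraph {\n%s\n}" % ";\n".join(lines)


# Two edges copied from the output shown on the slide.
edges = [
    ("@ericastolte", "bonitasworld", 11965974697),
    ("@mpcoelho", "Lil_Amaral", 11965954427),
]
dot_text = to_dot(edges)
print(dot_text)
```

Quoting the node names matters because names that start with "@" are not valid bare identifiers in DOT.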
  • 13. Visualizing Tweet Graphs (3/3)
       ✤ Convert
           ✤ $ circo -Tpng -Osnl_search_results snl_search_results.dot
  • 14. Closing Remarks
       ✤ Illustrated how easy it is to use Python’s interactive interpreter to explore and visualize Twitter data
       ✤ Feel comfortable with your Python development environment
       ✤ Spend some time with the Twitter APIs and Graphviz
       ✤ Canviz project
           ✤ Draws Graphviz graphs on a web browser <canvas> element