Mining and Analyzing Social Media
        HICSS 45 Tutorial – Part 1
                            Dave King
                       January 4, 2012
Agenda: This is how the slides are
organized
• Part 1
  –   Introduction – Bio, Resources, Social Media
  –   Data Mining – Processes and Example
  –   Text Mining – General Processes and Example
  –   Predicting the Future – The Portmanteaus
• Part 2
  – Sentiment Analysis
  – Social Network Analysis - Introduction

                   Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
Biography: Dave King

                      • Currently, EVP of Product Development
                        and Management at JDA Software
                      • 30 years in enterprise package
                        software business
                      • 15 years as university professor
                      • 14 years as Co-Chair of the Internet &
                        Digital Economy Track (HICSS)
                      • Long-time interest in various aspects of
                        E-Commerce & Business Intelligence
                      • Tutorial topic primarily reflects a
                        personal interest and tangentially a
                        job(s) related interest.
Personal Experiences with
Analytics
• Taught applied statistics, math modeling & mathematical sociology
• In software R&D for 30 years
    – Optimization in the 80s
    – Natural Language Frontends
         • NLI Query & CMU Robotics Lab
    – EIS Competitive Analysis
         • Dow Jones and Reuters
         • Verity Topics
         • NewsAlert
    – InXight’s Hyperbolic Tree
    – Supply Chain Analytics
• In the case of text analysis and its practical application, audiences
  have often been small, bewildered, and fleeting

Mining and Analytics Resources:
Web Sites, Online Books & Tutorials
•   DM/Blog -- abbottanalytics.blogspot.com
•   DM/Blog -- blog.data-miners.com
•   DM/Blog -- bx.businessweek.com/data-mining/blogs
•   DM/Blog -- bytemining.com
•   DM/Blog -- data-mining.alltop.com
•   DM/Blog -- dataminingblog.com
•   DM/Blog -- dataminingdownunder.com
•   DM/Blog -- datamining.typepad.com
•   DM/Blog -- datawrangling.com
•   DM/Blog -- timmanns.blogspot.com
•   DM/General -- kdnuggets.com
•   DM/General -- mydatamine.com
•   DM/General -- the-data-mine.com
•   DM/Online Book -- chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm
•   DM/Tutorial -- autonlab.org/tutorials/
Mining and Analytics Resources:
Web Sites, Online Books & Tutorials
•   TA/General -- social.textanalyticsnews.com
•   TA/General -- textanalysis.info
•   TM/Blog -- blogs.sas.com/text-mining
•   TM/Blog -- lingpipe-blog.com
•   TM/Blog -- texttechnologies.com
•   TM & TA/Blog -- informationweek.com/authors/showAuthor.jhtml?authorID=1331
•   TA Tutorial -- slideshare.net/SethGrimes/text-analytics-overview-2011
•   TM & DM/Online Book -- statsoft.com/textbook/text-mining/
•   TM & DM/Tutorial -- alias-i.com/lingpipe/demos/tutorial/db/read-me.html
•   TM Tutorial -- scienceforseo.com/tutorials/text-mining-tutorial
•   TM/Wiki -- textanalytics.wikidot.com
•   SNA/Blog -- iq.harvard.edu/blog/netgov/2011/10/
•   SNA/Blog -- thenetworkthinkers.com
•   SNA/Blog -- blog.echen.me/tag/social-network-analysis/
•   SNA/Blog -- lithosphere.lithium.com/t5/user/viewprofilepage/user-id/151
•   SNA/Tutorial -- cs.stanford.edu/people/jure/icml09networks/
Mining and Analytics Resources:
Web Sites, Online Books & Tutorials
•   DA/Blog -- dataists.com
•   DA/Blog -- drewconway.com
•   Visualization/Blog -- abeautifulwww.com/
•   Visualization/Blog -- benfry.com/writing/
•   Visualization/Blog -- blog.blprnt.com
•   Visualization/Blog -- chrisharrison.net/index.php/visualization.com
•   Visualization/Blog -- datavisualization.ch/
•   Visualization/Blog -- eagereyes.com
•   Visualization/Blog -- informationandvisualization.de/
•   Visualization/Blog -- infosthetics.com
•   Visualization/Blog -- junkcharts.typepad.com/junk_charts/
•   Visualization/Blog -- neoformix.com
•   Visualization/Blog -- perpetualedge.com/blog
•   Visualization/Blog -- processing.org
•   Visualization/Blog -- visualcomplexity.com
•   Visualization/Blog -- well-formed-data.net/

Social Media Defined
                                                                      Marta Kagan




Social Media Defined: …Sort of …




Social Media Defined:
Actually, it’s 33 Definitions
1.    Media for social interaction, using highly accessible and scalable communication techniques.
2.    Various user-driven (inbound marketing) channels (e.g., Facebook, Twitter, blogs, YouTube).
3.    Most transparent, engaging and interactive form of public relations
4.    What we do and say together, worldwide, to communicate in all direction at any time, by any possible (digital) means.
5.    New marketing tool that allows you to get to know your customers and prospects in ways that were previously not possible.
6.    Platforms that enable the interactive web by engaging users to participate in, comment on and create content as means of communicating
7.    Consists of any online platform or channel for user generated content.
8.    Digital content and interaction that is created by and between people.
9.    Shift in how we get our information. Social media allows us to network, to find people with like interests, and to meet people who can become friends or customers.
10.   Platforms for interaction and relationships, not content and ads.
11.   Online platforms and locations that provide a way for people to participate in these conversations.
12.   People’s conversations and actions online that can be mined by advertisers for insights but not coerced to pass along marketing messages.
13.   Tools, services, and communication facilitating connection between peers with common interests.
14.   Online technologies and practices that people use to share content, opinions, insights, experiences, perspectives, and media themselves.
15.   Ever-growing and evolving collection of online tools and toys, platforms and applications that enable all of us to interact with and share information. Increasingly, it’s both the connective tissue and neural net of the Web.
16.   Reflection of conversations happening every day, whether at the supermarket, a bar, the train, the watercooler or the playground.
17.   Online text, pictures, videos and links, shared amongst people and organizations.
18.   Not one thing. It’s five distinct things:
19.   Digital, content-based communications based on the interactions enabled by a plethora of web technologies
20.   Collection of online platforms and tools that people use to share content, profiles, opinions, insights, experiences, perspectives and media itself, facilitating conversations and interactions online between groups of people.
21.   Platform/tools.
22.   Act of connecting on social media platforms.
23.   How businesses join the conversation in an authentic and transparent way to build relationships.
24.   The notion that social media is about the technology that facilitates individuals and groups of people to connect and interact, create and share.
25.   Any of a number of individual web-based applications aggregating users who are able to conduct one-to-one and one-to-many two-way conversations.
26.   Media channel that relies on listening and conversation, as opposed to a monologue, to get your point across, make a connection and build a relationship.
27.   Social media is all about leveraging online tools that promote sharing and conversations, which ultimately lead to engagement with current and future customers and influencers in your target market.
28.   Social media: Evolution, Revolution and Contribution -by the ability of everybody to share and contribute as a publisher
29.   Social media is communication channels or tools used to store, aggregate, share, discuss or deliver information within online communities.
30.   Social Media is simply another arrow to be shot in a company’s marketing quiver.
31.   Social media platforms make it easier to share information–usually online.
32.   Any object or tool, that connects people in dialogue or interaction: in person, in print, or online.
33.   Wild, Wild West of Marketing, with brands, businesses, and organizations jostling with individuals to make news, friends, connections and build communities in the virtual space.
Social Media Defined: If a Picture isn’t
worth a 1000 words, then …




Social Media Defined

  Online technologies and practices
         for social interaction



      enabling the sharing of opinions, insights,
      experiences, perspectives and media itself

Social Media Defined: Categories




Social Media Defined:
Unanimous Agreement
                                                                      Marta Kagan




Social Media is Huge: Users
                                                                      Marta Kagan



750 Million: Facebook

200 Million: Twitter
100 Million: LinkedIn
Social Media is Huge!
                                                                      Marta Kagan

If Facebook
were a country,
it would be the
3rd largest in
the world

Social Media Data:
  Research Opportunity

“Every day, Twitter
 generates more
  social network
  data than the
entire field of SNA
  possessed 10
   years ago.”

Social Media is Huge:
Usage and Content




              Name (Symbol)    10**N   Name (Symbol)    Value

              kilobyte (kB)    3       kibibyte (KiB)   2^10 = 1.024 × 10^3
              megabyte (MB)    6       mebibyte (MiB)   2^20 ≈ 1.049 × 10^6
              gigabyte (GB)    9       gibibyte (GiB)   2^30 ≈ 1.074 × 10^9
              terabyte (TB)    12      tebibyte (TiB)   2^40 ≈ 1.100 × 10^12
              petabyte (PB)    15      pebibyte (PiB)   2^50 ≈ 1.126 × 10^15
              exabyte (EB)     18      exbibyte (EiB)   2^60 ≈ 1.153 × 10^18
              zettabyte (ZB)   21      zebibyte (ZiB)   2^70 ≈ 1.181 × 10^21
              yottabyte (YB)   24      yobibyte (YiB)   2^80 ≈ 1.209 × 10^24
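The gap between decimal and binary prefixes can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `prefix_table` is illustrative and not part of the original slides.

```python
# Decimal (SI) prefixes grow by 10^3 per step, binary (IEC) prefixes by 2^10.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def prefix_table():
    """Return (name, decimal exponent, binary exponent, binary/decimal ratio) per prefix."""
    rows = []
    for step, name in enumerate(PREFIXES, start=1):
        decimal = 10 ** (3 * step)
        binary = 2 ** (10 * step)
        rows.append((name, 3 * step, 10 * step, binary / decimal))
    return rows

for name, dec_exp, bin_exp, ratio in prefix_table():
    print(f"{name:>6}: 10^{dec_exp:<2} vs 2^{bin_exp:<2} (binary unit is {ratio:.3f}x larger)")
```

The ratio grows with each step, from 1.024 at the kilo level to roughly 1.209 at the yotta level.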




Social Media Data:
Part of a Bigger Picture




Social Media Data:
Ways in which big data is creating value

     •   Makes information
         transparent and usable at
         much higher frequency.
     •   Provides more transactional
         data in digital form, that can
         be used to improve
         performance across the
         board.
     •   Allows ever-narrower
         segmentation of customers to
         tailor products or services.
     •   Improves decision-making
         through sophisticated analytics.
     •   Improves the development of
         the next generation of
         products and services


Data Mining: Defined


Discovering meaningful
patterns from large data
sets using pattern
recognition technologies.



Data Mining: CRISP-DM
Cross-Industry Standard Process for Data Mining
[Diagram] Phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
[Diagram] Data preparation steps: Real-World Data -> Data Consolidation -> Data Cleaning -> Data Transformation -> Data Reduction -> Well-Formed Data
Data Mining:
General Data Assumptions


                                                         Structured
                                                       Transformed
                                                       Well-Formed

Data Mining: Example




      Affinity Analysis


Data Mining: Example
1. Market Basket Analysis: Items for Sale:



        Apples               Bananas                       Cherries

2. Possible Transactions: With one item or a collection of items selected as
   the Driver or Independent Variable
        No    X      Y
         1    A      B
         2    A      C
         3    A      B C
         4    B      A
         5    B      C
         6    B      A C
         7    C      A
         8    C      B
         9    C      A B
        10    A B    C
        11    A C    B
        12    B C    A


3. Objective is to empirically determine those groups of items that occur
   frequently together in a set of transactions, producing a set of rules of the
   form X -> Y.
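The twelve candidate rules can be generated mechanically by pairing every non-empty item subset X with every non-empty subset Y of the remaining items. This is a minimal sketch; the function name `candidate_rules` is illustrative and not from the slides.

```python
from itertools import combinations

def candidate_rules(items):
    """Return every rule (X, Y) where X and Y are non-empty, disjoint subsets of items."""
    items = sorted(items)
    rules = []
    for x_size in range(1, len(items)):
        for x in combinations(items, x_size):
            rest = [item for item in items if item not in x]
            for y_size in range(1, len(rest) + 1):
                for y in combinations(rest, y_size):
                    rules.append((x, y))
    return rules

rules = candidate_rules({"A", "B", "C"})
for x, y in rules:
    print(" ".join(x), "->", " ".join(y))
print(len(rules), "candidate rules")
```

For three items this yields 12 rules. The count grows quickly with the number of items, which is why practical tools prune candidates by minimum support rather than enumerating all of them.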

Data Mining: Example

Transaction ID    Items
      1          Apple, Banana, Cherry
      2          Apple
      3          Banana, Cherry
      4          Banana, Cherry
      5          Apple, Banana
      6          Apple, Banana
      7          Apple, Cherry
      8          Apple, Banana
      9          Apple, Banana, Cherry
     10          Apple, Banana

Transaction ID    Apple    Banana    Cherry
      1             1         1         1
      2             1         0         0
      3             0         1         1
      4             0         1         1
      5             1         1         0
      6             1         1         0
      7             1         0         1
      8             1         1         0
      9             1         1         1
     10             1         1         0
     Sum            8         8         5

Standard Market Basket Measures:
(The worked examples use separate illustrative counts, not the ten transactions above: N(T) = 7, N(A) = 4, N(B) = 5, N(A & B) = 2.)

Support: Rule’s coverage (% of transactions containing both X and Y)
N(X & Y)/ N(T)     Example: N(A & B)/ N(T) = 2/7 = 29%

Confidence: Rule’s predictive ability (% consequent | antecedent)
N(X & Y)/ N(X)    Example: N(A & B)/ N(A) = 2/4 = 50%

Lift: Predictive improvement (ratio of observed support for X & Y to the support expected if X & Y were independent)
S(XuY)/(S(X)S(Y))    Example: (2/7)/((4/7)(5/7)) = .7 or 70%
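The three measures can be computed directly from the ten transactions on this slide. This is a minimal sketch: the function names are illustrative, and itemsets are passed as Python sets.

```python
# The ten transactions from the slide, keyed by transaction ID.
TRANSACTIONS = {
    1: {"Apple", "Banana", "Cherry"},
    2: {"Apple"},
    3: {"Banana", "Cherry"},
    4: {"Banana", "Cherry"},
    5: {"Apple", "Banana"},
    6: {"Apple", "Banana"},
    7: {"Apple", "Cherry"},
    8: {"Apple", "Banana"},
    9: {"Apple", "Banana", "Cherry"},
    10: {"Apple", "Banana"},
}

def count(itemset, transactions=TRANSACTIONS):
    """N(itemset): number of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions.values() if itemset <= basket)

def support(x, y, transactions=TRANSACTIONS):
    """N(X & Y) / N(T)."""
    return count(set(x) | set(y), transactions) / len(transactions)

def confidence(x, y, transactions=TRANSACTIONS):
    """N(X & Y) / N(X)."""
    return count(set(x) | set(y), transactions) / count(x, transactions)

def lift(x, y, transactions=TRANSACTIONS):
    """S(X u Y) / (S(X) * S(Y))."""
    total = len(transactions)
    s_x = count(x, transactions) / total
    s_y = count(y, transactions) / total
    return support(x, y, transactions) / (s_x * s_y)

x, y = {"Apple"}, {"Banana"}
print(f"support={support(x, y):.0%} confidence={confidence(x, y):.0%} lift={lift(x, y):.2f}")
# support=60% confidence=75% lift=0.94
```

A lift below 1 means apples and bananas appear together slightly less often than independence would predict, even though the rule has high support and confidence.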


Data Mining: Example

    Rule selection usually based on minimum support & confidence

    Parameters:  Min. Support 40%   Min. Confidence 75%
    Lift = S(XuY) / (S(X) S(Y))

    No    X     Y     N(XuY)  N(T)  S(XuY)  N(X)  Conf  N(Y)  S(X)  S(Y)  Lift   Rule
     1    A     B       6      10    60%     8    75%    8    80%   80%    94%    Ok
     2    A     C       3      10    30%     8    38%    5    80%   50%    75%
     3    A     B C     2      10    20%     8    25%    4    80%   40%    63%
     4    B     A       6      10    60%     8    75%    8    80%   80%    94%    Ok
     5    B     C       4      10    40%     8    50%    5    80%   50%   100%
     6    B     A C     2      10    20%     8    25%    3    80%   30%    83%
     7    C     A       3      10    30%     5    60%    8    50%   80%    75%
     8    C     B       4      10    40%     5    80%    8    50%   80%   100%    Ok
     9    C     A B     2      10    20%     5    40%    6    50%   60%    67%
    10    A B   C       2      10    20%     6    33%    5    60%   50%    67%
    11    A C   B       2      10    20%     3    67%    8    30%   80%    83%
    12    B C   A       2      10    20%     4    50%    8    40%   80%    63%
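Rule selection with the two thresholds can be sketched as a filter over all candidate rules. This is a minimal sketch under stated assumptions: baskets are encoded as strings of item letters (A = Apple, B = Banana, C = Cherry), lift is computed as S(XuY) / (S(X) S(Y)), and the names `n` and `select_rules` are illustrative.

```python
from itertools import combinations

# The ten baskets, one string of item letters per transaction.
BASKETS = ["ABC", "A", "BC", "BC", "AB", "AB", "AC", "AB", "ABC", "AB"]
MIN_SUPPORT = 0.40
MIN_CONFIDENCE = 0.75

def n(itemset):
    """Number of baskets containing every item in itemset."""
    return sum(1 for basket in BASKETS if set(itemset) <= set(basket))

def select_rules(items="ABC"):
    """Return (X, Y, support, confidence, lift) for rules meeting both thresholds."""
    total = len(BASKETS)
    selected = []
    for x_size in range(1, len(items)):
        for x in combinations(items, x_size):
            rest = [item for item in items if item not in x]
            for y_size in range(1, len(rest) + 1):
                for y in combinations(rest, y_size):
                    sup = n(x + y) / total
                    conf = n(x + y) / n(x)
                    lift = sup / ((n(x) / total) * (n(y) / total))
                    if sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE:
                        selected.append(("".join(x), "".join(y), sup, conf, lift))
    return selected

for x, y, sup, conf, lift in select_rules():
    print(f"{x} -> {y}: support={sup:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

With these thresholds the filter keeps A -> B, B -> A and C -> B. Note that none of the three has a lift above 1, so passing the support and confidence thresholds does not by itself show a positive association.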




Data Mining:
Simple Example
But, what if the baskets were described in the
following manner:
   – Jane bought a handful of maraschinos and a couple of
     granny smiths.
   – Harold purchased a bag of appls and 2 bananas.
   – Bill paid for a pound of cherries but decided not to buy
     the three durians because of their odor.
How could we automate the analysis?
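One naive way to start is to map each mention in the free text onto a catalog item before running the basket analysis. This is only a sketch under stated assumptions: the `CANONICAL` synonym table, the negation pattern and the 0.85 similarity cutoff are all hypothetical choices made for illustration, not a method given in the slides.

```python
import re
from difflib import get_close_matches

# Hypothetical lookup from surface terms to catalog items (an assumption).
CANONICAL = {
    "apple": "Apple", "apples": "Apple", "granny smiths": "Apple",
    "banana": "Banana", "bananas": "Banana",
    "cherry": "Cherry", "cherries": "Cherry", "maraschinos": "Cherry",
}
# Crude negation handling: ignore everything after a negated purchase phrase.
NEGATION = re.compile(r"\b(?:not to buy|did not buy|didn't buy)\b.*")

def extract_basket(sentence):
    """Return the set of catalog items mentioned as purchased in one sentence."""
    text = NEGATION.sub("", sentence.lower())
    found = set()
    # Exact phrase matches, including multi-word synonyms.
    for phrase, item in CANONICAL.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(item)
    # Fuzzy single-word matches to tolerate misspellings such as "appls".
    for word in re.findall(r"[a-z]+", text):
        match = get_close_matches(word, list(CANONICAL), n=1, cutoff=0.85)
        if match:
            found.add(CANONICAL[match[0]])
    return found

for sentence in [
    "Jane bought a handful of maraschinos and a couple of granny smiths.",
    "Harold purchased a bag of appls and 2 bananas.",
    "Bill paid for a pound of cherries but decided not to buy the three durians because of their odor.",
]:
    print(sorted(extract_basket(sentence)))
```

A hand-built dictionary like this breaks down quickly on real text: it ignores quantities, handles only a few negation forms, and knows nothing about unseen items.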


Social Media Data:




Social Media Data: Commonality?




Text Mining: Defined




Using data mining to discover patterns
     in a collection of documents
Text Mining:
CRISP-Like Processes
[Diagram] Phases, centered on Documents: Business Understanding, Document Understanding, Document Preparation, Modeling, Evaluation, Deployment
[Diagram] Document preparation steps: Real-World Text Data -> Document Consolidation -> Establish the Corpus -> Corpus Refinement (Token, Stem, Stop…) -> Feature Selection & Weighting -> Term-Doc-Matrix*
Text Mining Process:
    Sample Corpora
•    Brown Corpus -- first million-word corpus, compiled in the 1960s at
     Brown U.; 500 samples across 15 genres, each ~2000 words with
     POS tags (the Lancaster-Oslo-Bergen Corpus is the British equivalent)
•    Linguistic Data Consortium Treebanks -- collections of manually
     tagged and parsed sentences (tree structures) from a variety of
     sources (includes the well-known Penn Treebank collection)
•    Reuters 21578, RCV1 & V2, TRC2 -- collections of (1000s of)
     Reuters English & multi-lingual news stories classified into topics and
     grouped into training & test sets
•    Pang & Lee’s Sentiment Analysis -- 1000 positive and 1000
     negative movie reviews
•    MEDLINE -- an extensive collection of articles and abstracts
     (18M+) used in a variety of biomedical and linguistic text mining
     applications
•    WordNet® -- large lexical database of English grouped into sets of
     cognitive synonyms (synsets) and interlinked by means of
     conceptual-semantic and lexical relations.
•    20 Newsgroups -- collection of approximately 20,000 newsgroup
     documents, partitioned (nearly) evenly across 20 different
     newsgroups, each representing a different topic.

Text Mining Process:
Corpus Refinement
    Common representation of tokens within and between documents

    Tokenization -> Normalize -> Eliminate Stop Words -> Stemming

• Tokenization: Parse the text to generate terms. Sophisticated
  analyzers can also extract phrases from the text.
• Normalize: Convert the terms to lowercase.
• Eliminate stop words: Eliminate terms that appear very often (e.g.
  the, and, …).
• Stemming: Convert the terms into their stemmed form by removing
  plurals and different word forms (e.g. achieve, achieves, achieved ->
  achiev) [note: word about synonyms -- WordNet Synset]
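The four refinement steps can be chained as below. This is a minimal sketch using only the standard library: the stop-word list is a tiny illustrative sample, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, not the algorithm used in the tutorial.

```python
import re

STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it"}  # tiny illustrative list

def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)

def normalize(tokens):
    """Convert every token to lowercase."""
    return [token.lower() for token in tokens]

def remove_stop_words(tokens):
    """Drop very frequent terms that carry little content."""
    return [token for token in tokens if token not in STOP_WORDS]

def crude_stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def refine(text):
    """Tokenize, normalize, remove stop words, then stem."""
    tokens = remove_stop_words(normalize(tokenize(text)))
    return [crude_stem(token) for token in tokens]

print(refine("The team achieves and achieved what it set out to achieve"))
# ['team', 'achiev', 'achiev', 'what', 'set', 'out', 'achiev']
```

All three forms of "achieve" collapse to the same token, so they will count as one feature in later steps. A production stemmer applies many more rules and exceptions than this suffix list.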

Text Mining:
Feature Extraction & Weighting

  Feature Extraction -> “Bag of Words, Terms or Tokens”

  Vector Representation ->
  Word, Term, Token or Pairs-Triplets
  x Doc Matrix

          Token1   Token2   Token3   Token4   …
  Doc1      1        2        2        4
  Doc2      4        2        3        0
  Doc3      1        1        1        0
  Doc4      1        1        1        2
  …

  Words or Tokens are attributes and documents are examples
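A bag-of-words matrix of this shape can be built by counting tokens per document. This is a minimal sketch: whitespace splitting stands in for the refinement steps, the three short sentences are sample inputs, and the function name `term_document_matrix` is illustrative.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, rows): one row of raw token counts per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, rows

docs = [
    "i feel so out of place",
    "i feel so positive today",
    "i love the feeling",
]
vocabulary, rows = term_document_matrix(docs)
print(vocabulary)
for name, row in zip(["Doc1", "Doc2", "Doc3"], rows):
    print(name, row)
```

Each row is a document (an example) and each column a token (an attribute). Word order is discarded, which is the defining simplification of the bag-of-words representation.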



Text Mining:
Transforming Frequencies
• Binary Frequencies: tf =1 for tf>0; otherwise 0
• Term Frequencies: tf(i,j) / sum over all terms i of tf(i,j), i.e. the count of term i in doc j divided by the total term count of doc j
• Log Frequencies: 1 + log(tf) for tf>0; otherwise 0
• Normalized Frequencies: Divide each frequency by SQRT
  of Sum of Squares of the frequencies within the vector
  (column)
• Term Frequency–Inverse Document Frequency
    – TF * IDF
    – Inverse Document Frequency: log(N/(1+D)) where N is total
      number of docs and D is number with term
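The transformations can be sketched on a small documents-by-tokens count matrix. This is a minimal sketch: the function names are illustrative, and the IDF follows the log(N/(1+D)) form given above, which goes negative for a term that appears in every document.

```python
import math

# Rows are documents, columns are tokens (raw counts).
COUNTS = [
    [1, 2, 2, 4],
    [4, 2, 3, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 2],
]

def binary(tf):
    """1 if the term occurs at all, otherwise 0."""
    return 1 if tf > 0 else 0

def relative(row):
    """Each count divided by the total count in the document."""
    total = sum(row)
    return [tf / total for tf in row]

def log_tf(tf):
    """1 + log(tf) for tf > 0, otherwise 0."""
    return 1 + math.log(tf) if tf > 0 else 0.0

def l2_normalize(vector):
    """Divide each value by the square root of the sum of squares."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)

def idf(matrix, column):
    """log(N / (1 + D)): N documents in total, D of them containing the term."""
    n_docs = len(matrix)
    docs_with_term = sum(1 for row in matrix if row[column] > 0)
    return math.log(n_docs / (1 + docs_with_term))

def tf_idf(matrix):
    """Raw term frequency times inverse document frequency."""
    weights = [idf(matrix, j) for j in range(len(matrix[0]))]
    return [[tf * w for tf, w in zip(row, weights)] for row in matrix]

print([binary(tf) for tf in COUNTS[1]])
print([round(v, 3) for v in relative(COUNTS[0])])
print([round(v, 3) for v in tf_idf(COUNTS)[0]])
```

In this matrix the first three tokens occur in all four documents, so their IDF is log(4/5) and their TF-IDF weights are slightly negative. Only the fourth token, present in two documents, gets a positive weight.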


Text Mining: Simple Example




Listening Post is an art installation by Mark
Hansen and Ben Rubin that culls text
fragments in real time from thousands of
unrestricted Internet chat rooms, bulletin
boards and other public forums.
Text Mining: Simple Example




Text Mining: Simple Example

[Diagram] Blogs (every 10 mins) -> sentences with “I feel” or “I’m feeling” -> contains 1 of 5000 pre-determined feelings -> 15-20K feelings per day
[Diagram] Fields per feeling: sentence, imageid, feeling, posttime, postdate, posturl, gender, born, country, state, city, lat, lon, conditions



Text Mining: Simple Example

       Query (API request)
http://api.wefeelfine.org:8080/ShowFeelings?display=xml&returnfields=Sentence&postdate=2010-11-25&limit=500

       Result
<?xml version="1.0" ?>
<feelings>
<feeling imageid="-mZmybPrOGTZ+xukpcU7jg"
         feeling="better"
         sentence="i feel almost 100 better aside from that weird sandy feeling in my throat"
         posttime="1321633467"
         postdate="2010-11-25"
         posturl="http://jenngreenleaf.blogspot.com/2011/11/im-coming-down-with-cold-or-am-i.html"
         gender="0" country="united states"
         state="maine" city="richmond"
         lat="44.091522" lon="-69.801787"
         conditions="4" />
…
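
The query and result above can be reproduced offline with a minimal sketch. It only builds the query string and parses a saved response; it makes no network call. The helper names build_query and parse_feelings are hypothetical, and the sample XML is trimmed from the result shown on the slide.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint as shown on the slide; whether it is still reachable is not assumed.
BASE_URL = "http://api.wefeelfine.org:8080/ShowFeelings"


def build_query(postdate, limit=500, returnfields="Sentence"):
    """Assemble the query string with the same parameters as the slide."""
    params = {
        "display": "xml",
        "returnfields": returnfields,
        "postdate": postdate,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_feelings(xml_text):
    """Return one dict of attributes per <feeling> element."""
    root = ET.fromstring(xml_text)
    return [dict(element.attrib) for element in root.iter("feeling")]


# Trimmed copy of the slide's sample response (a subset of the attributes).
SAMPLE = """<?xml version="1.0" ?>
<feelings>
<feeling feeling="better"
         sentence="i feel almost 100 better aside from that weird sandy feeling in my throat"
         country="united states" state="maine" city="richmond"
         lat="44.091522" lon="-69.801787" />
</feelings>"""

url = build_query("2010-11-25")
feelings = parse_feelings(SAMPLE)
sentences = [f["sentence"] for f in feelings]
```

The sentence attribute of each record is the text that feeds the tokenizing steps on the following slides.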

Text Mining: Simple Example

    •   i'm done believing you don't know what i'm feeling
    •   i feel so out of place
    •   i'm feeling healthy
    •   i never feel down when i'm with her
    •   i love the feeling
    •   i feel like i've been run over by a truck
    •   i feel so positive today
    •   i feel like a poor man's pin up girl

Text Mining: Simple Example

  •   Input String (128925 chars; 24282 spaces)
       – "i have found to be helpful especially during those times when i am feeling
         discouraged\ni have a 50km commute and just the lack of the sense of freedom that
         driving brings just leaves me feeling scared\ni seem to be feeling better mostly…"
  •   Tokenize (26465 tokens)
       – ['i', 'have', 'found', 'to', 'be', 'helpful', 'especially', 'during', 'those', 'times', 'when', 'i',
         'am', 'feeling', 'discouraged', 'i', 'have', 'a', '50km', 'commute', 'and', 'just', 'the', 'lack',
         'of', 'the', 'sense', 'of', 'freedom', 'that', 'driving', 'brings', 'just', 'leaves', 'me', 'feeling',
         'scared', 'i', 'feel', 'noone', 'know', 'if', 'you', 'were', 'me', 'you', 'will', 'feel', 'the', 'same',
         'way', …]
  •   Set of Tokens (3045 distinct tokens)
       – ["'", "'believe", "'d", "'en", "'encoding", "'feedlinks", "'forever", "'gets", "'http",
         "'ismobile", "'isprivate", "'item", "'languagedirection", "'ll", "'locale", "'ltr", "'m",
         "'mefaked", "'mobileclass", "'mr", "'no", "'okay", "'on", "'pagetitle", "'pagetype", "'re",
         "'s", "'t", "'toned", "'url", "'us", "'utf", "'ve", "'yes", '0', '034', '039', '0aeverytime', '0d',
         '10', '100', '101',…]
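
A minimal sketch of the tokenize and distinct-token steps above. The regular expression is an assumption: the distinct tokens on the slide (for example "'ll" and "'m") suggest a tokenizer that splits contractions, which this simple pattern does not do.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Two sentences joined by a newline, as in the input string on the slide.
raw = "i am feeling discouraged\ni have a 50km commute"

tokens = tokenize(raw)
distinct = sorted(set(tokens))  # the "set of tokens", sorted for display
```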


Text Mining: Simple Example


Corpus                 Word Length Sentence Length Lexical Diversity
We Feel Fine                     4              17                 8
Gutenberg Corpus
Austen-persuasion.txt            4              23                16
Bible-kjv.txt                    4              33                79
Blake-poems.txt                  4              18                 5
Carroll-alice.txt                4              16                12
Melville-moby.txt                4              24                15
Milton-paradise.txt              4              52                15
Shakespeare-caesar.txt           4              12                 8
Shakespeare-hamlet.txt           4              13                 7
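
The three columns can be computed with a short sketch. The definitions are assumptions, though they agree with the We Feel Fine counts on the earlier slide: word length is characters per token (128925 / 26465 gives 4), sentence length is tokens per sentence, and lexical diversity is tokens per distinct token (26465 / 3045 gives 8). All values are truncated to integers. The sentence splitter is a deliberately crude stand-in.

```python
import re


def corpus_stats(raw_text):
    """Return (word length, sentence length, lexical diversity) as integers."""
    # Crude sentence split on end punctuation or newlines (an assumption).
    sentences = [s for s in re.split(r"[.!?\n]+", raw_text) if s.strip()]
    words = re.findall(r"[a-z0-9']+", raw_text.lower())
    vocabulary = set(words)
    word_length = len(raw_text) // len(words)        # characters per token
    sentence_length = len(words) // len(sentences)   # tokens per sentence
    lexical_diversity = len(words) // len(vocabulary)  # tokens per distinct token
    return word_length, sentence_length, lexical_diversity


stats = corpus_stats("i feel so out of place\ni feel so positive today\n")
```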




Text Mining: Simple Example

   • Eliminate Stopwords (175 words - 'a', 'about', 'above', 'after', …)
       – Set of tokens (12827) with stopwords eliminated ['ab', 'abit', 'able', 'abs',
         'absolute', 'absolutely', 'absorb', 'abuse', 'accomplished',
         'accomplishment', 'achieve', 'achieved', 'across', 'acted', 'action',
         'activities', 'activity', 'actually', 'acura', 'add', …]
       – Content (11896 or 45% of tokens not stopwords – 4053 with tokens
         starting with apostrophes and #s eliminated )
   • Stemming
       – Stemmed tokens (11896) ['abdomen', 'abdul', 'abil', 'abl', 'abrupt', 'absolut',
         'abstract', 'academ', 'accept', 'accid', 'accomplish', 'accur', 'accus', 'accustom',
         'achi', 'achiev', 'acknowledg', 'across', 'action', 'activ', …]
       – Set of tokens in stemmed content(2283) ['abdomen', 'abdul', 'abil', 'abl', 'abrupt',
         'absolut', 'abstract', 'academ', 'accept', 'accid', 'accomplish', 'accur', 'accus',
         'accustom', 'achi', 'achiev',…]
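
The stopword and stemming steps can be sketched as follows. The stopword set is a small hypothetical subset of the 175-word list, and naive_stem is a crude suffix stripper used only to illustrate the idea. It is not the stemmer behind the slide, whose output (for example 'absolut' and 'activ') looks like that of a Porter-style stemmer.

```python
# Hypothetical subset of the 175-word stopword list mentioned on the slide.
STOPWORDS = {"a", "about", "above", "after", "i", "am", "the", "to", "be", "and", "of"}

# Suffixes tried in order; far simpler than a real Porter stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def content_stems(tokens):
    """Drop stopwords and tokens starting with apostrophes or digits, then stem."""
    content = [t for t in tokens if t not in STOPWORDS and t[0].isalpha()]
    return [naive_stem(t) for t in content]


tokens = ["i", "am", "feeling", "discouraged", "'m", "100", "absolutely", "activities"]
stems = content_stems(tokens)
distinct_stems = sorted(set(stems))
```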


Text Mining: Simple Example

                                                             Document-Term Matrix
 Sum  WeFeel          like   know   time     go  think better    way    get   good   love   …    hear    didn   place  almost comfort everyon    sinc    babi  actual
      Sum              416     94     90     89     83     80     80     76     76     75   …      16      16      16      16      16      16      16      16      16
   3  comment1           0      0      1      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   0  comment2           0      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment3           0      1      0      0      0      0      1      0      0      0   …       0       0       0       0       0       0       0       0       0
   1  comment4           0      0      0      0      0      1      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   1  comment5           1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment6           0      1      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   7  comment7           1      1      0      0      0      0      0      1      0      0   …       0       0       0       0       0       0       0       0       0
   7  comment8           2      1      0      1      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment9           1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   6  comment10          0      0      2      0      0      0      0      1      0      0   …       0       0       0       0       0       0       0       0       0
   …  …
   2  comment1490        1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment1491        0      0      0      0      0      0      1      0      0      0   …       0       0       0       0       0       0       0       0       0
   6  comment1492        0      1      0      0      0      0      1      0      0      0   …       0       0       0       0       0       0       0       0       0
   3  comment1493        1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   4  comment1494        1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   4  comment1495        2      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   1  comment1496        0      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment1497        0      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   2  comment1498        1      0      0      0      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
   3  comment1499        1      0      0      1      0      0      0      0      0      0   …       0       0       0       0       0       0       0       0       0
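
A document-term matrix like the one above can be built from stemmed comments with a short sketch. The function name and the three toy comments are hypothetical. Columns are ordered by total frequency, as on the slide.

```python
from collections import Counter


def document_term_matrix(docs, top_n=None):
    """Return (terms, matrix): one row per document, one column per term."""
    counts = [Counter(doc) for doc in docs]
    totals = Counter()
    for count in counts:
        totals.update(count)
    # Most frequent terms first; ties broken alphabetically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    terms = [term for term, _ in ranked]
    if top_n is not None:
        terms = terms[:top_n]
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix


# Toy stand-ins for comment1, comment2, comment3.
docs = [["like", "time"], ["know", "like"], ["like", "like", "know"]]
terms, matrix = document_term_matrix(docs)
column_sums = [sum(row[j] for row in matrix) for j in range(len(terms))]
row_sums = [sum(row) for row in matrix]
```

The column sums correspond to the "Sum" row of the slide and the row sums to its leftmost "Sum" column.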




Text Mining: Simple Example




   Madness                   Murmurings                               Montage




    Mobs                         Metrics                              Mounds
Prediction


                               Collective, macroscopic
                               trends which can be
                               scientifically inferred by
                               harnessing publicly
                               accessible data from
                               the Internet.

Prediction: Characteristics


                                             Public
                                             Practical
                                             Big

Prediction: Sources

                               Easily accessible digital traces:
                                                   What we surf
                                                   Whom we “friend”
                                                   What we say
                                                   Where we go
                                                   What we buy
                                                   How we play

Prediction: Sample Studies



                                             Infodemiology
                                               Nowcasting
                                              Culturomics




Prediction: Infodemiology

                       Information + Epidemiology:

                       Science of distribution and
                       determinants of information
                       in an electronic medium,
                       specifically the Internet, or
                       in a population, with the
                       ultimate aim to inform public
                       health and public policy
                       Coined by Gunther Eysenbach, Univ. of Toronto

Prediction: Infodemiology
     A Major Application - Practical




         Regional, Weekly Syndromic Surveillance

Prediction: Infodemiology
An Alternative Approach




           Text Mining of Worldwide Newswires, Web Sites
                     and Various Offline Reports

Prediction: Infodemiology
Utilizing Aggregate Search Data

                             Monitoring and analyzing
                             queries from Internet search
                             engines or peoples' status
                             updates on microblogs for
                             syndromic surveillance to
                             predict disease outbreaks


Prediction: Infodemiology
Utilizing Aggregate Search Data


  (Dependent Variable at Time t)
      = b0
      + b1 * (Dependent Variable at Time t - n)
      + b2 * (Traditional, Publicly Available Explanatory Variable)
      + b3 * (Aggregate Search Index or Social Media Freq. Count)
      + e

  The dependent variable is a standard publicly available measure.

                Standard Linear Prediction Model
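
A minimal sketch of fitting this model by ordinary least squares, using only the standard library. The series below are synthetic and noise-free, generated with known coefficients (1.0, 0.5, 2.0, 3.0) so the fit can be checked. The names x (traditional explanatory variable) and s (search or social media index) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def lagged_design(y, x, s, lag=1):
    """Rows of [1, y(t - lag), x(t), s(t)] paired with targets y(t)."""
    rows = [[1.0, y[t - lag], x[t], s[t]] for t in range(lag, len(y))]
    return rows, y[lag:]


# Synthetic series, for illustration only.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
s = [2.0, 1.0, 4.0, 3.0, 7.0, 5.0, 6.0, 9.0, 8.0, 12.0]
y = [1.0]
for t in range(1, len(x)):
    y.append(1.0 + 0.5 * y[t - 1] + 2.0 * x[t] + 3.0 * s[t])

rows, targets = lagged_design(y, x, s, lag=1)
b0, b1, b2, b3 = fit_ols(rows, targets)
```

In the studies that follow, the question of interest is whether b3 adds predictive power beyond the lagged and traditional terms.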


Prediction: Infodemiology
Utilizing Aggregate Search Data
     “Detecting Influenza Epidemics Using Search
     Engine Query Data” (Ginsberg et al.), 2/19/09

     • Aggregated historical logs of search queries
       from 2003-2008, computing weekly time series
     • logit(P) = b0 + b1 * logit(Q) + e
        – P – percentage of physician visits that are for influenza-like illness (ILI)
        – Q – ILI-related query fraction, based on the 45 highest-scoring influenza queries
     • r is between .80 and .96 across 9 regions
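
The logit model above is a one-variable linear regression on transformed values. Below is a minimal sketch with made-up query fractions and coefficients (b0 = -1.0, b1 = 0.9), not data from the paper. The helper names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(xs, ys):
    """Slope and intercept of the least-squares line ys = b0 + b1 * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up weekly query fractions Q and noise-free ILI visit percentages P.
q = [0.01, 0.02, 0.03, 0.05, 0.08]
p = [inv_logit(-1.0 + 0.9 * logit(v)) for v in q]

logit_q = [logit(v) for v in q]
logit_p = [logit(v) for v in p]
b0, b1 = fit_line(logit_q, logit_p)
r = pearson_r(logit_q, logit_p)
```

Because this toy data has no noise, r comes out at essentially 1. The .80 to .96 range on the slide reflects real regional data.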

Prediction: Infodemiology
Utilizing Aggregate Search Data




       http://www.google.org/flutrends/about/how.html



Prediction: Infodemiology
A Similar Application




        http://www.google.org/denguetrends/
Prediction: Infodemiology
Utilizing Tweets




                                                                       ?




Prediction: Infodemiology
Utilizing Tweets
      “Nowcasting Events from the Social Web
      with Statistical Learning,” Lampos and
      Cristianini, ACM IS&T, 9/11

      • Text analysis of 50M tweets for 3 regions of UK
        from 6/09-4/10 (303 days)
      • HPA weekly reports of GP consultations with ILI
        diagnosis correlated with number of “hybrid
        grams”
      • Average “r” of .911
Prediction: Infodemiology
A Major Application – Text Analysis

       Corpus:             50M Tweets, 3 Region UK, 6/09-4/10

       Corpus Refinement:  Tokens, Lower Case, Stop Words, Stems

       Feature Selection:  1-Grams, 2-Grams, Hybrid Grams, N-Gram Freqs
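
The feature-selection row can be sketched as n-gram counting over the refined tokens. Treating "hybrid grams" as the combined set of 1-gram and 2-gram features is an assumption here, and the example tokens are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_freqs(tokens, sizes=(1, 2)):
    """Frequency of every n-gram for the requested sizes, in one counter."""
    freqs = Counter()
    for n in sizes:
        freqs.update(ngrams(tokens, n))
    return freqs


# Made-up refined tokens (lowercased, stopwords removed, stemmed).
tokens = ["sore", "throat", "sore", "head"]

unigrams = ngram_freqs(tokens, sizes=(1,))
bigrams = ngram_freqs(tokens, sizes=(2,))
hybrid = ngram_freqs(tokens, sizes=(1, 2))  # assumed meaning of "hybrid grams"
```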




Prediction: Infodemiology
Utilizing Tweets
                                                                                  Discarded when n<50

     BoLasso - Bootstrap LASSO (least absolute shrinkage and selection operator)
Prediction: Nowcasting

                                               Now + Forecasting:

                                             Predicting the present
                                             by analyzing large
                                             volumes of data that
                                             can be used to
                                             "forecast" current
                                             events for which
                                             official analysis has
                                             not been released
Prediction: Nowcasting
Weather Envy




             Within the next 6 hours …
Prediction:
   Sample Studies with Search
Song, Pan, Ng (Apr-10)
  Dependent Variables:   Weekly Hotel Bookings in Charleston, SC
  Explanatory Variables: Indexed Search Volumes from Google Trends/Insights, Jan 2008-Aug 2009
  Model:                 Log of Room Nights for Log of Search Volumes - Charleston, Travel Charleston,
                         Charleston Hotels, Charleston Restaurants, Charleston Tourism
  Results:               Tested various statistical models; all gave reasonable forecasts. Best fit model
                         was Autoregressive Distributed Lag (ADLM) with a lag period of 6 weeks.

Kholodilin, Podstawski, Siliverstovs (Apr-10)
  Dependent Variables:   Year-on-Year Growth Rate of Monthly US Real Private Consumption, ALFRED db of
                         Fed Rsrv of St. Louis
  Explanatory Variables: 220 Google Trend/Insights Search terms related to Priv Consumption reduced to 10
                         principal components for monthly periods from Jan 2005 to Dec 2009
  Model:                 Y-o-Y monthly URPC growth rates for 3 sets of regressors -- Sentiment (consumer
                         sentiment and confidence); Financial (short term and long term interest rates and
                         S&P 500); Query (combinations of principal components of query terms)
  Results:               Query term principal components outperform standard Sentiment and Financial
                         Indicators. A combination of two of the factors works best -- those related to
                         mobility and health care consumption.

Choi, Varian (Apr-09)
  Dependent Variables:   US Census Bureau Advance Monthly Retail Sales (general and specific) and Travel
                         (Visitor arrival in Hong Kong)
  Explanatory Variables: Google Trend/Insight query indices for categories and subcategories related to
                         retail sales (general and specific) and related to Travel
  Model:                 Google Trend indices for query subcategories related to (log values) of overall
                         monthly retail trade (NAICS categories), automotive sales, home sales and travel.
  Results:               Simple seasonal AR models and fixed-effects models that include relevant Google
                         Trend variables tend to outperform models that exclude these variables. In some
                         cases small gains, in others substantial.

McLaren, Shanbhogue (Q2-11)
  Dependent Variables:   Official monthly unemployment data and housing price growth in the UK from
                         June 2004-Jan 2011
  Explanatory Variables: Google Trend/Insight query indexes for the term "Job Seekers Allowance (JSA)" for
                         unemployment and "Estate Agents" for housing
  Model:                 For unemployment, linear AR model with query term, claimant count, and GfK
                         consumer confid. as exp vars; for housing price growth with query term, Home
                         Builders and Royal Instit. of Chartered Surveyors price growth balances as exp vars.
  Results:               For unemployment forecasts, claimant count strongest followed by query term.
                         For housing prices, the query term was much stronger than HBF and RICS data.




Prediction:
    Sample Studies with Social Media
Asur, Huberman (Mar-10)
  Dependent Variables:   Box-office revenues for (24) movies
  Explanatory Variables: Promotion tweets-retweets for a particular movie, tweet rates for a particular
                         movie per hour, ratio of positive to negative sentiments for the movie
  Model:                 Regression of 1st weekend box office revenues by promotional tweets-retweets,
                         by tweet rates vs. Hollywood Stock Exchange prices, and 2nd weekend revenues
                         by tweet rates and the sentiment ratio.
  Results:               Promotional tweets are weakly correlated with 1st weekend revs. Tweet rates are
                         very strongly correlated (min .9) and a stronger predictor than HSX. Finally,
                         tweet rates are strongly correlated with 2nd weekend revenues and sentiments
                         improve the forecasts slightly.

Gruhl, Guha, Kumar, Novak, Tomkins (Aug-05)
  Dependent Variables:   Amazon Sales Rank for 2340 bestselling books in 4 month period (Jul 2004-Aug
                         2004) and spikes in these sales ranks
  Explanatory Variables: Number of mentions of the book/author in over 300K blogs whose postings were
                         maintained by IBM's WebFountain project (over 200K postings/day)
  Model:                 Cross correlation of time series for sales rank and mentions.
  Results:               While sales rank is a poor predictor of the change in sales rankings, a prior
                         spike in mentions predicts quite well a future spike in sales rank.

Sadikov, Parameswaran, Venetis (Aug-09)
  Dependent Variables:   Movie critic ranking, user ranking, 2008 gross sales, weekly box office sales
                         (weeks 1-5)
  Explanatory Variables: Basic features that count movie references in blogs, count movie references
                         taking into account ranking and indegree of the blogs where they appear,
                         consider only references made within a time window before or after a movie
                         release date, features that consider positive sentiment; and combinations of
                         these. References based on spinn3r.com blog data set 11/07-11/08
  Model:                 Linear regression for weekly rankings and sales data by blog references and
                         sentiment.
  Results:               Minimal correlation between rankings and references and sentiment. Strong
                         correlation between references and gross sales but weak with sentiment.
                         Strongest relationships with timing of references in weeks after release.




Prediction: Any Guesses?




Prediction: Idiom, a Sculpture of
10s of 1000s of Books




Prediction: It comes in many
Shapes but not Sizes




    Omphalos                                                            Book Cell




                                                                                Matej Krén
                                 Gravity Mixer
Prediction: Culturomics

                                    Culture + Genomics:

                                    Application of high-
                                    throughput data
                                    collection and analysis
                                    to the study of human
                                    culture.


Prediction: Culturomics




                                                                  “Quantitative Analysis of Culture Using
                                                             Millions of Digitized Books,” Science, 12/16/10.
Prediction: Culturomics 2.0




        http://www.youtube.com/watch?v=61qn7S9NCOs


Prediction: Culturomics 2.0
           Culturomics 2.0: Forecasting Large-Scale Human
           Behavior Using Global News Media Tone in Time
           and Space, Kalev Leetaru, 9/11

       •   The tone of real-time consciousness reflected in the media can
           be used to forecast broad social behavior.
       •   Combined three massive news archives totaling more than 100
           million articles worldwide to explore the global consciousness
           of the news media.
       •   Employs a large shared-memory supercomputer (University of
           Tennessee SGI Altix supercomputer Nautilus with 1024
           processors and 4-TB of memory)
       •   Using the tone and location of the reports, (claims to have)
           predicted the outcome of the Arab Spring and the location of Bin
           Laden within a radius of 125 miles

Prediction: Culturomics 2.0
Based on Carbon Capture Report




Prediction: Culturomics 2.0
Based on Carbon Capture Report




Prediction: Culturomics 2.0
Features of Stories or Tweets
      • Tone/Positivity/Negativity. Ratio of positive to negative tone
        (-100 to 100)
      • Polarity. Emotional charge (0 to 100)
      • Activity. Intensity of "active language" (0 to 100)
      • Personalization. Degree to which the writer
        attempts to bring the reader into the fold (0 to 100)
      • Questions/Exclamations. Tone indicators in tweets taken from
        non-word items such as question marks and exclamation points
      • Geocoding. Location of story content
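
A rough sketch of how a few of these features could be scored for a single story or tweet. The word lists, the `score_text` name, and the exact formulas (difference and sum of positive and negative word counts, scaled by text length) are assumptions for illustration, not the scoring used by the Carbon Capture Report.

```python
import re

# Toy word lists; illustrative stand-ins for a real tone dictionary.
POSITIVE = {"good", "great", "win", "hope", "peace"}
NEGATIVE = {"bad", "crisis", "fear", "riot", "war"}

def score_text(text):
    """Score one text on tone, polarity, and punctuation counts."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty text
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "tone": 100.0 * (pos - neg) / total,      # ranges from -100 to 100
        "polarity": 100.0 * (pos + neg) / total,  # ranges from 0 to 100
        "questions": text.count("?"),
        "exclamations": text.count("!"),
    }

print(score_text("Hope for peace, but fear of war remains!"))
```

Activity, personalization, and geocoding are left out here because they need richer resources (verb lexicons, pronoun rules, a gazetteer) than a short sketch can carry.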
Prediction: Culturomics 2.0
   Features of Stories or Tweets


   Sources: 100M articles from the
     New York Times (1945-2005)
     Summary of World Broadcasts (1979-2010)
     Google News articles (2006-2011)
   Processing: Sentiment Mining, Geocoding, Entity Extraction
     (Nautilus Supercomputer)
   Outputs: Geocoding, Feature Scores
   Result: 2.4 Petabyte Network with over 10M entities
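
A small sketch of the last step, turning extracted entities into a network. It assumes that entities are linked when they appear in the same article. That linking rule, the `cooccurrence_edges` name, and the sample entity lists are illustrative assumptions, not details taken from the slides.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(articles):
    """Count how often each pair of entities appears in the same article."""
    edges = Counter()
    for entities in articles:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical output of an entity-extraction step: one entity list per article.
articles = [
    ["Cairo", "Egypt", "Mubarak"],
    ["Egypt", "Mubarak"],
    ["Cairo", "Egypt"],
]
edges = cooccurrence_edges(articles)
for pair, weight in edges.most_common():
    print(pair, weight)
```

At the scale on the slide, the same counting would have to be distributed or streamed rather than held in one in-memory `Counter`.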



Prediction: Culturomics 2.0
Predicting Unrest




Prediction: Culturomics 2.0
NY Times View of Tone
     http://contentanalysis.ichass.illinois.edu/Culturomics20/nyt-movie-1000x1000.gif




Prediction: Culturomics 2.0
SWB View of Tone
     http://contentanalysis.ichass.illinois.edu/Culturomics20/swb-movie-1000x1000.gif





08448380779 Call Girls In Friends Colony Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Mining and analyzing social media hicss 45 tutorial – part 1

  • 1. Mining and Analyzing Social Media HICSS 45 Tutorial – Part 1 Dave King January 4, 2012
  • 2. Agenda: This is how the slides are organized • Part 1 – Introduction – Bio, Resources, Social Media – Data Mining – Processes and Example – Text Mining – General Processes and Example – Predicting the Future – The Portmanteaus • Part 2 – Sentiment Analysis – Social Network Analysis - Introduction 2 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 3. Biography: Dave King • Currently, EVP of Product Development and Management at JDA Software • 30 years in enterprise package software business • 15 years as university professor • 14 years as Co-Chair of the Internet & Digital Economy Track (HICSS) • Long time interest in various aspects of E-Commerce & Business Intelligence • Tutorial topic primarily reflects a personal interest and tangentially a job(s) related interest. Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 4. Personal Experiences with Analytics • Taught applied statistics, math modeling & mathematical sociology • In software R&D for 30 years – Optimization in the 80s – Natural Language Frontends • NLI Query & CMU Robotics Lab – EIS Competitive Analysis • Dow Jones and Reuters • Verity Topics • NewsAlert – InXight’s Hyperbolic Tree – Supply Chain Analytics • In the case of text analysis and its practical application, audiences have often been small, bewildered, and fleeting
  • 5. Mining and Analytics Resources 5 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 6. Mining and Analytics Resources 6 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 7. Mining and Analytics Resources 7 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 8. Mining and Analytics Resources 8 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 9. Mining and Analytics Resources: Web Sites, Online Books & Tutorials • DM/Blog -- abbottanalytics.blogspot.com • DM/Blog -- blog.data-miners.com • DM/Blog -- bx.businessweek.com/data-mining/blogs • DM/Blog -- bytemining.com • DM/Blog -- data-mining.alltop.com • DM/Blog -- dataminingblog.com • DM/Blog -- dataminingdownunder.com • DM/Blog -- datamining.typepad.com • DM/Blog -- datawrangling.com • DM/Blog -- timmanns.blogspot.com • DM/General -- kdnuggets.com • DM/General -- mydatamine.com • DM/General -- the-data-mine.com • DM/Online Book -- chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm • DM/Tutorial -- autonlab.org/tutorials/
  • 10. Mining and Analytics Resources: Web Sites, Online Books & Tutorials • TA/General -- social.textanalyticsnews.com • TA/General -- textanalysis.info • TM/Blog -- blogs.sas.com/text-mining • TM/Blog -- lingpipe-blog.com • TM/Blog -- texttechnologies.com • TM & TA/Blog -- informationweek.com/authors/showAuthor.jhtml?authorID=1331 • TA Tutorial -- slideshare.net/SethGrimes/text-analytics-overview-2011 • TM & DM/Online Book -- statsoft.com/textbook/text-mining/ • TM & DM/Tutorial -- alias-i.com/lingpipe/demos/tutorial/db/read-me.html • TM Tutorial -- scienceforseo.com/tutorials/text-mining-tutorial • TM/Wiki -- textanalytics.wikidot.com • SNA/Blog – iq.harvard.edu/blog/netgov/2011/10/ • SNA/Blog – thenetworkthinkers.com • SNA/Blog – blog.echen.me/tag/social-network-analysis/ • SNA/Blog – lithosphere.lithium.com/t5/user/viewprofilepage/user-id/151 • SNA/Tutorial -- cs.stanford.edu/people/jure/icml09networks/ 10 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 11. Mining and Analytics Resources: Web Sites, Online Books & Tutorials • DA/Blog – dataists.com • DA/Blog – drewconway.com • Visualization/Blog – abeautifulwww.com/ • Visualization/Blog – benfry.com/writing/ • Visualization/Blog -- blog.blprnt.com • Visualization/Blog – chrisharrison.net/index.php/visualization.com • Visualization/Blog – datavisualization.ch/ • Visualization/Blog – eagereyes.com • Visualization/Blog – informationandvisualization.de/ • Visualization/Blog – infosthetics.com • Visualization/Blog – junkcharts.typepad.com/junk_charts/ • Visualization/Blog – neoformix.com • Visualization/Blog – perpetualedge.com/blog • Visualization/Blog – processing.org • Visualization/Blog – visualcomplexity.com • Visualization/Blog – well-formed-data.net/ 11 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 12. Social Media Defined Marta Kagan Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 13. Social Media Defined: …Sort of … 13 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 14. Social Media Defined: Actually, it’s 33 Definitions
      1. Media for social interaction, using highly accessible and scalable communication techniques.
      2. Various user-driven (inbound marketing) channels (e.g., Facebook, Twitter, blogs, YouTube).
      3. Most transparent, engaging and interactive form of public relations
      4. What we do and say together, worldwide, to communicate in all direction at any time, by any possible (digital) means.
      5. New marketing tool that allows you to get to know your customers and prospects in ways that were previously not possible.
      6. Platforms that enable the interactive web by engaging users to participate in, comment on and create content as means of communicating
      7. Consists of any online platform or channel for user generated content.
      8. Digital content and interaction that is created by and between people.
      9. Shift in how we get our information. Social media allows us to network, to find people with like interests, and to meet people who can become friends or customers.
      10. Platforms for interaction and relationships, not content and ads.
      11. Online platforms and locations that provide a way for people to participate in these conversations.
      12. People’s conversations and actions online that can be mined by advertisers for insights but not coerced to pass along marketing messages.
      13. Tools, services, and communication facilitating connection between peers with common interests.
      14. Online technologies and practices that people use to share content, opinions, insights, experiences, perspectives, and media themselves.
      15. Ever-growing and evolving collection of online tools and toys, platforms and applications that enable all of us to interact with and share information. Increasingly, it’s both the connective tissue and neural net of the Web.
      16. Reflection of conversations happening every day, whether at the supermarket, a bar, the train, the watercooler or the playground.
      17. Online text, pictures, videos and links, shared amongst people and organizations.
      18. Not one thing. It’s five distinct things:
      19. Digital, content-based communications based on the interactions enabled by a plethora of web technologies
      20. Collection of online platforms and tools that people use to share content, profiles, opinions, insights, experiences, perspectives and media itself, facilitating conversations and interactions online between groups of people.
      21. Platform/tools.
      22. Act of connecting on social media platforms.
      23. How businesses join the conversation in an authentic and transparent way to build relationships.
      24. The notion that social media is about the technology that facilitates individuals and groups of people to connect and interact, create and share.
      25. Any of a number of individual web-based applications aggregating users who are able to conduct one-to-one and one-to-many two-way conversations.
      26. Media channel that relies on listening and conversation, as opposed to a monologue, to get your point across, make a connection and build a relationship.
      27. Social media is all about leveraging online tools that promote sharing and conversations, which ultimately lead to engagement with current and future customers and influencers in your target market.
      28. Social media: Evolution, Revolution and Contribution -by the ability of everybody to share and contribute as a publisher
      29. Social media is communication channels or tools used to store, aggregate, share, discuss or deliver information within online communities.
      30. Social Media is simply another arrow to be shot in a company’s marketing quiver.
      31. Social media platforms make it easier to share information–usually online.
      32. Any object or tool, that connects people in dialogue or interaction - in person, in print, or online.
      33. Wild, Wild West of Marketing, with brands, businesses, and organizations jostling with individuals to make news, friends, connections and build communities in the virtual space.
  • 15. Social Media Defined: If a Picture isn’t worth a 1000 words, then … 15 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 16. Social Media Defined Online technologies and practices for social interaction enabling the sharing of opinions, insights, experiences, perspectives and media itself 16 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 17. Social Media Defined: Categories 17 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 18. Social Media Defined: Unanimous Agreement Marta Kagan 18 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 19. Social Media is Huge: Users Marta Kagan 750 Million: Facebook 200 Million: Twitter 100 Million: LinkedIn 19 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 20. Social Media is Huge! Marta Kagan If Facebook were a country, it would be the 3rd largest in the world
  • 21. Social Media Data: Research Opportunity “Every day, Twitter generates more social network data than the entire field of SNA possessed 10 years ago.” 21 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 22. Social Media is Huge: Usage and Content
      Name (Symbol)     10**N    Name (Symbol)     Value
      kilobyte (kB)     3        kibibyte (KiB)    2**10 = 1.024 × 10**3
      megabyte (MB)     6        mebibyte (MiB)    2**20 ≈ 1.049 × 10**6
      gigabyte (GB)     9        gibibyte (GiB)    2**30 ≈ 1.074 × 10**9
      terabyte (TB)     12       tebibyte (TiB)    2**40 ≈ 1.100 × 10**12
      petabyte (PB)     15       pebibyte (PiB)    2**50 ≈ 1.126 × 10**15
      exabyte (EB)      18       exbibyte (EiB)    2**60 ≈ 1.153 × 10**18
      zettabyte (ZB)    21       zebibyte (ZiB)    2**70 ≈ 1.181 × 10**21
      yottabyte (YB)    24       yobibyte (YiB)    2**80 ≈ 1.209 × 10**24
  • 23. Social Media Data: Part of a Bigger Picture 23 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 24. Social Media Data: Ways in which big data is creating value • Makes information transparent and usable at much higher frequency. • Provides more transactional data in digital form that can be used to improve performance across the board. • Allows ever-narrower segmentation of customers to tailor products or services. • Improves decision-making through sophisticated analytics. • Improves the development of the next generation of products and services.
  • 25. Data Mining: Defined Discovering meaningful patterns from large data sets using pattern recognition technologies. 25 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 26. Data Mining: CRISP-DM (Cross-Industry Standard Process for Data Mining) • Phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment • Data Preparation: Real-World Data -> Data Consolidation -> Data Cleaning -> Data Transformation -> Data Reduction -> Well-Formed Data
  • 27. Data Mining: General Data Assumptions Structured Transformed Well-Formed 27 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 28. Data Mining: Example Affinity Analysis 28 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 29. Data Mining: Example
      1. Market Basket Analysis: Items for Sale: Apples (A), Bananas (B), Cherries (C)
      2. Possible Transactions: with one item or a collection of items selected as the Driver or Independent Variable (X)
           No   X      Y           No   X      Y
           1    A      B           7    C      A
           2    A      C           8    C      B
           3    A      B, C        9    C      A, B
           4    B      A           10   A, B   C
           5    B      C           11   A, C   B
           6    B      A, C        12   B, C   A
      3. Objective is to empirically determine those groups of items that occur frequently together in a set of transactions, producing a set of rules of the form X -> Y.
  • 30. Data Mining: Example
      Transactions as a binary matrix:
           Transaction ID   Apple   Banana   Cherry
           1                1       1        1
           2                1       0        0
           3                0       1        1
           4                0       1        1
           5                1       1        0
           6                1       1        0
           7                1       0        1
           8                1       1        0
           9                1       1        1
           10               1       1        0
           Sum              8       8        5
      The same transactions as (Transaction ID: Items) pairs: 1: Apple, Banana, Cherry; 2: Apple; 3: Banana, Cherry; 4: Banana, Cherry; 5: Apple, Banana; 6: Apple, Banana; 7: Apple, Cherry; 8: Apple, Banana; 9: Apple, Banana, Cherry; 10: Apple, Banana
      Standard Market Basket Measures (examples use N(T) = 7, N(A) = 4, N(B) = 5, N(A & B) = 2, not the ten transactions above):
      • Support: Rule’s coverage (% match antecedents) = N(X & Y)/N(T). Example: N(A & B)/N(T) = 2/7 = 29%
      • Confidence: Rule’s predictive ability (% consequent | antecedent) = N(X & Y)/N(X). Example: N(A & B)/N(A) = 2/4 = 50%
      • Lift: Predictive improvement (ratio of observed support for X & Y to the support expected if X and Y were independent) = S(XuY)/(S(X)S(Y)). Example: (2/7)/((4/7)(5/7)) = .7 or 70%
  • 31. Data Mining: Example
      Rule selection is usually based on minimum support & confidence. Parameters: Min. Support 40%, Min. Confidence 75%
      Lift is computed as S(XuY)/(S(X)S(Y)).
           No   X      Y      N(XuY)   N(T)   S(XuY)   N(X)   Conf   N(Y)   S(X)   S(Y)   Lift   Rule
           1    A      B      6        10     60%      8      75%    8      80%    80%    94%    Ok
           2    A      C      3        10     30%      8      38%    5      80%    50%    75%
           3    A      B, C   2        10     20%      8      25%    4      80%    40%    63%
           4    B      A      6        10     60%      8      75%    8      80%    80%    94%    Ok
           5    B      C      4        10     40%      8      50%    5      80%    50%    100%
           6    B      A, C   2        10     20%      8      25%    3      80%    30%    83%
           7    C      A      3        10     30%      5      60%    8      50%    80%    75%
           8    C      B      4        10     40%      5      80%    8      50%    80%    100%   Ok
           9    C      A, B   2        10     20%      5      40%    6      50%    60%    67%
           10   A, B   C      2        10     20%      6      33%    5      60%    50%    67%
           11   A, C   B      2        10     20%      3      67%    8      30%    80%    83%
           12   B, C   A      2        10     20%      4      50%    8      40%    80%    63%
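The rule table can be recomputed directly from the ten baskets. The following is a minimal sketch, not the tooling used for the slides: the function names are hypothetical, exact fractions are used so that a confidence of exactly 75% is not lost to floating-point rounding, and lift is taken as S(XuY)/(S(X)S(Y)).

```python
from fractions import Fraction
from itertools import combinations

# The ten baskets from the binary matrix (A = Apple, B = Banana, C = Cherry).
baskets = [
    {"A", "B", "C"}, {"A"}, {"B", "C"}, {"B", "C"}, {"A", "B"},
    {"A", "B"}, {"A", "C"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B"},
]

def support(items):
    """Exact fraction of baskets that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(items <= b for b in baskets), len(baskets))

def rule_measures(x, y):
    """Return (support, confidence, lift) for the rule X -> Y."""
    s_xy = support(set(x) | set(y))
    s_x, s_y = support(x), support(y)
    return s_xy, s_xy / s_x, s_xy / (s_x * s_y)

def select_rules(min_support=Fraction(40, 100), min_confidence=Fraction(75, 100)):
    """Enumerate X -> Y over disjoint non-empty item sets; keep rules meeting both thresholds."""
    items = sorted(set().union(*baskets))
    kept = []
    for n in range(1, len(items)):
        for x in combinations(items, n):
            rest = [i for i in items if i not in x]
            for m in range(1, len(rest) + 1):
                for y in combinations(rest, m):
                    s, c, lift = rule_measures(x, y)
                    if s >= min_support and c >= min_confidence:
                        kept.append((x, y, float(s), float(c), float(lift)))
    return kept

for x, y, s, c, lift in select_rules():
    print(f"{','.join(x)} -> {','.join(y)}: support={s:.0%} confidence={c:.0%} lift={lift:.0%}")
```

With three items the enumeration yields the same twelve candidate rules as the table, and only A -> B, B -> A and C -> B pass both thresholds.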
  • 32. Data Mining: Simple Example But, what if the baskets were described in the following manner: – Jane bought a handful of maraschinos and a couple of granny smiths. – Harold purchased a bag of appls and 2 bananas. – Bill paid for a pound of cherries but decided not to buy the three durians because of their odor. How could we automate the analysis? Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 33. Social Media Data: 33 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 34. Social Media Data: Commonality? 34 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 35. Text Mining: Defined Using data mining to discover patterns in a collection of documents 35 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 36. Text Mining: CRISP-Like Processes • Phases: Business Understanding, Document Understanding, Document Preparation, Modeling, Evaluation, Deployment • Document Preparation: Real-World Text Data -> Document Consolidation (Establish the Corpus) -> Corpus Refinement (Token, Stem, Stop…) -> Feature Selection & Weighting -> Term-Doc-Matrix*
  • 37. Text Mining Process: Sample Corpora • Brown Corpus – first million-word corpus, compiled in the 60s at Brown U.; 500 samples across 15 genres, each ~2000 words with POS tags (Lancaster-Oslo-Bergen Corpus – British equivalent) • Linguistic Data Consortium Treebanks – collections of manually tagged and parsed (tree structures) sentences from a variety of sources (includes well-known Penn Treebank collection) • Reuters 21578, RCV1 & V2, TRC2 -- collections (1000s) of Reuters English & multi-lingual news stories classified into topics and grouped into training & test sets • Pang & Lee’s Sentiment Analysis – 1000 positive and 1000 negative movie reviews • MEDLINE – An extensive collection of articles and abstracts (18M+) used in a variety of biomedical and linguistic text mining applications • WordNet® -- large lexical database of English grouped into sets of cognitive synonyms (synsets) and interlinked by means of conceptual-semantic and lexical relations. • 20 Newsgroups -- collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups each representing a different topic.
  • 38. Text Mining Process: Corpus Refinement • Goal: common representation of tokens within and between documents • Steps: Tokenization -> Normalize -> Eliminate Stop Words -> Stemming • Tokenization: Parse the text to generate terms. Sophisticated analyzers can also extract phrases from the text. • Normalize: Convert the terms to lowercase. • Eliminate stop words: Eliminate terms that appear very often (e.g. the, and, …). • Stemming: Convert the terms into their stemmed form by removing plurals and different word forms (e.g. achieve, achieves, achieved -> achiev) [note: word about synonyms – WordNet Synset]
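The four refinement steps can be sketched with the standard library alone. This is an illustration under stated assumptions: the stop list is a tiny made-up sample, and the stemmer is a naive suffix stripper written for this example, not the Porter stemmer or whatever tool the slides used.

```python
import re

# A tiny illustrative stop list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

# Naive suffix stripping for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s", "e")

def tokenize(text):
    """Split text into runs of letters, digits and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)

def normalize(tokens):
    """Lower-case every token."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def refine(text):
    """Tokenize -> normalize -> eliminate stop words -> stem."""
    return [stem(t) for t in remove_stop_words(normalize(tokenize(text)))]

print(refine("The team achieves, achieved and will achieve."))
```

The three forms of "achieve" collapse to the same stem, which is the point of the step: different surface forms end up as one column in the later term-document matrix.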
  • 39. Text Mining: Feature Extraction & Weighting
      Feature Extraction: “Bag of Words, Terms or Tokens” Vector Representation -> Word, Term, Token or Pairs-Triplets x Doc Matrix
                 Token1   Token2   Token3   Token4   …
           Doc1  1        2        2        4
           Doc2  4        2        3        0
           Doc3  1        1        1        0
           Doc4  1        1        1        2
           …
      Words or Tokens are attributes and documents are examples
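A bag-of-words matrix of this shape can be built with a counter per document. A minimal sketch, assuming whitespace tokenization and three made-up sentences; `document_term_matrix` is a hypothetical helper name.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[d][t] is the count of term t in document d."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Made-up documents, for illustration only.
docs = [
    "i feel fine",
    "i feel so so tired",
    "fine fine fine",
]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a document (an example) and each column a token (an attribute), matching the layout on the slide.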
  • 40. Text Mining: Transforming Frequencies • Binary Frequencies: 1 for tf>0; otherwise 0 • Term Frequencies: tf(i,j) divided by the sum of tf(i,j) over all terms i in document j • Log Frequencies: 1 + log(tf) for tf>0; otherwise 0 • Normalized Frequencies: Divide each frequency by the square root of the sum of squares of the frequencies within the vector (column) • Term Frequency–Inverse Document Frequency: TF * IDF – Inverse Document Frequency: log(N/(1+D)) where N is the total number of docs and D is the number of docs containing the term
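The transformations above can be applied to the small Doc x Token example matrix. This is a sketch with hypothetical function names; it uses the natural logarithm and the IDF form log(N/(1+D)) exactly as written on the slide.

```python
import math

# Counts from the Doc x Token example matrix (rows = documents, columns = tokens).
counts = [
    [1, 2, 2, 4],
    [4, 2, 3, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 2],
]

def binary(row):
    """1 where the term occurs, else 0."""
    return [1 if tf > 0 else 0 for tf in row]

def relative(row):
    """Each count divided by the total count in the document."""
    total = sum(row)
    return [tf / total for tf in row]

def log_freq(row):
    """1 + log(tf) for tf > 0, else 0."""
    return [1 + math.log(tf) if tf > 0 else 0.0 for tf in row]

def l2_normalize(vector):
    """Divide by the square root of the sum of squares of the vector."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def idf(matrix):
    """log(N / (1 + D)) per term, with D = number of documents containing the term."""
    n_docs = len(matrix)
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(len(matrix[0]))]
    return [math.log(n_docs / (1 + d)) for d in doc_freq]

def tf_idf(matrix):
    """Raw term frequency times inverse document frequency."""
    weights = idf(matrix)
    return [[tf * w for tf, w in zip(row, weights)] for row in matrix]

for row in tf_idf(counts):
    print([round(v, 3) for v in row])
```

One consequence of the 1+D form is visible here: a token that occurs in every document gets a slightly negative weight, since log(4/5) < 0.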
  • 41. Text Mining: Simple Example Listening Post is an art installation by Mark Hansen and Ben Rubin that culls text fragments in real time from thousands of unrestricted Internet chat rooms, bulletin boards and other public forums. 41 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 42. Text Mining: Simple Example 42 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 43. Text Mining: Simple Example • Blogs searched every 10 mins for “I feel” and “I’m feeling” • 15-20K feelings per day • Each sentence contains 1 of 5000 pre-determined feelings • Fields: sentence, imageid, feeling, posttime, postdate, posturl, gender, born, country, state, city, lat, lon, conditions
  • 44. Text Mining: Simple Example
      Query API:
        http://api.wefeelfine.org:8080/ShowFeelings?display=xml&returnfields=Sentence&postdate=2010-11-25&limit=500
      Result:
        <?xml version="1.0" ?>
        <feelings>
        <feeling imageid="-mZmybPrOGTZ+xukpcU7jg" feeling="better" sentence="i feel almost 100 better aside from that weird sandy feeling in my throat" posttime="1321633467" postdate=2010-11-25="0" posturl="http://jenngreenleaf.blogspot.com/2011/11/im-coming-down-with-cold-or-am-i.html" gender="0" country="united states" state="maine" city="richmond" lat="44.091522" lon="-69.801787" conditions="4" />
        …
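A response of this shape can be read with the standard library XML parser. The sketch below works on a hand-made sample rather than calling the service; the `postdate` value in the sample is a placeholder assumption, because that attribute is garbled in the slide text, and `parse_feelings` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-made sample in the shape of the response; the postdate value is a placeholder assumption.
SAMPLE = """<?xml version="1.0" ?>
<feelings>
  <feeling feeling="better"
           sentence="i feel almost 100 better aside from that weird sandy feeling in my throat"
           posttime="1321633467" postdate="2011-11-18"
           gender="0" country="united states" state="maine" city="richmond"
           lat="44.091522" lon="-69.801787" conditions="4" />
</feelings>"""

def parse_feelings(xml_text):
    """Return one dict of attribute name -> string value per <feeling> element."""
    root = ET.fromstring(xml_text)
    return [dict(node.attrib) for node in root.iter("feeling")]

for row in parse_feelings(SAMPLE):
    print(row["feeling"], "|", row["sentence"], "|", row["city"], row["state"])
```

All attribute values arrive as strings, so numeric fields such as `lat` and `lon` need an explicit `float(...)` before use.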
  • 45. Text Mining: Simple Example • i'm done believing you don't know what i'm feeling • i feel so out of place • i'm feeling healthy • i never feel down when i'm with her • i love the feeling • i feel like i've been run over by a truck • i feel so positive today • i feel like a poor man's pin up girl 45 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 46. Text Mining: Simple Example • Input String (128925 chars; 24282 spaces) – "i have found to be helpful especially during those times when i am feeling discouraged\ni have a 50km commute and just the lack of the sense of freedom that driving brings just leaves me feeling scared\ni seem to be feeling better mostly…" • Tokenize (26465 tokens) – ['i', 'have', 'found', 'to', 'be', 'helpful', 'especially', 'during', 'those', 'times', 'when', 'i', 'am', 'feeling', 'discouraged', 'i', 'have', 'a', '50km', 'commute', 'and', 'just', 'the', 'lack', 'of', 'the', 'sense', 'of', 'freedom', 'that', 'driving', 'brings', 'just', 'leaves', 'me', 'feeling', 'scared', 'i', 'feel', 'noone', 'know', 'if', 'you', 'were', 'me', 'you', 'will', 'feel', 'the', 'same', 'way', …] • Set of Tokens (3045 distinct tokens) – ["'", "'believe", "'d", "'en", "'encoding", "'feedlinks", "'forever", "'gets", "'http", "'ismobile", "'isprivate", "'item", "'languagedirection", "'ll", "'locale", "'ltr", "'m", "'mefaked", "'mobileclass", "'mr", "'no", "'okay", "'on", "'pagetitle", "'pagetype", "'re", "'s", "'t", "'toned", "'url", "'us", "'utf", "'ve", "'yes", '0', '034', '039', '0aeverytime', '0d', '10', '100', '101', …]
  • 47. Text Mining: Simple Example
           Corpus                      Word Length   Sentence Length   Lexical Diversity
           We Feel Fine                4             17                8
           Gutenberg Corpus:
             Austen-persuasion.txt     4             23                16
             Bible-kjv.txt             4             33                79
             Blake-poems.txt           4             18                5
             Carroll-alice.txt         4             16                12
             Melville-moby.txt         4             24                15
             Milton-paradise.txt       4             52                15
             Shakespeare-caesar.txt    4             12                8
             Shakespeare-hamlet.txt    4             13                7
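Statistics of this kind are simple ratios over a tokenized text. The sketch below is an assumption-laden illustration: it treats lexical diversity as tokens per distinct token (which agrees with 26465 / 3045 ≈ 8.7 from the tokenization slide), splits sentences on `.`, `!`, `?` and newlines, and uses a regex tokenizer rather than the tool behind the slide, so its numbers will not match the table exactly.

```python
import re

def corpus_stats(text):
    """Return (avg word length, avg sentence length in words, tokens per distinct token)."""
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    words = re.findall(r"[a-z0-9']+", text.lower())
    vocabulary = set(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    lexical_diversity = len(words) / len(vocabulary)
    return avg_word_len, avg_sentence_len, lexical_diversity

# Made-up text, for illustration only.
print(corpus_stats("i feel fine. i feel so tired."))
```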
  • 48. Text Mining: Simple Example • Eliminate Stopwords (175 words: 'a', 'about', 'above', 'after', …) – Set of tokens (12827) with stopwords eliminated ['ab', 'abit', 'able', 'abs', 'absolute', 'absolutely', 'absorb', 'abuse', 'accomplished', 'accomplishment', 'achieve', 'achieved', 'across', 'acted', 'action', 'activities', 'activity', 'actually', 'acura', 'add', …] – Content (11896, or 45% of tokens, not stopwords; 4053 with tokens starting with apostrophes and #s eliminated) • Stemming – Stemmed tokens (11896) ['abdomen', 'abdul', 'abil', 'abl', 'abrupt', 'absolut', 'abstract', 'academ', 'accept', 'accid', 'accomplish', 'accur', 'accus', 'accustom', 'achi', 'achiev', 'acknowledg', 'across', 'action', 'activ', …] – Set of tokens in stemmed content (2283) ['abdomen', 'abdul', 'abil', 'abl', 'abrupt', 'absolut', 'abstract', 'academ', 'accept', 'accid', 'accomplish', 'accur', 'accus', 'accustom', 'achi', 'achiev', …]
  • 49. Text Mining: Simple Example Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 50. Text Mining: Simple Example Document-Term Matrix
      WeFeel       Sum  like  know  time  go  think  better  way  get  good  love  …  hear  didn  place  almost  comfort  everyon  sinc  babi  actual
      Sum               416   94    90    89  83     80      80   76   76    75    …  16    16    16     16      16       16       16    16    16
      comment1     3    0     0     1     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment2     0    0     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment3     2    0     1     0     0   0      0       1    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment4     1    0     0     0     0   0      1       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment5     1    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment6     2    0     1     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment7     7    1     1     0     0   0      0       0    1    0     0     …  0     0     0      0       0        0        0     0     0
      comment8     7    2     1     0     1   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment9     2    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment10    6    0     0     2     0   0      0       0    1    0     0     …  0     0     0      0       0        0        0     0     0
      …
      comment1490  2    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1491  2    0     0     0     0   0      0       1    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1492  6    0     1     0     0   0      0       1    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1493  3    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1494  4    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1495  4    2     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1496  1    0     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1497  2    0     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1498  2    1     0     0     0   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
      comment1499  3    1     0     0     1   0      0       0    0    0     0     …  0     0     0      0       0        0        0     0     0
  • 51. Text Mining: Simple Example Madness, Murmurs, Montage, Mobs, Metrics, Mounds
  • 52. Prediction Collective, macroscopic trends which can be scientifically inferred by harnessing publicly accessible data from the Internet. 52 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 53. Prediction: Characteristics Public Practical Big 53 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 54. Prediction: Sources Easily accessible digital traces: What we surf Whom we “friend” What we say Where we go What we buy How we play 54 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 55. Prediction: Sample Studies 55 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 56. Prediction: Sample Studies Infodemiology Nowcasting Culturomics 56 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 57. Prediction: Infodemiology Information + Epidemiology: Science of distribution and determinants of information in an electronic medium, specifically the Internet, or in a population, with the ultimate aim to inform public health and public policy Coined by Gunther Eysenbach, Univ. of Toronto 57 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 58. Prediction: Infodemiology A Major Application - Practical 58 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 59. Prediction: Infodemiology A Major Application - Practical • Regional, Weekly Syndromic Surveillance
  • 60. Prediction: Infodemiology An Alternative Approach Text Mining of Worldwide Newswires, Web Sites and Various Offline Reports 60 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 61. Prediction: Infodemiology Utilizing Aggregate Search Data Monitoring and analyzing queries from Internet search engines or people's status updates on microblogs for syndromic surveillance to predict disease outbreaks
  • 62. Prediction: Infodemiology Utilizing Aggregate Search Data 62 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 63. Prediction: Infodemiology Utilizing Aggregate Search Data 63 Copyright 2011 JDA Software Group, Inc. - CONFIDENTIAL
  • 64. Prediction: Infodemiology Utilizing Aggregate Search Data
      Standard Linear Prediction Model:
      Dependent Variable at Time t = b0 + b1 × (Dependent Variable at Time t - n) + b2 × (Traditional, Publicly Available Explanatory Variable) + b3 × (Aggregate Search Index or Social Media Freq. Count) + e
      where the dependent variable is a standard publicly available measure
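A model of this form can be fitted by ordinary least squares on a lagged design matrix. The following is a rough sketch on synthetic, noise-free data invented for the example: the series `x` and `s`, the coefficients, and the helper names `solve` and `fit_lagged_model` are all assumptions, and the normal equations are solved by hand only to keep the example free of third-party libraries.

```python
def solve(a, b):
    """Solve the square system a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution

def fit_lagged_model(y, x, s, lag=1):
    """Least-squares fit of y[t] = b0 + b1*y[t-lag] + b2*x[t] + b3*s[t] + e."""
    rows = [[1.0, y[t - lag], x[t], s[t]] for t in range(lag, len(y))]
    target = y[lag:]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic, noise-free series generated from known coefficients (illustration only).
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # stand-in for the traditional explanatory variable
s = [1, 4, 1, 4, 2, 1, 3, 5, 6, 2, 3, 7]   # stand-in for the search index / social media count
y = [1.0]
for t in range(1, len(x)):
    y.append(2.0 + 0.5 * y[t - 1] + 0.3 * x[t] + 1.5 * s[t])

coefficients = fit_lagged_model(y, x, s, lag=1)
print([round(b, 4) for b in coefficients])
```

In a study of the kind described, the question is whether b3 adds predictive value beyond the lagged dependent variable and the traditional explanatory variable.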
  • 65. Prediction: Infodemiology Utilizing Aggregate Search Data “Detecting Influenza Epidemics Using Search Engine Query Data” (Ginsberg et al.), 2/19/09 • Aggregated historical logs of search queries from 2003-2008, computing weekly time series • logit(P) = b0 + b1 * logit(Q) + e – P – percentage of ILI physician visits – Q – query fraction for the 45 highest influenza queries • r is between .80 and .96 for 9 regions
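The single-predictor model logit(P) = b0 + b1 * logit(Q) + e can be fitted with a closed-form least-squares line on the logit scale. A minimal sketch, assuming both P and Q are proportions strictly between 0 and 1; the function names and the numbers are hypothetical and are not data from the paper.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(xs, ys):
    """Ordinary least squares for ys = b0 + b1 * xs; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def fit_ili_model(query_fraction, ili_fraction):
    """Fit logit(P) = b0 + b1 * logit(Q) from paired weekly proportions."""
    return fit_line([logit(q) for q in query_fraction], [logit(p) for p in ili_fraction])

def predict_ili(b0, b1, q):
    """Predicted ILI visit proportion for a given query fraction."""
    return inverse_logit(b0 + b1 * logit(q))

# Made-up weekly proportions generated from known coefficients (illustration only).
query_fraction = [0.001, 0.002, 0.004, 0.008]
ili_fraction = [inverse_logit(-1.0 + 0.9 * logit(q)) for q in query_fraction]

b0, b1 = fit_ili_model(query_fraction, ili_fraction)
print(round(b0, 4), round(b1, 4), round(predict_ili(b0, b1, 0.004), 5))
```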
  • 66. Prediction: Infodemiology Utilizing Aggregate Search Data http://www.google.org/flutrends/about/how.html
  • 67. Prediction: Infodemiology Utilizing Aggregate Search Data
  • 68. Prediction: Infodemiology A Similar Application http://www.google.org/denguetrends/
  • 69. Prediction: Infodemiology Utilizing Tweets ?
  • 70. Prediction: Infodemiology Utilizing Tweets
  • 71. Prediction: Infodemiology Utilizing Tweets
      “Nowcasting Events from the Social Web with Statistical Learning,” Lampos and Cristianini, ACM IS&T, 9/11
      – Text analysis of 50M tweets for 3 regions of UK from 6/09-4/10 (303 days)
      – HPA weekly reports of GP consultations with ILI diagnosis correlated with number of “hybrid grams”
      – Average “r” of .911
  • 72. Prediction: Infodemiology A Major Application – Text Analysis
      – Corpus: 50M Tweets, 3 Region UK, 6/09-4/10
      – Corpus Refinement: Lower Case, Stop Words, Tokens, Stems
      – Feature Selection: 1-Grams, 2-Grams, Hybrid Grams, N-Gram Freqs
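The corpus refinement and n-gram steps listed on this slide can be sketched as follows. This is an illustration only: the stop-word list is a tiny made-up sample, `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer, and the hybrid-gram step of the original study is not reproduced.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"a", "an", "the", "i", "have", "and", "is", "of", "to", "in", "my"}

def naive_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def refine(tweet):
    """Lower-case, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def ngram_counts(tweets):
    """Count 1-grams and 2-grams over the refined tweets."""
    counts = Counter()
    for tweet in tweets:
        stems = refine(tweet)
        counts.update(stems)                                     # 1-grams
        counts.update(" ".join(p) for p in zip(stems, stems[1:]))  # 2-grams
    return counts

tweets = [
    "I have the flu and a sore throat",
    "Sore throat, coughing all night",
]
counts = ngram_counts(tweets)
print(counts.most_common(5))
```

Tokenizing before stop-word removal is a practical choice made here; the slide lists the steps without fixing their exact order.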
  • 73. Prediction: Infodemiology Utilizing Tweets Discarded when n<50 BoLasso - Bootstrap LASSO (least absolute shrinkage and selection operator)
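The general idea behind a bootstrap LASSO can be sketched as: fit a LASSO on many bootstrap resamples and keep only the features that are selected consistently. The code below is a rough illustration under loud assumptions: the pure-Python coordinate-descent solver stands in for a library implementation, the penalty `lam`, the number of resamples and the `min_fraction` threshold are made-up values, the data are synthetic, and nothing here reproduces the exact procedure of the cited study.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta

def bolasso(X, y, lam, n_boot=20, min_fraction=1.0, seed=0):
    """Keep features with a nonzero coefficient in >= min_fraction of resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                hits[j] += 1
    return [j for j in range(p) if hits[j] >= min_fraction * n_boot]

# Synthetic data (assumption): features 0 and 1 drive y, features 2 and 3 are noise.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] - 1.5 * row[1] + 0.1 * data_rng.gauss(0, 1) for row in X]

selected = bolasso(X, y, lam=0.2)
print(selected)
```

With `min_fraction=1.0` a feature must survive every resample, which is the strictest form of the idea; lowering the threshold gives a softer selection rule.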
  • 74. Prediction: Infodemiology Utilizing Tweets
  • 75. Prediction: Infodemiology Utilizing Tweets
  • 76. Prediction: Nowcasting Now + Forecasting: Predicting the present by analyzing large volumes of data that can be used to "forecast" current events for which official analysis has not been released
  • 77. Prediction: Nowcasting Weather Envy Within the next 6 hours …
  • 78. Prediction: Sample Studies with Search
      – Song, Pan, Ng (Apr-10)
          • Dependent Variables: Weekly Hotel Bookings in Charleston, SC
          • Explanatory Variables: Indexed Search Volumes from Google Trends/Insights, Jan 2008-Aug 2009
          • Model: Log of Room Nights for Log of Search Volumes - Charleston, Travel Charleston, Charleston Hotels, Charleston Restaurants, Charleston Tourism
          • Results: Tested various statistical models; all gave reasonable forecasts. Best fit model was Autoregressive Distributed Lag (ADLM) with a lag period of 6 weeks.
      – Kholodilin, Podstawski, Siliverstovs (Apr-10)
          • Dependent Variables: Year-on-Year Growth Rate of Monthly US Real Private Consumption, ALFRED db of Fed Rsrv of St. Louis
          • Explanatory Variables: 220 Google Trend/Insights search terms related to Priv Consumption, reduced to 10 principal components for monthly periods from Jan 2005 to Dec 2009
          • Model: Y-o-Y monthly URPC growth rates for 3 sets of regressors -- Sentiment (consumer sentiment and confidence); Financial (short term and long term interest rates and S&P 500); Query (combinations of principal components of query terms)
          • Results: Query term principal components outperform standard Sentiment and Financial Indicators. A combination of two of the factors works best -- those related to mobility and health care consumption.
      – Choi, Varian (Apr-09)
          • Dependent Variables: US Census Bureau Advance Monthly Retail Sales (general and specific) and Travel (Visitor arrival in Hong Kong)
          • Explanatory Variables: Google Trend/Insight query indices for categories and subcategories related to retail sales (general and specific) and related to Travel
          • Model: Google Trend indices for query subcategories related to (log values) of overall monthly retail trade (NAICS categories), automotive sales, home sales and travel.
          • Results: Simple seasonal AR models and fixed-effects models that include relevant Google Trend variables tend to outperform models that exclude these variables. In some cases small gains, in others substantial.
      – McLaren, Shanbhogue (Q2-11)
          • Dependent Variables: Official monthly unemployment data and housing price growth in the UK from June 2004-Jan 2011
          • Explanatory Variables: Google Trend/Insight query indexes for the term "Job Seekers Allowance (JSA)" for unemployment and "Estate Agents" for housing price growth
          • Model: For unemployment, linear AR model with query term, claimant count, and GfK consumer confid. as exp vars; for housing price growth with query term, Home Builders and Royal Instit. of Chartered Surveyors balances as exp vars.
          • Results: For unemployment forecasts, claimant count strongest followed by query term. For housing prices, the query term was much stronger than HBF and RICS data.
  • 79. Prediction: Sample Studies with Social Media
      – Asur, Huberman (Mar-10)
          • Dependent Variables: Box-office revenues for (24) movies
          • Explanatory Variables: Promotion tweets-retweets for a particular movie, tweet rates for particular movie per hour, ratio of positive to negative sentiments for the movie
          • Model: Regression of 1st weekend box office revenues by promotional tweets-retweets, by tweet rates vs. Hollywood Stock Exchange prices, and 2nd weekend revenues by tweet rates and the sentiment ratio.
          • Results: Promotional tweets are weakly correlated with 1st weekend revs. Tweet rates are very strongly correlated (min .9) and a stronger predictor than HSX. Finally, tweet rates are strongly correlated with 2nd weekend revenues and sentiments improve the forecasts slightly.
      – Gruhl, Guha, Kumar, Novak, Tomkins (Aug-05)
          • Dependent Variables: Amazon Sales Rank for 2340 bestselling books in 4 month period (Jul 2004-Aug 2004) and spikes in these sales ranks
          • Explanatory Variables: Number of mentions of the book/author in over 300K blogs whose postings were maintained by IBM's WebFountain project (over 200K postings/day)
          • Model: Cross correlation of time series for sales rank and mentions.
          • Results: While sales rank is a poor predictor of the change in sales rankings, a prior spike in mentions predicts quite well a future spike in sales rank.
      – Sadikov, Parameswaran, Venetis (Aug-09)
          • Dependent Variables: Movie critic ranking, user ranking, 2008 gross sales, weekly box office sales (weeks 1-5)
          • Explanatory Variables: Basic features that count movie references in blogs, count movie references taking into account ranking and indegree of the blogs where they appear, consider only references made within a time window before or after a movie release date, features that consider positive sentiment; and combinations of these. References based on spinn3r.com blog data set 11/07-11/08
          • Model: Linear regression for weekly rankings and sales data by blog references and sentiment.
          • Results: Minimal correlation between rankings and references and sentiment. Strong correlation between references and gross sales but weak with sentiment. Strongest relationships with timing of references in weeks after release.
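The cross correlation of time series named in the Gruhl et al. entry can be illustrated by correlating one series with a lagged copy of another and looking for the lag with the strongest relationship. This is a minimal sketch on made-up daily counts; the series, the two-day shift and the function names are assumptions, and a generic sales signal is used instead of a sales rank (where a lower number is better and the sign would flip).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(leading, following, lag):
    """Correlate leading[t] with following[t + lag] for lag >= 0."""
    if lag > 0:
        return pearson_r(leading[:-lag], following[lag:])
    return pearson_r(leading, following)

# Hypothetical daily blog mentions with one spike.
mentions = [3, 4, 3, 5, 20, 35, 12, 6, 4, 3, 4, 5, 3, 4]

# Synthetic sales signal that follows mentions two days later (assumption).
sales_signal = [10, 12] + [2 * m + 4 for m in mentions[:-2]]

correlations = {lag: lagged_correlation(mentions, sales_signal, lag)
                for lag in range(0, 5)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations[best_lag])
```

A peak at a positive lag means the first series tends to move before the second, which is the pattern the study reports for spikes in mentions and later spikes in sales rank.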
  • 80. Prediction: Any Guesses?
  • 81. Prediction: Idiom, a Sculpture of 10s of 1000s of Books
  • 82. Prediction: It comes in many Shapes but not Sizes Omphalos Book Cell Matej Krén Gravity Mixer
  • 83. Prediction: Culturomics Culture + Genomics: Application of high-throughput data collection and analysis to the study of human culture.
  • 84. Prediction: Culturomics “Quantitative Analysis of Culture Using Millions of Digitized Books,” Science, 12/16/10.
  • 85. Prediction: Culturomics 2.0 http://www.youtube.com/watch?v=61qn7S9NCOs
  • 86. Prediction: Culturomics 2.0
      “Culturomics 2.0: Forecasting Large-Scale Human Behavior Using Global News Media Tone in Time and Space,” Kalev Leetaru, 9/11
      – The tone of real-time consciousness reflected in the media can be used to forecast broad social behavior.
      – Combined three massive news archives totaling more than 100 million articles worldwide to explore the global consciousness of the news media.
      – Employs a large shared-memory supercomputer (University of Tennessee SGI Altix supercomputer Nautilus with 1024 processors and 4 TB of memory)
      – Using the tone and location of the reports, (claims to have) predicted the outcome of the Arab Spring and the location of Bin Laden within a radius of 125 miles
  • 87. Prediction: Culturomics 2.0 Based on Carbon Capture Report
  • 88. Prediction: Culturomics 2.0 Based on Carbon Capture Report
  • 89. Prediction: Culturomics 2.0 Features of Stories or Tweets
      – Tone/Positivity/Negativity. Ratio of + to - tone (-100 to 100)
      – Polarity. Emotional charge (0 to 100)
      – Activity. Intensity of "active language" (0 to 100)
      – Personalization. Degree to which the writer attempts to bring the reader into the fold (0 to 100)
      – Questions/Exclamations. Tweet tone indicators of non-word items
      – Geocoding. Location of story content
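A few of the features listed on this slide can be approximated with a word-list count. The sketch below is only one plausible reading: the slide gives the ranges but not the formulas, so the tone and polarity formulas, the tiny `POSITIVE` and `NEGATIVE` word lists and the function name `tone_features` are all assumptions and not the scoring used by the Carbon Capture Report or by Leetaru.

```python
import re

# Assumption: tiny illustrative word lists, not a real sentiment dictionary.
POSITIVE = {"good", "great", "hope", "win", "peace"}
NEGATIVE = {"bad", "fear", "crisis", "war", "loss"}

def tone_features(text):
    """Score a story or tweet on tone, polarity and punctuation indicators."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    charged = pos + neg
    # Assumed formula: balance of positive vs. negative words, scaled to -100..100.
    tone = 100.0 * (pos - neg) / charged if charged else 0.0
    # Assumed formula: share of emotionally charged words, scaled to 0..100.
    polarity = 100.0 * charged / len(words) if words else 0.0
    return {
        "tone": tone,
        "polarity": polarity,
        "questions": text.count("?"),
        "exclamations": text.count("!"),
    }

features = tone_features("Great hope for peace, but fear of war remains!")
print(features)
```

Activity, personalization and geocoding are left out because they would need additional dictionaries or a gazetteer that the slide does not describe.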
  • 90. Prediction: Culturomics 2.0 Features of Stories or Tweets
      – 100M Articles from the: New York Times (1945-05), Sum. of Wrld Brdcasts (1979-10), Google News articles (2006-11)
      – Sentiment Mining, Geocoding, Entity Extraction
      – Nautilus Supercomputer
      – Geocoding; Feature Scores
      – 2.4 Petabyte Network with over 10M entities
  • 91. Prediction: Culturomics 2.0 Predicting Unrest
  • 92. Prediction: Culturomics 2.0 NY Times View of Tone http://contentanalysis.ichass.illinois.edu/Culturomics20/nyt-movie-1000x1000.gif
  • 93. Prediction: Culturomics 2.0 SWB View of Tone http://contentanalysis.ichass.illinois.edu/Culturomics20/swb-movie-1000x1000.gif