CIKM 2011 | Invited Talk

Model-Driven Research in Social Computing

Ed H. Chi
Google Research
Work done while at Palo Alto Research Center (PARC)


	

	


        2011-10-27            CIKM 2011 Invited Talk   1
Some Google Social Stats
•  250,000 words are written each minute on Blogger; that’s 360 million words a day
•  Every 16 seconds people view enough photos from Picasa Web Albums to cover an entire football field
•  Every 8 minutes, more photos are viewed on Picasa Web Albums than exist in the entire Time-LIFE photo collection
  




YouTube Stats
•  150 years of YouTube video are watched every day on Facebook (up 2.5x y/y)
•  Every minute, 400+ tweets contain YouTube links (up 3x y/y) [Q1 2011]
•  100M+ people take a social action with YouTube (likes, shares, comments, etc.) every week (10/15/10)
  




Google+ Stats
•  40 million people have joined Google+ since launch.
•  People are 2x-3x more likely to share content with one of their circles than to make a public post.
  




Social Stream Research
•  Analytics
   –  Factors impacting retweetability [Suh et al., IEEE Social Computing 2010]
   –  Location field of user profiles [Hecht et al., CHI 2011]
   –  Organic Q&A behaviors [Paul et al., ICWSM’11]
   –  Languages used in Twitter [Hong et al., ICWSM’11]
•  Improving Stream Experience
   –  Topic-based summarization & browsing of tweets [Bernstein et al., UIST 2010]
   –  Tweet recommendation [Chen et al., CHI 2010 & CHI 2011]
  



Invisible Brokerage Signals across Language Barriers

Joint work w/ Lichan Hong, Gregorio Convertino
[Hong et al., ICWSM July 2011]
  
  	
  




Motivation for Studying Languages
•  Twitter is an international phenomenon
   –  Most research has focused on English users
   –  Question about generalization to non-English users
   –  Understand cross-language usage differences
   –  Design implications for international users
•  Research Questions:
   –  What is the language distribution in Twitter?
   –  How do users of different languages use Twitter?
   –  How do bilingual users spread information across languages?
  


       	
  
Data Collection & Processing
Twitter stream
   –  04/18/10-05/16/10 (4 weeks)
62M tweets
   –  Google Language API & LingPipe
104 languages
Top 10 languages

Top 10 Languages in Twitter

Language          Tweets      %       Users
English       31,952,964   51.1   5,282,657
Japanese      11,975,429   19.1   1,335,074
Portuguese     5,993,584    9.6     993,083
Indonesian     3,483,842    5.6     338,116
Spanish        2,931,025    4.7     706,522
Dutch            883,942    1.4     247,529
Korean           754,189    1.2     116,506
French           603,706    1.0     261,481
German           588,409    1.0     192,477
Malay            559,381    0.9     180,147
  
Human-Coding Study
•  2,000 random tweets from 62M tweets
•  2 human judges for each of the top 10 languages
   –  native speakers or proficient
   –  discuss to resolve disagreement
•  Hard to find Indonesian & Malay judges
•  Presented 2,000 tweets to each judge
•  Judge selected tweets in his/her language
  




Machine vs. Human
T-P: true positive, T-N: true negative, F-N: false negative, F-P: false positive

Language      T-P     T-N   F-N   F-P   Cohen’s Kappa
English       974     971    20    35   0.95
Japanese      370   1,595     0    35   0.94
Portuguese    170   1,803    19     8   0.92
Indonesian    106   1,875    15     4   0.91
Spanish        96   1,889    11     4   0.92
Dutch          18   1,978     2     2   0.90
Korean         24   1,976     0     0   1.00
French         13   1,980     0     7   0.79
German         12   1,979     2     7   0.72
Malay           8   1,979     4     9   0.55
  

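The kappa column can be recomputed from the four counts in each row. The sketch below assumes the two raters are the machine detector and the human judgment, and that kappa is the standard two-rater, two-category form; the function name is illustrative and not taken from the talk.

```python
def cohens_kappa(tp: int, tn: int, fn: int, fp: int) -> float:
    """Cohen's kappa for two binary raters, given the 2x2 agreement counts."""
    n = tp + tn + fn + fp
    observed = (tp + tn) / n  # fraction of tweets where machine and human agree
    machine_yes = tp + fp     # tweets the machine assigned to the language
    human_yes = tp + fn       # tweets the human assigned to the language
    # Agreement expected by chance if the two raters were independent.
    expected = (machine_yes * human_yes
                + (n - machine_yes) * (n - human_yes)) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts copied from three rows of the Machine vs. Human table.
rows = {
    "English": (974, 971, 20, 35),
    "Korean": (24, 1976, 0, 0),
    "Malay": (8, 1979, 4, 9),
}
for language, counts in rows.items():
    print(language, round(cohens_kappa(*counts), 2))
```

Low kappa for Malay despite 1,987 of 2,000 agreements shows why raw accuracy is misleading when a language is rare in the sample: chance agreement is already very high.
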
Accuracy of Language Detection
•  Two Types of Errors
   –  Got ur dirct msg.i’m lukng 4wrd 2 twt wit u too.so,wat doing ha… (detected as Afrikaans)
   –  High error rate for tweets of 1~2 words
  




Machine vs. Human

Language      T-P     T-N   F-N   F-P   Cohen’s Kappa
French         13   1,980     0     7   0.79
German         12   1,979     2     7   0.72
Malay           8   1,979     4     9   0.55

•  French: 5/7 F-Ps have 2 words
•  German: 1/2 F-Ns has 1 word; 6/7 F-Ps are in English
•  Malay: 3/4 F-Ns & 7/9 F-Ps are in Indonesian
  


Common Twitter Conventions
[Annotated example tweets; labels: hashtag, mention, URL, reply (per-tweet metadata), retweet]
  
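A minimal sketch of how the text-level conventions could be detected in a tweet. The regular expressions are simplifying assumptions, not the patterns used in the study, and treating a leading "RT @user" as a retweet is likewise an assumed convention. Replies are per-tweet metadata, so they are deliberately not inferred from the text here.

```python
import re

# Simplified, assumed patterns; real tweets have more edge cases.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
RETWEET = re.compile(r"^RT @\w+")


def tweet_features(text: str) -> dict:
    """Extract hashtags, mentions, and URLs from a tweet and flag retweets."""
    # Blank out URLs first so a "#fragment" inside a link is not counted as a hashtag.
    without_urls = URL.sub(" ", text)
    return {
        "urls": URL.findall(text),
        "hashtags": HASHTAG.findall(without_urls),
        "mentions": MENTION.findall(without_urls),
        "is_retweet": bool(RETWEET.match(text)),
    }


example = tweet_features("RT @alice: slides at http://example.com/talk #cikm")
print(example)
```
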
Use of URLs in 62M Tweets

Language      URLs
All           21%
English       25%
Japanese      13%
Portuguese    13%
Indonesian    13%
Spanish       15%
Dutch         17%
Korean        17%
French        37%
German        39%
Malay         17%

•  Chi-square tests confirmed that differences by language are significant.
  

Significant Cross-Language Differences

Language      URLs   Hashtags   Mentions   Replies   Retweets
All           21%    11%        49%        31%       13%
English       25%    14%        47%        29%       13%
Japanese      13%     5%        43%        33%        7%
Portuguese    13%    12%        50%        32%       12%
Indonesian    13%     5%        72%        20%       39%
Spanish       15%    11%        58%        39%       14%
Dutch         17%    13%        50%        35%       11%
Korean        17%    11%        73%        59%       11%
French        37%    12%        48%        36%        9%
German        39%    18%        36%        25%        8%
Malay         17%     5%        62%        23%       29%

Chi-square tests confirmed that differences by language are significant.
  


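The slides do not say how the chi-square tests were set up. One plausible reading is a test of independence on a language-by-feature contingency table, sketched below in plain Python. The counts in `toy` are hypothetical, chosen only to mirror a 25% vs. 13% URL rate; they are not the study's data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if language and feature use were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: [tweets with a URL, tweets without a URL].
toy = [
    [250, 750],  # language A, 25% with URL
    [130, 870],  # language B, 13% with URL
]
stat = chi_square_statistic(toy)
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(stat, stat > 3.841)
```

With tens of millions of tweets, even small percentage differences produce very large statistics, so significance here says little about effect size; the percentages in the table carry that information.
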
Implications

Language      URLs   Hashtags   Mentions   Replies   Retweets
All           21%    11%        49%        31%       13%
Korean        17%    11%        73%        59%       11%
German        39%    18%        36%        25%        8%

•  Use of Twitter for social networking vs. information sharing differs across languages
•  Design of recommendation engines
   –  Korean users: promote conversational tweets
   –  German users: promote tweets with URLs
  



Studying Bilingual Brokers
•  Importance of brokers
   –  Structural holes (Burt’92), LiveJournal (Herring et al.’07)
•  Define bilingual brokers as users who tweeted in a pair of languages
•  Caveats
   –  Under-estimated due to 4-week time limit
   –  Over-estimated due to language detection errors
  



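Under the definition above, counting brokers reduces to grouping tweets by user and counting each pair of languages a user tweeted in. A minimal sketch, assuming the input is a stream of (user, detected language) pairs; the function name and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_bilingual_brokers(tweets):
    """Count users per language pair.

    tweets: iterable of (user_id, language) pairs, one per tweet.
    Returns a Counter keyed by alphabetically sorted language pairs.
    """
    languages_by_user = defaultdict(set)
    for user_id, language in tweets:
        languages_by_user[user_id].add(language)

    brokers = Counter()
    for languages in languages_by_user.values():
        # A user who tweeted in k languages is a broker for every pair among them.
        for pair in combinations(sorted(languages), 2):
            brokers[pair] += 1
    return brokers


sample = [
    ("u1", "en"), ("u1", "ja"),
    ("u2", "en"),
    ("u3", "ja"), ("u3", "en"), ("u3", "pt"),
]
brokers = count_bilingual_brokers(sample)
print(brokers)
```

A single misdetected tweet is enough to turn a monolingual user into a "broker" in this scheme, which is the over-estimation caveat noted on the slide.
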
Number of Bilingual Brokers
(E: English, J: Japanese, P: Portuguese, I: Indonesian, S: Spanish, D: Dutch, K: Korean, F: French, G: German, M: Malay)

          E        J        P        I        S        D        K        F        G
J   140,730
P   488,545   13,228
I   230,023    4,825   29,405
S   359,117   10,139  112,524   36,068
D   150,041    6,383   30,855   34,906   30,916
K    19,722    6,384      906    2,014    1,109      972
F   194,931   10,463   53,607   34,586   49,445   33,568    1,244
G   110,748    6,053   22,106   21,471   21,989   22,162      786   24,763
M   148,365    4,208   31,184  135,427   31,967   29,331    1,518   30,257   18,301
  



Sharing URLs Across Languages

         E       J       P       I       S       D       K       F       G       M
E        -   3,013  18,399     985   4,986   1,144     212   1,791   1,647     540
J    3,013       -      77      37      58      29      43      59      46      18
P   18,399      77       -      74   1,644     198       2     453     168     123
I      985      37      74       -      67      64       1      53      38     279
S    4,986      58   1,644      67       -     139       0     286     139      53
D    1,144      29     198      64     139       -       2     112     126      48
K      212      43       2       1       0       2       -       3       3       1
F    1,791      59     453      53     286     112       3       -     157      53
G    1,647      46     168      38     139     126       3     157       -      40
M      540      18     123     279      53      48       1      53      40       -
  


Sharing Hashtags Across Languages

         E       J       P       I       S       D       K       F       G       M
E        -   8,178  33,197  14,969  27,284   6,685     798   9,410   7,208   5,517
J    8,178       -     331     135     351     218     149     352     260     100
P   33,197     331       -     535   4,682     604      13   1,231     580     400
I   14,969     135     535       -     762     684      25     713     415   6,046
S   27,284     351   4,682     762       -     819      28   1,468     708     463
D    6,685     218     604     684     819       -      26     851     769     424
K      798     149      13      25      28      26       -      25      18      20
F    9,410     352   1,231     713   1,468     851      25       -     879     411
G    7,208     260     580     415     708     769      18     879       -     265
M    5,517     100     400   6,046     463     424      20     411     265       -
  


Implications
•  Indicators of connection strength between languages
   –  Number of bilingual brokers
   –  Acts of brokerage: sharing URLs & hashtags
•  English is well connected to the other languages, and may function as a hub
•  Need to improve cross-language communications
  




Visible Social Signals from Shared Items

Kudos to Jilin Chen, Rowan Nairn
[Chen et al., CHI2010]
[Chen et al., CHI2011]
  




Eddi: Summarizing Social Streams
  




Information Gathering/Seeking
•  The Filtering Problem:
   –  “I get 1,000+ items in my stream daily but only have time to read 10 of them. Which ones should I read?”
•  The Discovery Problem:
   –  “There are millions of URLs posted daily on Twitter. Am I missing something important there outside my own Twitter stream?”
  




Stream Recommender
•  Zerozero88.com
   –  Twitter as the platform
   –  URLs as the medium
   –  Produces your personal headlines
  




[System diagram]
URL Sources
Topic Relevance Scores    (from User Topic Profiles)
Social Network Scores     (from Local Social Network)

Recommendation Engine
Ø Multiply scores
Ø Rank URLs using multiplied scores
Ø Recommend highest ranked URLs
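The three engine steps (multiply, rank, recommend) can be sketched in a few lines. The function name, the dictionary inputs, and the example scores are illustrative assumptions; the slides do not specify score ranges or tie-breaking.

```python
def recommend(urls, topic_scores, social_scores, k=5):
    """Return the k URLs with the highest product of topic and social scores."""
    # Step 1: multiply the two scores for every candidate URL.
    combined = {url: topic_scores[url] * social_scores[url] for url in urls}
    # Step 2: rank URLs by the multiplied score, highest first.
    ranked = sorted(urls, key=lambda url: combined[url], reverse=True)
    # Step 3: recommend the highest-ranked URLs.
    return ranked[:k]


# Hypothetical candidates and scores.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
topic = {urls[0]: 0.9, urls[1]: 0.2, urls[2]: 0.5}
social = {urls[0]: 1.0, urls[1]: 8.0, urls[2]: 3.0}
top_two = recommend(urls, topic, social, k=2)
print(top_two)
```

Multiplying rather than adding means a URL needs a non-trivial score on both dimensions to rank well; a URL with zero topic relevance cannot be rescued by many votes.
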
URL Sources
•  Considering all URLs was impossible
•  FoF: URLs from followee-of-followees
   –  Social Local News is Better
•  Popular: URLs that are popular across the whole of Twitter
   –  Popular News is Better

Component      Possible Design Choices
URL Sources    FoF (followee-of-followees)
               Popular




Topic Relevance Scores
[Two example term-weight vectors]
Funny 1.3, YouTube 5.5, Video 0.5
Funny 4.0, Game 2.1, …




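The slide shows two weighted term vectors but does not name the similarity measure that turns them into a relevance score. The sketch below assumes cosine similarity, a common choice for sparse term vectors; which vector belongs to the user and which to the URL is also an assumption.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


# Weights copied from the slide; the trailing "…" terms are omitted.
vector_one = {"funny": 1.3, "youtube": 5.5, "video": 0.5}
vector_two = {"funny": 4.0, "game": 2.1}
score = cosine(vector_one, vector_two)
print(round(score, 3))
```

Only the shared term "funny" contributes to the score, which illustrates why sparse vectors built from short tweets give weak relevance signals.
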
Topic Profile of URLs
•  Built from tweets that contain the URL
•  However, tweets are short
   –  term vectors for URLs are often too sparse
•  Adopt a term expansion technique using a search engine

Example tweet: Best of Show CES 2011: The Motorola Atrix http://tcrn.ch/e0g3Oh
Expanded terms added to profile: smartphone, mobility, …

Topic Profile of Users
•  Self-Topic: content profile based on my posts
   –  My Interest as Information Producer
•  Followee-Topic: content profile based on my followees’ posts
   –  My Interest as Information Gatherer
•  None, for comparison purposes

Component                 Possible Design Choices
Topic Relevance Scores    Self-Topic
                          Followee-Topic
                          None


My Followees
[Pipeline diagram]
Collect & Profile     ->  one Profile per followee
Find Top Key Terms    ->  one set of Terms per followee
Aggregate             ->  Profile

A term is weighted higher in your profile if more of your followees have the term as their top key terms.

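The aggregation rule can be sketched as a simple count: a term's weight is the number of followees who have it among their top key terms. The exact weighting function used in the system is not given on the slide, so the raw count here is an assumption, and the input format is hypothetical.

```python
from collections import Counter


def followee_topic_profile(followee_top_terms):
    """Aggregate followees' top key terms into one weighted profile.

    followee_top_terms: dict mapping followee -> list of that followee's top key terms.
    """
    profile = Counter()
    for terms in followee_top_terms.values():
        # set() ensures each followee contributes at most one count per term.
        profile.update(set(terms))
    return profile


# Hypothetical followees and key terms.
profile = followee_topic_profile({
    "followee_1": ["python", "hci"],
    "followee_2": ["hci", "search"],
    "followee_3": ["hci"],
})
print(profile.most_common())
```
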
Social Network Scores
•  “Popular Vote” among my followees-of-followees
   –  People “vote” for a URL by tweeting it
   –  URLs with more votes in total are assigned a higher score
   –  Votes are weighted using social network structure
•  None, for comparison purposes

Component                Possible Design Choices
Social Network Scores    Social Voting
                         None


The Intuition: Local Influence
[Diagram]
Me  --follows-->  15 People  --follow-->  one followee-of-followee
Me  --follows-->   5 People  --follow-->  another followee-of-followee

Whose URLs should be weighted higher?



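One way to turn this intuition into a score is to weight each voter by how many of my followees follow them, then sum the weights of everyone who tweeted a URL. This weighting is an assumption made for illustration; the slides only say that votes are weighted using social network structure. All names and data below are hypothetical.

```python
from collections import defaultdict


def social_vote_scores(my_followees, follows, tweeted_urls):
    """Score URLs by weighted votes from followees-of-followees.

    my_followees: set of accounts I follow
    follows: dict mapping account -> set of accounts it follows
    tweeted_urls: dict mapping account -> set of URLs it tweeted
    """
    # Assumed weighting: a voter's weight is the number of my followees who follow them.
    weight = defaultdict(int)
    for followee in my_followees:
        for fof in follows.get(followee, set()):
            weight[fof] += 1

    scores = defaultdict(float)
    for voter, voter_weight in weight.items():
        for url in tweeted_urls.get(voter, set()):
            scores[url] += voter_weight  # tweeting a URL counts as a vote
    return dict(scores)


scores = social_vote_scores(
    my_followees={"a", "b", "c"},
    follows={"a": {"x", "y"}, "b": {"x"}, "c": {"x"}},
    tweeted_urls={"x": {"url1"}, "y": {"url1", "url2"}},
)
print(scores)
```
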
Possible Recommender Designs

Component                 Possible Design Choices
URL Sources               FoF (followee-of-followees)
                          Popular
Topic Relevance Scores    Self-Topic
                          Followee-Topic
                          None
Social Network Scores     Social Voting
                          None

Recommendation Engine
Ø Multiply scores
Ø Rank URLs using multiplied scores
Ø Recommend highest ranked URLs

•  2 (URL source) x 3 (topic score) x 2 (social score) = 12 possible algorithm designs in total
•  Random selection if we chose None for both scores

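The 12 designs are the Cartesian product of the three design dimensions, which can be enumerated directly. The list names are illustrative; the choice labels are copied from the table.

```python
from itertools import product

URL_SOURCES = ["FoF", "Popular"]
TOPIC_SCORES = ["Self-Topic", "Followee-Topic", "None"]
SOCIAL_SCORES = ["Social Voting", "None"]

# Every combination of one choice per component.
designs = list(product(URL_SOURCES, TOPIC_SCORES, SOCIAL_SCORES))

# Designs with no topic score and no social score fall back to random selection.
random_baselines = [d for d in designs if d[1] == "None" and d[2] == "None"]

for design in designs:
    print(design)
print(len(designs), "designs,", len(random_baselines), "random baselines")
```

The two random baselines differ only in the candidate pool (FoF vs. Popular), so comparing them isolates the effect of the URL source alone.
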
Study Design
•  Within-subject design
•  Each subject evaluated 5 URL recommendations from each of the 12 algorithms
   –  Show 60 URLs in random order, and ask for binary rating
   –  60 ratings x 44 subjects = 2,640 ratings in total
  
Summary of Results
[Results chart; legend and annotations: Popular URLs, FoF URLs, Social Vote Only, Best Performing]


Algorithms Differ Not Only in Accuracy!
•  Relevance vs. Serendipity in recommendations
•  From a subject in the pilot interview of zerozero88:
   –  “There is a tension between the discovery and the affirming aspect of things. I am getting tweets about things that I am already interested in. Something I crave …, is an element of surprise or whimsy. ... I am getting a lot of things I am interested in, but that is not necessarily a good thing for me personally”
  




Design Rule
•  Interaction costs determine number of people who participate
   –  Surplus of attention & motivation at small transaction costs
•  Therefore:
•  Important to keep interaction costs low
   –  Recommendation
   –  Summarization
•  Or bring new benefits
[Chart axes: x = Cost of participation, y = # People willing to participate]
  

Thank you!
•  chi@acm.org
•  http://edchi.net
  




 2011-10-27                      CIKM 2011 Invited Talk   42

CIKM 2011 | Ed Chi on Model-Driven Research in Social Computing

  • 1. CIKM 2011 | Invited Talk Model-Driven Research in Social Computing Ed H. Chi Google Research Work done while at Palo Alto Research Center (PARC) 2011-10-27 CIKM 2011 Invited Talk 1
  • 2. Some  Google  Social  Stats   n  250,000  words  are  written  each  minute  on  Blogger  -­‐   that’s  360  million  words  a  day   n  Every  16  seconds  people  view  enough  photos  from   Picasa  Web  Albums  to  cover  an  entire  football  field   n  Every  8  minutes,  more  photos  are  viewed  on  Picasa   Web  Albums  than  exist  in  the  entire  Time-­‐LIFE  photo   collection   2011-10-27 CIKM 2011 Invited Talk 2
  • 3. YouTube  Stats   n  150  years  of  YouTube  video  are  watched  everyday  on   Facebook  (up  2.5x  y/y)   n  every  minute  400+  tweets  contain  YouTube  links  (up  3x   y/y)  [Q1  20111]   n  100M+  people  take  a  social  action  with  YouTube  (likes,   shares,  comments,  etc)  every  week  (10/15/10)   2011-10-27 CIKM 2011 Invited Talk 3
  • 4. Google+  Stats   n  40  million  people  joined  Google  since  launch.   n  People  are  2x-­‐3x  times  more  likely  to  share  content  with   one  of  their  circles  than  to  make  a  public  post.   2011-10-27 CIKM 2011 Invited Talk 4
  • 5. Social  Stream  Research   n  Analytics   –  Factors  impacting  retweetability  [Suh  et  al,  IEEE  Social   Computing  2010]   –  Location  field  of  user  profiles  [Hecht  et  al,  CHI  2011]   –  Organic  Q&A  behaviors  [Paul  et  al,  ICWSM’11]   –  Languages  used  in  Twitter  [Hong  et  al,  ICWSM’11]   n  Improving  Stream  Experience   –  Topic-­‐based  summarization  &  browsing  of  tweets  [Bernstein  et   al,  UIST2010]   –  Tweet  recommendation  [Chen  et  al,  CHI2010  &  CHI2011]   2011-10-27 CIKM 2011 Invited Talk 5
  • 6. Invisible  Brokerage  Signals  across   Language  Barriers   Joint  work  w/  Lichan  Hong,  Gregorio  Convertino     [Hong  et  al.,  ICWSM  July  2011]     2011-10-27 CIKM 2011 Invited Talk 6
  • 7. Motivation  for  Studying  Languages   n  Twitter  is  an  international  phenomenon   –  Most  research  focused  on  English  users   –  Question  about  generalization  to  non-­‐English   –  Understand  cross-­‐language  usage  differences   –  Design  implications  for  international  users   n  Research  Questions:   –  What  is  the  language  distribution  in  Twitter?   –  How  do  users  of  different  languages  use  Twitter?   –  How  do  bilingual  users  spread  information  across  languages?     2011-10-27 CIKM 2011 Invited Talk 7
  • 8. Data  Collection  &  Processing    Twitter  stream   04/18/10-­‐05/16/10  (4  weeks)      62M  tweets   Google  Language  API  &  LingPipe    104  languages     Top  10  languages   2011-10-27 CIKM 2011 Invited Talk 8
  • 9. Top  10  Languages  in  Twitter      Language            Tweets          %            Users   English   31,952,964   51.1   5,282,657   Japanese   11,975,429   19.1   1,335,074   Portuguese   5,993,584   9.6   993,083   Indonesian   3,483,842   5.6   338,116   Spanish   2,931,025   4.7   706,522   Dutch   883,942   1.4   247,529   Korean   754,189   1.2   116,506   French   603,706   1.0   261,481   German     588,409   1.0   192,477   Malay   559,381   0.9   180,147   2011-10-27 CIKM 2011 Invited Talk 9
  • 10. Human-­‐Coding  Study   n  2,000  random  tweets  from  62M  tweets   n  2  human  judges  for  each  of  top  1o  languages     –  native  speakers  or  proficient   –  discuss  to  resolve  disagreement   n  Hard  to  find  Indonesian  &  Malay  judges   n  Presented  2,000  tweets  to  each  judge   n  Judge  selected  tweets  in  his/her  language   2011-10-27 CIKM 2011 Invited Talk 10
  • 11. Machine  vs.  Human   T-­‐P:  true  positive,  T-­‐N:  true  negative,  F-­‐N:  false-­‐negative,  F-­‐P:  false  positive      Language            T-­‐P        T-­‐N        F-­‐N      F-­‐P              Cohen’s  Kappa   English   974   971   20   35   0.95   Japanese   370   1,595   0   35   0.94   Portuguese   170   1,803   19   8   0.92   Indonesian   106   1,875   15   4   0.91   Spanish   96   1,889   11   4   0.92   Dutch   18   1,978   2   2   0.90   Korean   24   1,976   0   0   1.00   French   13   1,980   0   7   0.79   German     12   1,979   2   7   0.72   Malay   8   1,979   4   9   0.55   2011-10-27 CIKM 2011 Invited Talk 11
  • 12. Accuracy  of  Language  Detection   n  Two  Types  of  Errors   –  Got  ur  dirct  msg.i’m  lukng  4wrd  2  twt  wit  u   too.so,wat  doing  ha…(detected  as  Afrikaans)   –  High  error  rate  for  tweets  of  1~2  words   2011-10-27 CIKM 2011 Invited Talk 12
  • 13. Machine  vs.  Human      Language            T-­‐P        T-­‐N        F-­‐N      F-­‐P              Cohen’s  Kappa   French   13   1,980   0   7   0.79   German     12   1,979   2   7   0.72   Malay   8   1,979   4   9   0.55   •  French:  5/7  F-­‐P  have  2  words   •  German:  1/2  F-­‐N  has  1  word;  6/7  F-­‐Ps  are  in  English   •  Malay:  3/4  F-­‐Ns  &  7/9  F-­‐Ps  are  in  Indonesian   2011-10-27 CIKM 2011 Invited Talk 13
  • 14. Common  Twitter  Conventions   hashtag   mention   URL   reply  (per-­‐tweet  metadata)   retweet   2011-10-27 CIKM 2011 Invited Talk 14
  • 15. Use  of  URLs  in  62M  Tweets      Language    URLs   n  Chi  Square  tests  confirmed  that   All   21%   differences  by  language  are   English   25%   significant.   Japanese   13%   Portuguese   13%   Indonesian   13%   Spanish   15%   Dutch   17%   Korean   17%   French   37%   German     39%   Malay   17%   2011-10-27 CIKM 2011 Invited Talk 15
  • 16. Significant  Cross-­‐Language  Differences      Language    URLs   Hashtags   Mentions   Replies    Retweets   All   21%   11%   49%   31%   13%   English   25%   14%   47%   29%   13%   Japanese   13%   5%   43%   33%   7%   Portuguese   13%   12%   50%   32%   12%   Indonesian   13%   5%   72%   20%   39%   Spanish   15%   11%   58%   39%   14%   Dutch   17%   13%   50%   35%   11%   Korean   17%   11%   73%   59%   11%   French   37%   12%   48%   36%   9%   German     39%   18%   36%   25%   8%   Malay   17%   5%   62%   23%   29%   Chi  Square  tests  confirmed  that  differences  by  language  are  significant   2011-10-27 CIKM 2011 Invited Talk 16
  • 17. Implications      Language    URLs    Hashtags    Mentions    Replies    Retweets   All   21%   11%   49%   31%   13%   Korean   17%   11%   73%   59%   11%   German     39%   18%   36%   25%   8%   n  Use  of  Twitter  for  social  networking  vs.  information   sharing  different  in  different  languages   n  Design  of  recommendation  engines   –  Korean  users:  promote  conversational  tweets   –  German  users:  promote  tweets  with  URLs   2011-10-27 CIKM 2011 Invited Talk 17
  • 18. Studying  Bilingual  Brokers   n  Importance  of  brokers   –  Structural  holes  (Burt’92),  LiveJournal  (Herring  et  al’07)   n  Define  bilingual  brokers  as  Users  who  tweeted  in  a   pair  of  languages   n  Caveat   –  Under-­‐estimated  due  to  4-­‐week  time  limit   –  Over-­‐estimated  due  to  language  detection  errors   2011-10-27 CIKM 2011 Invited Talk 18
  • 19. Number  of  Bilingual  Brokers   E   J   P   I   S   D   K   F   G   J   140,730   P   488,545   13,228   I   230,023   4,825   29,405   S   359,117   10,139   112,524   36,068   D   150,041   6,383   30,855   34,906   30,916   K   19,722   6,384   906   2,014   1,109   972   F   194,931   10,463   53,607   34,586   49,445   33,568   1,244     G 110,748   6,053   22,106   21,471   21,989   22,162   786   24,763     M 148,365   4,208   31,184   135,427   31,967   29,331   1,518   30,257   18,301   2011-10-27 CIKM 2011 Invited Talk 19
  • 20. Sharing  URLs  Across  Languages   E   J   P   I   S   D   K   F   G   M   E 3,013   18,399   985   4,986   1,144   212   1,791   1,647   540   J   3,013   77   37   58   29   43   59   46   18   P 18,399   77   74   1,644   198   2   453   168   123   I   985   37   74   67   64   1   53   38   279   S 4,986   58   1,644   67   139   0   286   139   53   D 1,144   29   198   64   139   2   112   126   48   K 212   43   2   1   0   2   3   3   1   F   1,791   59   453   53   286   112   3   157   53   G 1,647   46   168   38   139   126   3   157   40   M 540   18   123   279   53   48   1   53   40   2011-10-27 CIKM 2011 Invited Talk 20
  • 21. Sharing  Hashtags  Across  Languages   E   J   P   I   S   D   K   F   G   M     E 8,178   33,197   14,96 27,284   6,685   798   9,410   7,208   5,517   9   J   8,178   331   135   351   218   149   352   260   100     P 33,197   331   535   4,682   604   13   1,231   580   400   I   14,969   135   535   762   684   25   713   415   6,046     S 27,284   351   4,682   762   819   28   1,468   708   463     D 6,685   218   604   684   819   26   851   769   424     K 798   149   13   25   28   26   25   18   20   F   9,410   352   1,231   713   1,468   851   25   879   411     G 7,208   260   580   415   708   769   18   879   265     M 5,517   100   400   6,046   463   424   20   411   265   2011-10-27 CIKM 2011 Invited Talk 21
  • 22. Implications   n  Indicators  of  connection  strength  between   languages   –  Number  of  bilingual  brokers   –  Acts  of  brokerage:  sharing  URLs  &  hashtags   n  English  well  connected  to  others,  and  may   function  as  a  hub   n  Need  to  improve  cross-­‐language   communications   2011-10-27 CIKM 2011 Invited Talk ? 22
  • 23. Visible  Social  Signals  from     Shared  Items   Kudos  to  Jilin  Chen,  Rowan  Nairn       [Chen  et  al,  CHI2010]     [Chen  et  al.,  CHI2011]   2011-10-27 CIKM 2011 Invited Talk 23
  • 24. Eddi:  Summarizing  Social  Streams   2011-10-27 CIKM 2011 Invited Talk 24
  • 25. Information  Gathering/Seeking   n  The  Filtering  Problem:   –  “I  get  1,000+  items  in  my  stream  daily  but  only  have  time  to   read  10  of  them.  Which  ones  should  I  read?”   n  The  Discovery  Problem:   –  “There  are  millions  of  URLs  posted  daily  on  Twitter.  Am  I   missing  something  important  there  outside  my  own  Twitter   stream?”   2011-10-27 CIKM 2011 Invited Talk 25
  • 26. Stream  Recommender   n  Zerozero88.com   –  Twitter  as  the  platform   –  URLs  as  the  medium   –  Produces  your   personal  headlines   2011-10-27 CIKM 2011 Invited Talk 26
  • 27. URL Sources Topic Relevance User Topic Profiles Scores Social Network Scores Local Social Network Recommendation Engine Ø Multiply scores Ø Rank URLs using multiplied scores Ø Recommend highest ranked URLs 2011-10-27 CIKM 2011 Invited Talk 27
  • 28. URL  Sources   n  Considering  all  URLs  was  impossible   n  FoF:  URLs  from  followee-­‐of-­‐followees   –  Social  Local  News  is  Better   n  Popular:  URLs  that  are  popular  across  whole  Twitter   –  Popular  News  is  Better   Component Possible Design Choices URL Sources FoF (followee-of-followees) Popular 2011-10-27 CIKM 2011 Invited Talk 28
  • 29. URL Sources Topic Relevance User Topic Profiles Scores Social Network Scores Local Social Network Recommendation Engine Ø Multiply scores Ø Rank URLs using multiplied scores Ø Recommend highest ranked URLs 2011-10-27 CIKM 2011 Invited Talk 29
  • 30. Topic  Relevance  Scores   Funny YouTube Video Funny Game … 1.3 5.5 0.5 4.0 2.1 … 2011-10-27 CIKM 2011 Invited Talk 30
  • 31. Topic  Profile  of  URLs   n  Built  from  tweets  that  contain  the  URL   n  However,  tweets  are  short     –  term  vectors  for  URLs  are  often  too  sparse   n  Adopt  a  term  expansion  technique  using  a  search  engine   Best  of  Show  CES  2011:  The  Motorola  Atrix      http://tcrn.ch/e0g3Oh   Add to Profile smartphone, mobility, … 2011-10-27 CIKM 2011 Invited Talk 31
  • 32. Topic  Profile  of  Users   n  Self-­‐Topic:  content  profile  based  on  my  posts   –  My  Interest  as  Information  Producer   n  Followee-­‐Topic:  content  profile  based  on  my   followees’  posts   –  My  Interest  as  Information  Gatherer   n  None,  for  comparison  purpose   Component Possible Design Choices Topic Self-Topic Relevance Followee-Topic Scores None 2011-10-27 CIKM 2011 Invited Talk 32
  • 33. My  Followees   Profile Profile Profile Profile Collect & Profile Profile Profile Profile Profile Profile Profile A term is weighted higher in your profile if Find Top more of your followees have the term as Key Terms their top key terms Terms Terms Terms Terms Profile Aggregate Terms Terms Terms Terms Terms Terms 2011-10-27 CIKM 2011 Invited Talk 33
  • 34. URL Sources Topic Relevance User Topic Profiles Scores Social Network Scores Local Social Network Recommendation Engine Ø Multiply scores Ø Rank URLs using multiplied scores Ø Recommend highest ranked URLs 2011-10-27 CIKM 2011 Invited Talk 34
  • 35. Social  Network  Scores   n  “Popular  Vote”  in  among  my  followees-­‐of-­‐followees   –  People  “vote”  a  URL  by  tweeting  it   –  URLs  with  more  votes  in  total  are  assigned  higher  score   –  Votes  are  weighted  using  social  network  structure   n  None,  for  comparison  purpose   Component Possible Design Choices Social Social Voting Network None Scores 2011-10-27 CIKM 2011 Invited Talk 35
  • 36. The Intuition: Local Influence. Diagram labels: "Me", "follow", "15 People", "5 People". Question posed: whose URLs should be weighted higher?
  • 37. Possible Recommender Designs. Components and possible design choices: URL Sources: FoF (followees-of-followees), Popular. Topic Relevance Scores: Self-Topic, Followee-Topic, None. Social Network Scores: Social Voting, None. Recommendation Engine: multiply scores; rank URLs using the multiplied scores; recommend the highest-ranked URLs. 2 (URL source) x 3 (topic score) x 2 (social score) = 12 possible algorithm designs in total. When None is chosen for both scores, URLs are selected at random.
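The 2 x 3 x 2 design space on this slide can be enumerated directly. The option labels below are taken from the slide; the variable names are illustrative.

```python
from itertools import product

URL_SOURCES = ["FoF", "Popular"]
TOPIC_SCORES = ["Self-Topic", "Followee-Topic", "None"]
SOCIAL_SCORES = ["Social Voting", "None"]

# Every combination of one choice per component is one algorithm design.
designs = list(product(URL_SOURCES, TOPIC_SCORES, SOCIAL_SCORES))

# Designs with "None" for both scores fall back to random selection.
random_baselines = [d for d in designs if d[1] == "None" and d[2] == "None"]
```

The enumeration yields 12 designs, 2 of which (one per URL source) are the random-selection baselines.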
  • 38. Study Design. Within-subject design. Each subject evaluated 5 URL recommendations from each of the 12 algorithms: 60 URLs were shown in random order, each with a request for a binary rating. 60 ratings x 44 subjects = 2640 ratings in total.
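The counts on this slide can be checked with a small sketch that builds one subject's randomized presentation order. The numbers come from the slide; the function name, the seeding scheme, and the (algorithm, item) representation are assumptions made for illustration.

```python
import random
from typing import List, Tuple

ALGORITHMS = 12        # algorithm designs under comparison
URLS_PER_ALGORITHM = 5  # recommendations evaluated per algorithm
SUBJECTS = 44


def presentation_order(seed: int) -> List[Tuple[int, int]]:
    """Return (algorithm, item) pairs for one subject, shuffled into random order."""
    items = [(a, i) for a in range(ALGORITHMS) for i in range(URLS_PER_ALGORITHM)]
    rng = random.Random(seed)  # Assumption: one seed per subject for reproducibility.
    rng.shuffle(items)
    return items


order = presentation_order(seed=0)
total_ratings = len(order) * SUBJECTS
```

Each subject sees 12 x 5 = 60 URLs, and 60 x 44 gives the 2640 ratings stated on the slide.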
  • 39. Summary of Results. Chart labels: Popular URLs; FoF URLs; Social Vote Only; Best Performing.
  • 40. Algorithms Differ Not Only in Accuracy! Relevance vs. serendipity in recommendations. From a subject in the pilot interview of zerozero88: "There is a tension between the discovery and the affirming aspect of things. I am getting tweets about things that I am already interested in. Something I crave …, is an element of surprise or whimsy. ... I am getting a lot of things I am interested in, but that is not necessarily a good thing for me personally"
  • 41. Design Rule. Interaction costs determine the number of people who participate: there is a surplus of attention & motivation at small transaction costs. Therefore, it is important to keep interaction costs low (recommendation, summarization) or to bring new benefits. Chart: number of people willing to participate vs. cost of participation. 2008-05-13 CSCL 2011 Keynote
  • 42. Thank you! chi@acm.org; http://edchi.net