Live Social Semantics
& online community monitoring

Harith Alani
Knowledge Media institute,
The Open University, UK


          http://twitter.com/halani
          http://delicious.com/halani
          http://www.linkedin.com/pub/harith-alani/9/739/534



                                               Semantic Web Summer School
                                                  Cercedilla, Spain, 2011
Market value of Web Analytics




Location, Sensors, & Social Networking

  Tag-Along Marketing
  The New York Times,
  November 6, 2010




                “Everything is in place for location-based social
                networking to be the next big thing. Tech
                companies are building the platforms, venture
                capitalists are providing the cash and marketers
                are eager to develop advertising.”



Location, Sensors, & Social Networking




                        The Canine Twitterer




                          “Having my daily
                          workout. Already did
                          15 leg lifts!”




Monitoring online/offline social activity
              Where is everybody?




Monitoring online/offline social activity


•  Generating
   opportunities for
   F2F networking




Tracking of F2F contact networks
                            Sociometer, MIT, 2002
                            -    F2F and productivity
                            -    F2F dynamics
                            -    Who are key players?
                            -    F2F and office distance




   TraceEncounters - 2004




SocioPatterns platform




                 http://www.sociopatterns.org/
Convergence with online social networks




Online vs. offline social networking

•  Digital social networking increases physical social isolation
•  Causes
    –  Genetic alterations
    –  Weakened immune system
    –  Less resistant to cancer
    –  Higher risk of heart disease
    –  Higher blood pressure
    –  Faster dementia
    –  Narrower arteries
Aric Sigman, “Well Connected? The Biological Implications of 'Social Networking’”, Biologist, 56(1), 2009

•  Digital networking increases social interaction
    –  Creates more opportunities to network
    –  Supports and increases F2F contact
    –  Stronger offline social ties → more online communication
    –  Stronger offline social ties → more diverse online communications
    –  F2F is medium of choice in weaker social ties
Barry Wellman, The Glocal Village: Internet and Community, Idea’s - The Arts & Science Review, University of Toronto, 1(1), 2004
Offline + online social networking
    “Where should I go?”
    “Anyone I know here?”
    “Who should I talk to?”
    “Where have I met this guy?”




   ESWC2010
Live Social Semantics (LSS):
     RFIDs + Social Web + Semantic Web

•    Integration of physical presence and online information
•    Semantic user profile generation
•    Logging of face-to-face contact
•    Social network browsing
•    Analysis of online vs offline social networks

<?xml version="1.0"?>
<rdf:RDF
    xmlns="http://tagora.ecs.soton.ac.uk/schemas/tagging#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
  xml:base="http://tagora.ecs.soton.ac.uk/schemas/tagging">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Post"/>
  <owl:Class rdf:ID="TagInfo"/>
  <owl:Class rdf:ID="GlobalCooccurrenceInfo"/>
  <owl:Class rdf:ID="DomainCooccurrenceInfo"/>
  <owl:Class rdf:ID="UserTag"/>
  <owl:Class rdf:ID="UserCooccurrenceInfo"/>
  <owl:Class rdf:ID="Resource"/>
  <owl:Class rdf:ID="GlobalTag"/>
  <owl:Class rdf:ID="Tagger"/>
  <owl:Class rdf:ID="DomainTag"/>
  <owl:ObjectProperty rdf:ID="hasPostTag">
    <rdfs:domain rdf:resource="#TagInfo"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="hasDomainTag">
    <rdfs:domain rdf:resource="#UserTag"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="isFilteredTo">
    <rdfs:range rdf:resource="#GlobalTag"/>
    <rdfs:domain rdf:resource="#GlobalTag"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="hasResource">
    <rdfs:domain rdf:resource="#Post"/>
    <rdfs:range =…
Live Social Semantics: architecture

[Architecture diagram. Recoverable labels:]
•  Web-based Systems: Delicious, Flickr, LastFM, Facebook
•  Linked data sources: data.semanticweb.org, rkbexplorer.com, semanticweb.org, dbpedia.org, dbtune.org
•  Publications, co-authorship networks, Communities of Practice
•  Extractor Daemon, Connect API: Social Tagging, Social Networks, Contacts
•  TAGora Sense Repository: tag -> dbpedia uri, mbid -> dbpedia uri
•  Tag disambiguation service, Tag to URI service, ontology, URIs
•  Profile Builder: interests
•  Aggregator, JXT Triple Store, Social Semantics RDF cache
•  Real World: RFID Badges, RFID Readers, Local Server, Real-World Contact Data
•  Visualization, Web Interface, Linked Data
SW resources
http://data.semanticweb.org/
www.rkbexplorer.com/

[Figure labels: conference, chair, proceedings, chair, author, CoP]
Social and information networks




Merging social networks




                  FOAF
Distinct, Separated Identity Management

Harith Alani
     http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/foaf/2

Delicious Tagging and Network
     http://tagora.ecs.soton.ac.uk/delicious/halani

RFID Contact Data
     http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/1139

Flickr Tagging and Contacts
     http://tagora.ecs.soton.ac.uk/flickr/69749885@N00

Conference Publication Data
     http://data.semanticweb.org/person/harith-alani/

Lastfm favourite artists and friends
     http://tagora.ecs.soton.ac.uk/lastfm/halani

Past Publications, Projects, Communities of Practice
     http://southampton.rkbexplorer.com/id/person-05877

Facebook contacts
     http://tagora.ecs.soton.ac.uk/facebook/568493878
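
One way to picture the separated identities on this slide is a lookup from the central FOAF URI to the per-source URIs. The sketch below is illustrative only: the dictionary layout, the source keys and the `sources_for` helper are assumptions made for this example, not the LSS implementation. The URIs themselves are the ones listed on the slide.

```python
# Hypothetical in-memory view of one participant's separated identities.
FOAF_URI = "http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/foaf/2"

IDENTITIES = {
    FOAF_URI: {
        "delicious": "http://tagora.ecs.soton.ac.uk/delicious/halani",
        "rfid": "http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/1139",
        "flickr": "http://tagora.ecs.soton.ac.uk/flickr/69749885@N00",
        "publications": "http://data.semanticweb.org/person/harith-alani/",
        "lastfm": "http://tagora.ecs.soton.ac.uk/lastfm/halani",
        "rkbexplorer": "http://southampton.rkbexplorer.com/id/person-05877",
        "facebook": "http://tagora.ecs.soton.ac.uk/facebook/568493878",
    }
}


def sources_for(foaf_uri):
    """Return the sorted source names linked to a FOAF URI (empty if unknown)."""
    return sorted(IDENTITIES.get(foaf_uri, {}))


print(sources_for(FOAF_URI))
```

Keeping each source under its own URI means one account can be dropped or refreshed without touching the others.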
Tag Filtering Service




                        Semantic modeling
                        Semantic analysis
                        Collective intelligence
                        Statistical analysis
                        Syntactical analysis
Tag Filtering Service




From Tags to Semantics




Tags to User Interests




From raw tags and social relations
to Structured Data

[Diagram labels: User raw data, Collective intelligence, ontologies, Semantic data, Structured data]
RFIDs for tracking social contact




People contact → RFID → RDF Triples



[RDF graph: an F2FContact node connects foaf#Person1 and foaf#Person2 through the
 hasContact and contactWith properties, and carries:
     contactPlace     → Place
     contactDate      → XMLSchema#date
     contactDuration  → XMLSchema#time]
Real-time F2F networks with SNS links




            http://www.vimeo.com/6590604
Live Social Semantics
 Deployed at:




Data analysis
•  Face-to-face interactions across scientific conferences
•  Networking behaviour of frequent users
•  Correlations between scientific seniority and social networking
•  Comparison of F2F contact network with Twitter and Facebook
•  Social networking with online and offline friends
Characteristics of F2F contact network
  Network characteristics    ESWC 2009        HT 2009         ESWC 2010
  Number of users               175             113              158
  Average degree                 54              39               55
  Avg. strength (min)           143             123              130
  Avg. weight (min)            2.65            3.15             2.35
  Weights ≤ 1 min               70%             67%              74%
  Weights ≤ 5 min               90%             89%              93%
  Weights ≤ 10 min              95%             94%              96%

•  Degree is the number of people with whom the person had at least one F2F
   contact
•  Strength is the total time the person spent in F2F contact
•  Edge weight is the total time spent by a pair of users in F2F contact
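
The three measures can be derived from a log of contact events. The sketch below assumes each event is a `(person_a, person_b, duration_seconds)` tuple; that format and the sample names are invented for illustration. Since every edge counts towards two people, average strength equals average degree times average weight, which matches the table (for ESWC 2009, 54 × 2.65 ≈ 143 minutes).

```python
from collections import defaultdict


def network_stats(events):
    """Compute edge weights, degrees and strengths from F2F contact events.

    events: iterable of (person_a, person_b, duration_seconds).
    """
    weight = defaultdict(int)
    for a, b, duration in events:
        # An undirected pair: (a, b) and (b, a) accumulate on the same edge.
        weight[frozenset((a, b))] += duration

    degree = defaultdict(int)
    strength = defaultdict(int)
    for pair, total in weight.items():
        for person in pair:
            degree[person] += 1        # one more distinct contact
            strength[person] += total  # time spent with that contact
    return dict(weight), dict(degree), dict(strength)


# Made-up events, for illustration only.
events = [("ann", "bob", 40), ("bob", "ann", 20), ("ann", "cat", 60)]
weight, degree, strength = network_stats(events)
print(degree, strength)
```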
Characteristics of F2F contact events
 Contact characteristics       ESWC 2009           HT 2009          ESWC 2010
 Number of contact events        16258              9875              14671
 Average contact length (s)         46                42                 42
 Contacts ≤ 1 min                  87%               89%                88%
 Contacts ≤ 2 min                  94%               96%                95%
 Contacts ≤ 5 min                  99%               99%                99%
 Contacts ≤ 10 min               99.8%             99.8%              99.8%


      F2F contact pattern is very similar for all three conferences
F2F contacts of returning users
•  Degree: number of other participants with whom an attendee has interacted
•  Total time: total time spent in interaction by an attendee
•  Link weight: total time spent in F2F interaction by a pair of returning
   attendees in 2010, versus the same quantity measured in 2009

[Figure: three scatter plots on logarithmic axes (Degree, Total interaction time, Links’ weights), ESWC2010 against ESWC2009]

 ESWC 2009 & ESWC 2010         Pearson Correlation
 Degree                               0.37
 Total F2F interaction time           0.76
 Link weight                          0.75

Time spent on F2F networking by frequent users is stable, even when the list of
people they networked with changed
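
For reference, a Pearson correlation between the 2009 and 2010 values of a measure can be computed as sketched below. The sample values are made up for illustration and are not the conference data; the function assumes neither list is constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical total interaction times (s) of returning attendees.
time_2009 = [1200, 5400, 800, 9600, 3000]
time_2010 = [1500, 4800, 1000, 8700, 3900]
print(round(pearson(time_2009, time_2010), 2))
```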
Average seniority of neighbours in F2F networks

•    No clear pattern is observed if the unweighted average over all neighbours
     in the aggregated network is considered
•    A correlation is observed when each neighbour is weighted by the time spent
     with the main person
•    The correlation becomes much stronger when considering for each individual
     only the neighbour with whom the most time was spent

[Figure: average seniority of neighbours against seniority (number of papers).
 Series: senn, avg seniority of the neighbours; senn,w, with weighted averages;
 senn,max, seniority of user with strongest link]

            Conference attendees tend to network with others of similar
            levels of scientific seniority
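
The three neighbour-seniority measures on this slide can be sketched as follows. The input format, a list of `(seniority, seconds_in_contact)` pairs per attendee, is an assumption for illustration, as are the sample numbers.

```python
def neighbour_seniority(neighbours):
    """Return (sen_nn, sen_nn_w, sen_nn_max) for one attendee.

    neighbours: non-empty list of (seniority, seconds_in_contact) pairs.
    """
    total_time = sum(seconds for _, seconds in neighbours)
    # Unweighted average over all neighbours.
    sen_nn = sum(sen for sen, _ in neighbours) / len(neighbours)
    # Average weighted by the time spent with each neighbour.
    sen_nn_w = sum(sen * seconds for sen, seconds in neighbours) / total_time
    # Seniority of the neighbour with the strongest link.
    sen_nn_max = max(neighbours, key=lambda pair: pair[1])[0]
    return sen_nn, sen_nn_w, sen_nn_max


# Made-up example: a junior contact met briefly, a senior one met at length.
print(neighbour_seniority([(2, 100), (10, 300)]))
```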
Presence of Attendees HT2009
Offline networking vs online networking
  Twitterers                        Spearman Correlation (ρ)
  Tweets – F2F Degree                       -0.15
  Tweets – F2F Strength                     -0.15
  Twitter Following – F2F Degree            -0.21

[Figure: users with Facebook and Twitter accounts in ESWC 2010]

  •    People who have a large number of friends on Twitter and/or Facebook don’t seem to
       be the most socially active in the offline world in comparison to other SNS users

             No strong correlation between amount of F2F
             contact activity and size of online social networks
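
Spearman's ρ is the Pearson correlation of the ranks, so it only asks whether more tweets go with more F2F contact, not whether the relation is linear. A minimal sketch, with ties given their average rank:

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (assumes neither input is constant)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

For example, `spearman(tweets, f2f_degree)` over per-user lists would give a value comparable to the table entries; those two list names are hypothetical.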
Scientific seniority vs Twitter followers
  Twitter users                       Correlation
  H-index – Twitter Followers             0.32
  H-index – Tweets                       -0.13

[Figure: Twitter Followers, H-index and Tweets per user]

 •    Comparison between people’s scientific seniority and the number of people following
      them on Twitter

 People who have the highest number of Twitter followers are not
 necessarily the most scientifically senior, although they do have high
 visibility and experience
Conference Chairs
                                     all participants   chairs   all participants   chairs
                                           2009          2009          2010          2010
average degree                              55           77.7           54           77.6
average strength (s)                       8590         19590          7807         22520
average weight (s)                          159           500           141           674
average number of events per edge          3.44             8          3.37            12

   •  Conf chairs interact with more distinct people (larger average degree)

   •  Conf chairs spend more time in F2F interaction (almost three times as much
      as a random participant)
Networking with online and offline ‘friends’
Characteristics                       all users   coauthors   Facebook friends   Twitter followers
average contact duration (s)              42          75              63                 72
average edge weight (s)                  141        4470             830               1010
average number of events per edge       3.37          60              13                 14
   •  Individuals sharing an online or professional social link meet much more
      often than other individuals
   •  Average number of encounters, and total time spent in interaction, is highest
      for co-authors

  F2F contacts with Facebook and Twitter friends were respectively 50% and
  71% longer, and 286% and 315% more frequent, than contacts with others

  F2F contacts with co-authors were 79% longer than contacts with others, and
  co-authors met 1680% more often than non co-authors
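
These percentages follow from the table as relative increases over the "all users" column. A short worked check, using only the figures from the table:

```python
def percent_increase(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


# Average contact duration (s) and average number of events per edge.
duration = {"all": 42, "facebook": 63, "twitter": 72, "coauthors": 75}
events = {"all": 3.37, "facebook": 13, "twitter": 14, "coauthors": 60}

for group in ("facebook", "twitter", "coauthors"):
    longer = percent_increase(duration[group], duration["all"])
    more_often = percent_increase(events[group], events["all"])
    print(f"{group}: {longer:.0f}% longer, {more_often:.0f}% more frequent")
```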
Twitterers vs Non-Twitterers


•  Time spent in conference rooms
  –  Twitter users spent on average 11.4% more time in the
     conf rooms than non-Twitter users (mean is 26% higher)


•  Number of people met F2F during the conference
  –  Twitter users met on average 9% more people F2F
     (mean 8% higher)


•  Duration of F2F contacts
  –  Twitter users spent on average 63% more time in F2F
     contact than non-Twitter users (mean is 20% higher)
Behaviour of individuals – micro level analysis
[Figure: H-index, F2F Degree and F2F Strength per user, with annotated groups:
 “good scientific, and social signals”; “healthy scientific & social profiles”;
 “outsider, high profile”; “shy scientist?”; “Who’s the next star researcher?”]
Behaviour analysis




    Jeffrey Chan, Conor Hayes, and Elizabeth Daly. Decomposing discussion forums using
    common user roles. In Proc. Web Science Conf. (WebSci10), Raleigh, NC, US, 2010
Role Skeleton
Encoding Rules in Ontologies with SPIN
Approach for inferring User Roles
•  Features: structural, social network, reciprocity, persistence, participation
•  Feature levels change with the dynamics of the community
•  Associate Roles with a collection of feature-to-level Mappings,
   e.g. in-degree -> high, out-degree -> high
•  Run our rules over each user’s features and derive the role composition
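
The approach can be sketched in plain Python as a stand-in for the SPIN rules mentioned in the talk. Everything named here is hypothetical: the role names, the cut-off values and the three-level scale are invented for illustration. Since feature levels change with the dynamics of the community, the cut-offs would be recomputed for each time window.

```python
def to_levels(features, thresholds):
    """Map numeric features to 'low' / 'mid' / 'high' using (lo, hi) cut-offs."""
    levels = {}
    for name, value in features.items():
        lo, hi = thresholds[name]
        if value < lo:
            levels[name] = "low"
        elif value > hi:
            levels[name] = "high"
        else:
            levels[name] = "mid"
    return levels


# Hypothetical role definitions: each role is a set of feature-to-level mappings.
ROLE_RULES = {
    "popular participant": {"in_degree": "high", "out_degree": "high"},
    "lurker": {"in_degree": "low", "out_degree": "low"},
}


def infer_roles(levels, rules=ROLE_RULES):
    """Return every role whose feature-to-level mappings all hold."""
    return [
        role
        for role, required in rules.items()
        if all(levels.get(feature) == level for feature, level in required.items())
    ]


# Cut-offs would come from the community's feature distribution in a window.
thresholds = {"in_degree": (2, 10), "out_degree": (2, 10)}
levels = to_levels({"in_degree": 15, "out_degree": 12}, thresholds)
print(infer_roles(levels))
```

Counting the inferred roles over all users in a window gives the role composition of the community for that window.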
Data from Boards.ie
•  Forum 246 (Commuting and Transport): Demonstrates a clear increase in
   activity over time.
•  Forum 388 (Rugby): Exhibits periodic increase and decrease in activity and
   hence it provides good examples of healthy/unhealthy evolutions.
•  Forum 411 (Mobile Phones and PDAs): Increase in activity over time with
   some fluctuation - i.e. reduction and increase over various time windows.
•  Data covers the period 2004-01 to 2006-12
Results
Commuting and Transport           Rugby                Mobile Phones and PDAs




•  Correlation of individual features in each of the three forums
Results
(a) Forum 246: Commuting and Transport
(b) Forum 388: Rugby
(c) Forum 411: Mobile Phones and PDAs
•  Variation in behaviour composition & activity
•  Behaviour composition in/stability influences forum activity
Prediction analysis – preliminary results
•  Predicting rise/fall in post submission numbers
•  Binary classification
•  Features : Community composition, roles and percentages of users
   associated with each
              Forum         P       R       F1       ROC

               246         0.799   0.769   0.780     0.800

               388         0.603   0.615   0.605     0.775

               411         0.765   0.692   0.714     0.617

                All        0.583   0.667   0.607     0.466



 •  Cross-community predictions are less reliable than individual
    community analysis due to the idiosyncratic behaviour observed in
    each individual community
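
For readers unfamiliar with the columns, precision (P), recall (R) and F1 for a binary rise/fall prediction can be computed as sketched below. The slides do not say how the reported scores were averaged across the two classes, so this shows the single-class definitions only; the labels are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 1 = activity rises in the next window, 0 = activity falls (illustrative).
actual = [1, 1, 0, 0]
predicted = [1, 0, 1, 0]
print(precision_recall_f1(actual, predicted))
```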
Rise and fall of social networks




Predicting engagement



•  Which posts will receive a reply?
  –  What are the most influential features here?




•  How much discussion will it generate?
  –  What are the key factors of lengthy discussions?




Common online community features

[...] user attributes - describing the reputation of the user - and attributes of a post's
content - generally referred to as content features. In Table 1 we define user and
content features and study their influence on the discussion "continuation".

           Table 1. User and Content Features

                                       User Features
  Feature            Description                                              Value
  In Degree          Number of followers of U                                 #
  Out Degree         Number of users U follows                                #
  List Degree        Number of lists U appears on. Lists group users by topic #
  Post Count         Total number of posts the user has ever posted           #
  User Age           Number of minutes from user join date                    #
  Post Rate          Posting frequency of the user                            $\frac{PostCount}{UserAge}$

                                      Content Features
  Feature            Description                                              Value
  Post length        Length of the post in characters                         #
  Complexity         Cumulative entropy of the unique words in post p,        $\frac{1}{\lambda}\sum_{i \in [1,n]} p_i(\log\lambda - \log p_i)$
                     $\lambda$ of total word length n and $p_i$ the
                     frequency of each word
  Uppercase count    Number of uppercase words                                #
  Readability        Gunning fog index [7] using average sentence length      $0.4\,(ASL + PCW)$
                     (ASL) and the percentage of complex words (PCW)
  Verb Count         Number of verbs                                          #
  Noun Count         Number of nouns                                          #
  Adjective Count    Number of adjectives                                     #
  Referral Count     Number of @user                                          #
  Time in the day    Normalised time in the day measured in minutes           #
  Informativeness    Terminological novelty of the post wrt other posts:      $\sum_{t \in p} tfidf(t, p)$
                     the cumulative tfIdf value of each term t in post p
  Polarity           Cumulation of polar term weights in p (using             $\frac{Po + Ne}{|terms|}$
                     Sentiwordnet3 lexicon) normalised by polar terms count
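
A few of the Table 1 formulas can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the regex tokenizer, and the reading of the complexity formula (λ as the total number of words, p_i as the count of the i-th unique word) are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def post_rate(post_count: int, user_age_minutes: float) -> float:
    """Post Rate = PostCount / UserAge, with the age given in minutes."""
    return post_count / user_age_minutes if user_age_minutes > 0 else 0.0


def complexity(text: str) -> float:
    """Cumulative word entropy: (1/lam) * sum_i p_i * (log lam - log p_i).

    Assumption: lam is the total number of words in the post and p_i is the
    number of occurrences of the i-th unique word.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())  # hypothetical tokenizer
    lam = len(words)
    if lam == 0:
        return 0.0
    counts = Counter(words)
    return sum(c * (math.log(lam) - math.log(c)) for c in counts.values()) / lam


def gunning_fog(avg_sentence_length: float, pct_complex_words: float) -> float:
    """Readability = 0.4 * (ASL + PCW)."""
    return 0.4 * (avg_sentence_length + pct_complex_words)
```

Under this reading, a post that repeats a single word has complexity 0, and the value grows as the vocabulary of the post becomes more varied.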




     4.2 Experiments
     Experiments are intended to test the performance of different classification
     models in identifying seed posts. Therefore we used four classifiers: discriminative
     classifiers Perceptron and SVM, the generative classifier Naive Bayes and the [...]

•  How do all these features influence activity generation in an online
   community?
   –  Such knowledge leads to better use and management of the community
                                                                                    49
Experiment for identifying Twitter seed posts


 •  Twitter data on the Haiti earthquake, and the Union
    Address


     Dataset         Users    Tweets      Seeds   Non-seeds   Replies

     Haiti           44,497   65,022      1,405    60,686      2,931

     Union Address   66,300   80,272      7,228    55,169     17,875




 •  Evaluated a binary classification task
    –  Is this post a seed post or not?


                                                                        50
   Identifying seeds with different types of features

[...] first report on the results obtained from our model selection phase, before moving
onto our results from using the best model with the top-k features.

Table 3. Results from the classification of seed posts using varying feature sets and
classification models
              (a) Haiti Dataset                       (b) Union Address Dataset
                     P       R      F1     ROC                  P     R     F1    ROC
       User   Perc 0.794   0.528   0.634   0.727  User   Perc 0.658 0.697 0.677   0.673
              SVM 0.843    0.159   0.267   0.566         SVM 0.510 0.946 0.663    0.512
              NB   0.948   0.269   0.420   0.785         NB   0.844 0.086 0.157   0.707
              J48  0.906   0.679   0.776   0.822         J48  0.851 0.722 0.782   0.830
      Content Perc 0.875   0.077   0.142   0.606 Content Perc 0.467 0.698 0.560   0.457
              SVM 0.552    0.727   0.627   0.589         SVM 0.650 0.589 0.618    0.638
              NB   0.721   0.638   0.677   0.769         NB   0.762 0.212 0.332   0.649
              J48  0.685   0.705   0.695   0.711         J48  0.740 0.533 0.619   0.736
        All   Perc 0.794   0.528   0.634   0.726   All   Perc 0.630 0.762 0.690   0.672
              SVM 0.483    0.996   0.651   0.502         SVM 0.499 0.990 0.664    0.506
              NB   0.962   0.280   0.434   0.852         NB   0.874 0.212 0.341   0.737
              J48  0.824   0.775   0.798   0.836         J48  0.890 0.810 0.848   0.877
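
As a quick sanity check on how the P, R and F1 columns relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 for the three J48 rows of the Haiti dataset; the values are copied from Table 3, while the function and variable names are made up for this example. Because the published P and R are rounded to three decimals, the recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for J48 on the Haiti dataset, from Table 3.
haiti_j48 = {
    "User": (0.906, 0.679, 0.776),
    "Content": (0.685, 0.705, 0.695),
    "All": (0.824, 0.775, 0.798),
}

for feature_set, (p, r, reported) in haiti_j48.items():
    print(f"{feature_set}: computed F1 = {f1_score(p, r):.3f}, reported = {reported:.3f}")
```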


4.3     Results
Our findings from Table 3 demonstrate the effectiveness of using solely user
features for identifying seed posts. In both the Haiti and Union Address datasets
training a classification model using user features shows improved performance
over the same models trained using content features. In the case of the Union [...]

    •  User features are most important in Twitter
    •  But combining user & content features gives best results
                                                                                    51
Impact of different features in Twitter

[...] which we found to be 0.674 indicating a good correlation between the two lists
and their respective ranks.

    •  What features have the highest impact on identification of seed
       posts?
    •  Rank features by information gain ratio wrt seed post class label

Table 4. Features ranked by Information Gain Ratio wrt Seed Post class label. The
feature name is paired with its IG in brackets.

         Rank   Haiti                             Union Address
          1     user-list-degree (0.275)          user-list-degree (0.319)
          2     user-in-degree (0.221)            content-time-in-day (0.152)
          3     content-informativeness (0.154)   user-in-degree (0.133)
          4     user-num-posts (0.111)            user-num-posts (0.104)
          5     content-time-in-day (0.089)       user-post-rate (0.075)
          6     user-post-rate (0.075)            user-out-degree (0.056)
          7     content-polarity (0.064)          content-referral-count (0.030)
          8     user-out-degree (0.040)           user-age (0.015)
          9     content-referral-count (0.038)    content-polarity (0.015)
          10    content-length (0.020)            content-length (0.010)
          11    content-readability (0.018)       content-complexity (0.004)
          12    user-age (0.015)                  content-noun-count (0.002)
          13    content-uppercase-count (0.012)   content-readability (0.001)
          14    content-noun-count (0.010)        content-verb-count (0.001)
          15    content-adj-count (0.005)         content-adj-count (0.0)
          16    content-complexity (0.0)          content-informativeness (0.0)
          17    content-verb-count (0.0)          content-uppercase-count (0.0)
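
The ranking criterion in Table 4, information gain ratio, divides the information gain of a feature by the entropy of the feature's own value distribution (its split information). The sketch below is a minimal version for features that are already discrete; how the continuous features in the study were binned is not stated in the slides, so the pre-binned toy values here are purely hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: a binned feature that separates the classes perfectly, and one that does not.
labels = ["seed", "seed", "non-seed", "non-seed"]
print(gain_ratio(["high", "high", "low", "low"], labels))
print(gain_ratio(["high", "low", "high", "low"], labels))
```

A feature that takes the same value for every post has zero split information, so the sketch returns 0.0 for it rather than dividing by zero.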
                                                                                   52
Positive/negative impact of features

•  What is the correlation between seed posts and features?

[Figure: Fig. 3. Contributions of top-5 features to identifying Non-seeds (N) and Seeds (S).
Upper plots are for the Haiti dataset and the lower plots are for the Union Address
dataset.]
                                                                                    53
Predicting discussion activity on Twitter
•  Reply rates:
  –  Haiti 1-74 responses, Union Address 1-75 responses
•  Compare rankings
  –  Ground truth vs predicted
•  Experiments
  –  Using Haiti and Union Address datasets
  –  Evaluate predicted rank k where k = {1, 5, 10, 20, 50, 100}
  –  Support Vector Regression with user, content, user+content
     features

         Dataset         Training size   Test size   Test Vol Mean   Test Vol SD
         Haiti                980           210          1.664          3.017
         Union Address      5,067         1,161          1.761          2.342
                                                                                54
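
The slides do not say which measure was used to compare the predicted ranking with the ground truth at each rank k. One common choice for this kind of comparison is normalised discounted cumulative gain (nDCG@k), sketched below as an assumption. Here `true_volumes` are observed reply counts and `predicted_scores` are regression outputs; both names and the toy numbers are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items of an ordered list."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_volumes, predicted_scores, k):
    """nDCG@k of the ranking induced by predicted_scores against true reply volumes."""
    order = sorted(range(len(true_volumes)), key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_volumes[i] for i in order]
    ideal = sorted(true_volumes, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: three seed posts with 5, 1 and 3 replies.
true_volumes = [5, 1, 3]
print(ndcg_at_k(true_volumes, [0.9, 0.1, 0.5], k=3))  # predicted order matches the true order
print(ndcg_at_k(true_volumes, [0.1, 0.9, 0.5], k=1))  # least-discussed post ranked first
```

A score of 1 means the top-k predicted posts appear in the same order as in the ground truth; lower scores mean heavily discussed posts were ranked too low.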
Predicting discussion activity on Twitter

    Haiti dataset                              Union Address dataset




           •  Content features are key for top ranks
           •  User features are more important for higher ranks


                                                                       55
Identifying seed posts in Boards.ie

•  Used the same features as before
  –  User features
     •  In-degree, out-degree, post count, user age, post rate
  –  Content features
     •  Post Length, complexity, readability, referral count, time in day,
        informativeness, polarity

•  New features designed to capture user affinity
  –  Forum Entropy
     •  Concentration of forum activity
     •  Higher entropy = large forum spread
  –  Forum Likelihood
     •  Likelihood of forum post given user history
     •  Combines post history with incoming data
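
The two focus features above can be illustrated with a small sketch. It assumes a user's history is simply the list of forum names they posted in, takes forum entropy to be the Shannon entropy of that distribution, and takes forum likelihood to be the plain fraction of past posts made in the target forum. The exact formulation used in the study, including how history is combined with incoming data, is not given in the slides, so treat both functions as hypothetical.

```python
import math
from collections import Counter


def forum_entropy(post_forums):
    """Shannon entropy (bits) of a user's posts across forums.

    0 means all activity is concentrated in one forum; higher values mean
    the user's posts are spread over more forums.
    """
    total = len(post_forums)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in Counter(post_forums).values())


def forum_likelihood(post_forums, forum):
    """Fraction of the user's past posts that were made in the given forum."""
    if not post_forums:
        return 0.0
    return post_forums.count(forum) / len(post_forums)


history = ["soccer", "soccer", "politics", "soccer"]  # hypothetical forum names
print(forum_entropy(history))
print(forum_likelihood(history, "soccer"))
```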



                                                                             56
Experiment for identifying seed posts
•  Used all posts from Boards.ie in 2006
•  Built features using a 6-month window prior to seed post date

         Posts           Seeds    Non-Seeds   Replies     Users

         1,942,030       90,765    21,800     1,829,465   29,908




•  Evaluated a binary classification task
   –  Is this post a seed post or not?
   –  Precision, Recall, F1 and Accuracy
   –  Tested: user, content, focus features, and their combinations




                                                                      57
Identifying seeds with different types of features

                                   TABLE II
          RESULTS FROM THE CLASSIFICATION OF SEED POSTS USING
          VARYING FEATURE SETS AND CLASSIFICATION MODELS

  Features          Model           P       R       F1      ROC
  User              SVM           0.775   0.810   0.774   0.581
                    Naive Bayes   0.691   0.767   0.719   0.540
                    Max Ent       0.776   0.806   0.722   0.556
                    J48           0.778   0.809   0.734   0.582
  Content           SVM           0.739   0.804   0.729   0.511
                    Naive Bayes   0.730   0.794   0.740   0.616
                    Max Ent       0.758   0.806   0.730   0.678
                    J48           0.795   0.822   0.783   0.617
  Focus             SVM           0.649   0.805   0.719   0.500
                    Naive Bayes   0.710   0.737   0.722   0.588
                    Max Ent       0.649   0.805   0.719   0.586
                    J48           0.649   0.805   0.719   0.500
  User + Content    SVM           0.790   0.808   0.727   0.509
                    Naive Bayes   0.712   0.772   0.732   0.593
                    Max Ent       0.767   0.807   0.734   0.671
                    J48           0.795   0.821   0.779   0.675
  User + Focus      SVM           0.776   0.810   0.776   0.583
                    Naive Bayes   0.699   0.778   0.724   0.585
                    Max Ent       0.771   0.806   0.722   0.607
                    J48           0.777   0.810   0.742   0.617
  Content + Focus   SVM           0.750   0.805   0.729   0.511
                    Naive Bayes   0.732   0.787   0.746   0.658
                    Max Ent       0.762   0.807   0.731   0.692
                    J48           0.798   0.823   0.787   0.662
  All               SVM           0.791   0.808   0.727   0.510
                    Naive Bayes   0.724   0.780   0.740   0.637
                    Max Ent       0.768   0.808   0.733   0.688
                    J48           0.798   0.824   0.792   0.692
                                                                                58
Positive/negative impact of features on Boards.ie

•  What are the most important features for predicting seed posts?

•  Correlations:
  –  Referral counts (non-seeds)
  –  Forum likelihood (seeds)
  –  Informativeness (non-seeds)
  –  Readability (seeds)
  –  User age (non-seeds)

                                  TABLE III
          REDUCTION IN F1 LEVELS AS INDIVIDUAL FEATURES ARE
                  DROPPED FROM THE J48 CLASSIFIER

          Feature Dropped        F1
          -                      0.815
          Post Count             0.815
          In-Degree              0.811*
          Out-Degree             0.811*
          User Age               0.807***
          Post Rate              0.815
          Forum Entropy          0.815
          Forum Likelihood       0.798***
          Post Length            0.810**
          Complexity             0.811**
          Readability            0.802***
          Referral Count         0.793***
          Time in Day            0.810**
          Informativeness        0.801***
          Polarity               0.808***
          Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 .

[...] hyperlinks (e.g., ads and spams). This contrasts with work in
Twitter which found that tweets containing many links were [...]
                                                                                59
Predicting Discussion Activity in Boards.ie

•  What impact do features have on discussion length?
  –  Assessed Linear Regression model with focus and content
     features

  –  Forum Likelihood (pos)
  –  Content Length (+/neutral)
  –  Complexity (pos)
  –  Readability (+/neutral)
  –  Referral Count (neg)
  –  Time in Day (+/neutral)
  –  Informativeness (-/neutral)
  –  Polarity (neg)




                                                               60
Stay tuned
•  More communities
  –  SAP, IBM, StackOverflow, Reddit
  –  Compare impact of features on their dynamics
•  Better behaviour analysis
  –  Fewer features, more forums/communities, more graphs!
  –  Healthy? posts, reciprocation, discussions, sentiment mixture
•  Churn analysis
  –  Correlation of features/behaviour to ‘bounce rate’ (WebSci11 best paper)
•  Intervention!
  –  Opportunities and mechanisms to influence behaviour




                                                                                61
Upcoming events

             Social Object Networks
              IEEE Social Computing, 2011
                October 9-10, Boston, USA

  http://ir.ii.uam.es/socialobjects2011/
                Deadline: August 5, 2011



  Intelligent Web Services Meet Social Computing
             AAAI Spring Symposium 2012,
             March 26-28, Stanford, California

    http://vitvar.com/events/aaai-ss12
                Deadline: October 7, 2011

                                                   62
Acknowledgement
    My social semantics team
      Sofia Angeletou, Research Associate
      Matthew Rowe, Research Associate

    Live Social Semantics team
      Ciro Cattuto, ISI, Turin
      Wouter van Den Broeck, ISI, Turin
      Alain Barrat, CPT Marseille & ISI
      Martin Szomszor, CeRC, City University, UK

      Gianluca Correndo, Uni Southampton
      Ivan Cantador, UAM, Madrid
      STI International
      ESWC09/10 & HT09 chairs and organisers
      All LSS participants


                                                                                               63
Harith Alani's presentation at SSSW 2011

 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Harith Alani's presentation at SSSW 2011

  • 1. Live Social Semantics & online community monitoring. Harith Alani, Knowledge Media Institute, The Open University, UK. http://twitter.com/halani http://delicious.com/halani http://www.linkedin.com/pub/harith-alani/9/739/534 Semantic Web Summer School, Cercedilla, Spain, 2011
  • 2. Market value of Web Analytics
  • 3. Location, Sensors, & Social Networking. Tag-Along Marketing, The New York Times, November 6, 2010: “Everything is in place for location-based social networking to be the next big thing. Tech companies are building the platforms, venture capitalists are providing the cash and marketers are eager to develop advertising.”
  • 4. Location, Sensors, & Social Networking. The Canine Twitterer: “Having my daily workout. Already did 15 leg lifts!”
  • 5. Monitoring online/offline social activity: Where is everybody?
  • 6. Monitoring online/offline social activity •  Generating opportunities for F2F networking
  • 7. Tracking of F2F contact networks. Sociometer, MIT, 2002: F2F and productivity; F2F dynamics; Who are key players?; F2F and office distance. TraceEncounters, 2004
  • 8. SocioPatterns platform http://www.sociopatterns.org/
  • 9. Convergence with online social networks
  • 10. Online vs. offline social networking
      •  Digital social networking increases physical social interaction
        –  Creates more opportunities to network
        –  Supports and increases F2F contact
        –  Stronger offline social ties → more online communication
        –  Stronger offline social ties → more diverse online communications
        –  F2F is medium of choice in weaker social ties
        Source: Barry Wellman, The Glocal Village: Internet and Community, Idea’s - The Arts & Science Review, University of Toronto, 1(1), 2004
      •  Digital networking increases social isolation
        •  Causes
        –  Genetic alterations
        –  Weakened immune system
        –  Less resistant to cancer
        –  Higher risk of heart disease
        –  Higher blood pressure
        –  Faster dementia
        –  Narrower arteries
        Source: Aric Sigman, “Well Connected? The Biological Implications of 'Social Networking'”, Biologist, 56(1), 2009
  • 11. Offline + online social networking (ESWC2010): “Who should I talk to?” “Anyone I know here?” “Where have I met this guy?” “Where should I go?”
  • 12. Live Social Semantics (LSS): RFIDs + Social Web + Semantic Web
      •  Integration of physical presence and online information
      •  Semantic user profile generation
      •  Logging of face-to-face contact
      •  Social network browsing
      •  Analysis of online vs offline social networks
      Ontology excerpt shown on the slide (truncated in the original):
      <?xml version="1.0"?>
      <rdf:RDF
        xmlns="http://tagora.ecs.soton.ac.uk/schemas/tagging#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xml:base="http://tagora.ecs.soton.ac.uk/schemas/tagging">
        <owl:Ontology rdf:about=""/>
        <owl:Class rdf:ID="Post"/>
        <owl:Class rdf:ID="TagInfo"/>
        <owl:Class rdf:ID="GlobalCooccurrenceInfo"/>
        <owl:Class rdf:ID="DomainCooccurrenceInfo"/>
        <owl:Class rdf:ID="UserTag"/>
        <owl:Class rdf:ID="UserCooccurrenceInfo"/>
        <owl:Class rdf:ID="Resource"/>
        <owl:Class rdf:ID="GlobalTag"/>
        <owl:Class rdf:ID="Tagger"/>
        <owl:Class rdf:ID="DomainTag"/>
        <owl:ObjectProperty rdf:ID="hasPostTag">
          <rdfs:domain rdf:resource="#TagInfo"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="hasDomainTag">
          <rdfs:domain rdf:resource="#UserTag"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="isFilteredTo">
          <rdfs:range rdf:resource="#GlobalTag"/>
          <rdfs:domain rdf:resource="#GlobalTag"/>
        </owl:ObjectProperty>
        <owl:ObjectProperty rdf:ID="hasResource">
          <rdfs:domain rdf:resource="#Post"/>
          <rdfs:range =…
  • 13. Live Social Semantics: architecture
      [Architecture diagram. Component labels on the slide:]
      –  Linked data sources: data.semanticweb.org, semanticweb.org, rkbexplorer.com, dbpedia.org, dbtune.org (publications, co-authorship networks, Communities of Practice)
      –  Web-based systems: Delicious, Flickr, LastFM, Facebook (social tagging, social networks, contacts); Extractor Daemon, Connect API
      –  Profile Builder: publications, profile, interests
      –  TAGora Sense Repository: tag disambiguation, Tag to URI service (tag -> dbpedia uri, mbid -> dbpedia uri)
      –  Storage: JXT Triple Store, social semantics triple store, RDF cache, Aggregator
      –  Real world: RFID Badges, RFID Readers, Local Server, Real-World Contact Data
      –  Front end: Visualization, Web Interface, Linked Data
  • 14. SW resources: http://data.semanticweb.org/ (conference chair, proceedings chair, author); www.rkbexplorer.com/ (CoP)
  • 15. Social and information networks
  • 17. Distinct, Separated Identity Management
      Harith Alani: http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/foaf/2
      –  Delicious Tagging and Network: http://tagora.ecs.soton.ac.uk/delicious/halani
      –  RFID Contact Data: http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/eswc2009/1139
      –  Flickr Tagging and Contacts: http://tagora.ecs.soton.ac.uk/flickr/69749885@N00
      –  Conference Publication Data: http://data.semanticweb.org/person/harith-alani/
      –  Lastfm favourite artists and friends: http://tagora.ecs.soton.ac.uk/lastfm/halani
      –  Past Publications, Projects, Communities of Practice: http://southampton.rkbexplorer.com/id/person-05877
      –  Facebook contacts: http://tagora.ecs.soton.ac.uk/facebook/568493878
  • 18. Tag Filtering Service: Semantic modeling; Semantic analysis; Collective intelligence; Statistical analysis; Syntactical analysis
  • 20. From Tags to Semantics
  • 21. Tags to User Interests
  • 22. From raw tags and social relations to Structured Data: User raw data; Collective intelligence; Semantic data; Structured data; ontologies
  • 23. RFIDs for tracking social contact
  • 24. People contact → RFID → RDF Triples
      Nodes: foaf#Person1, foaf#Person2, F2FContact, Place, XMLSchema#date, XMLSchema#time
      Properties: hasContact, contactWith, contactPlace, contactDate, contactDuration
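
The contact-to-RDF step on slide 24 could be sketched roughly as follows. This is an illustration under stated assumptions, not the LSS implementation: the `LSS` namespace URI, the example resource URIs, the way the properties connect the nodes, and the `contact_to_ntriples` helper are all hypothetical. Only the class and property names (`F2FContact`, `hasContact`, `contactWith`, `contactPlace`, `contactDate`, `contactDuration`) come from the slide.

```python
from datetime import date

# Hypothetical namespace; the slide does not give the ontology URI.
LSS = "http://example.org/lss#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def contact_to_ntriples(contact_uri, person1, person2, place, day, seconds):
    """Serialise one face-to-face contact as N-Triples lines.

    The edge layout (person -> contact -> other person, place, date,
    duration) is an assumed reading of the slide's diagram.
    """
    def uri(value):
        return f"<{value}>"

    def lit(value, datatype):
        return f'"{value}"^^<{XSD}{datatype}>'

    triples = [
        (uri(contact_uri), uri(RDF_TYPE), uri(LSS + "F2FContact")),
        (uri(person1), uri(LSS + "hasContact"), uri(contact_uri)),
        (uri(contact_uri), uri(LSS + "contactWith"), uri(person2)),
        (uri(contact_uri), uri(LSS + "contactPlace"), uri(place)),
        (uri(contact_uri), uri(LSS + "contactDate"), lit(day.isoformat(), "date")),
        # Stored as whole seconds here for simplicity; the slide types
        # the duration as XMLSchema#time.
        (uri(contact_uri), uri(LSS + "contactDuration"), lit(seconds, "integer")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = contact_to_ntriples(
    "http://example.org/contact/1",
    "http://example.org/foaf#Person1",
    "http://example.org/foaf#Person2",
    "http://example.org/place/ESWC2009",
    date(2009, 6, 2),
    40,
)
print("\n".join(lines))
```

Plain string formatting keeps the sketch dependency-free; a real deployment would more likely use an RDF library and a triple store.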
  • 25.
  • 26. Real-time F2F networks with SNS links http://www.vimeo.com/6590604
  • 27. Live Social Semantics
      Deployed at:
      Data analysis
      •  Face-to-face interactions across scientific conferences
      •  Networking behaviour of frequent users
      •  Correlations between scientific seniority and social networking
      •  Comparison of F2F contact network with Twitter and Facebook
      •  Social networking with online and offline friends
  • 28. Characteristics of F2F contact network
      Network characteristics   ESWC 2009   HT 2009   ESWC 2010
      Number of users           175         113       158
      Average degree            54          39        55
      Avg. strength (mn)        143         123       130
      Avg. weight (mn)          2.65        3.15      2.35
      Weights ≤ 1 mn            70%         67%       74%
      Weights ≤ 5 mn            90%         89%       93%
      Weights ≤ 10 mn           95%         94%       96%
      •  Degree is the number of people with whom the person had at least one F2F contact
      •  Strength is the total time a person spent in F2F contact
      •  Edge weight is the total time spent by a pair of users in F2F contact
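
The three measures defined on slide 28 can be computed from a list of contact events along these lines. This is a minimal sketch; the `(person_a, person_b, duration_seconds)` event format and the `network_stats` helper are assumptions made for illustration, not the SocioPatterns data format.

```python
from collections import defaultdict


def network_stats(contact_events):
    """contact_events: iterable of (person_a, person_b, duration_seconds)."""
    weight = defaultdict(float)       # total contact time per pair of people
    for a, b, seconds in contact_events:
        weight[frozenset((a, b))] += seconds

    degree = defaultdict(int)         # number of distinct contacts per person
    strength = defaultdict(float)     # total contact time per person
    for pair, seconds in weight.items():
        for person in pair:
            degree[person] += 1
            strength[person] += seconds
    return dict(weight), dict(degree), dict(strength)


# Made-up events for illustration only.
events = [("ann", "bob", 40), ("ann", "bob", 20), ("ann", "cat", 60)]
weight, degree, strength = network_stats(events)
print(degree["ann"], strength["ann"], weight[frozenset(("ann", "bob"))])
```

Under these definitions the average strength equals the average degree times the average edge weight, which agrees with the table (for ESWC 2009, 54 × 2.65 ≈ 143 mn).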
  • 29. Characteristics of F2F contact events
      Contact characteristics       ESWC 2009   HT 2009   ESWC 2010
      Number of contact events      16258       9875      14671
      Average contact length (s)    46          42        42
      Contacts ≤ 1 mn               87%         89%       88%
      Contacts ≤ 2 mn               94%         96%       95%
      Contacts ≤ 5 mn               99%         99%       99%
      Contacts ≤ 10 mn              99.8%       99.8%     99.8%
      F2F contact pattern is very similar for all three conferences
  • 30. F2F contacts of returning users
      [Figure: three log-log scatter plots of ESWC2010 against ESWC2009 values for returning attendees. Panels: Degree, Total interaction time, Links’ weights]
      •  Degree: number of other participants with whom an attendee has interacted
      •  Total time: total time spent in interaction by an attendee
      •  Link weight: total time spent in F2F interaction by a pair of returning attendees in 2010, versus the same quantity measured in 2009
      Pearson Correlation, ESWC 2009 & ESWC 2010
      Degree                       0.37
      Total F2F interaction time   0.76
      Link weight                  0.75
      Time spent on F2F networking by frequent users is stable, even when the list of people they networked with changed
  • 31. Average seniority of neighbours in F2F networks
      [Figure: average seniority of neighbours (y axis) against seniority in number of papers (x axis). Series: sen_nn, average seniority of the neighbours; sen_nn,w, with weighted averages; sen_nn,max, seniority of the user with the strongest link]
      •  No clear pattern is observed if the unweighted average over all neighbours in the aggregated network is considered
      •  A correlation is observed when each neighbour is weighted by the time spent with the main person
      •  The correlation becomes much stronger when considering for each individual only the neighbour with whom the most time was spent
      Conference attendees tend to network with others of similar levels of scientific seniority
  • 33. Offline networking vs online networking
      Twitterers                        Spearman Correlation (ρ)
      Tweets – F2F Degree               -0.15
      Tweets – F2F Strength             -0.15
      Twitter Following – F2F Degree    -0.21
      [Figure: users with Facebook and Twitter accounts in ESWC 2010]
      •  People who have a large number of friends on Twitter and/or Facebook don’t seem to be the most socially active in the offline world in comparison to other SNS users
      No strong correlation between amount of F2F contact activity and size of online social networks
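
For readers unfamiliar with the measure used on slide 33, Spearman's ρ is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration with made-up numbers; it is not the analysis code behind the slide, and the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up numbers for illustration only (not the ESWC 2010 data).
tweets = [5, 120, 40, 300, 10]
f2f_degree = [60, 30, 55, 20, 45]
rho = spearman(tweets, f2f_degree)
print(round(rho, 2))
```

A value near zero, as in the slide's table, means that ranking users by online activity says little about their ranking by F2F activity.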
  • 34. Scientific seniority vs Twitter followers
      Twitter users                 Correlation
      H-index – Twitter Followers   0.32
      H-index – Tweets              -0.13
      [Figure: Twitter Followers, H-index and Tweets per user]
      •  Comparison between people’s scientific seniority and the number of people following them on Twitter
      People who have the highest number of Twitter followers are not necessarily the most scientifically senior, although they do have high visibility and experience
  • 35. Conference Chairs
                                          all participants 2009   chairs 2009   all participants 2010   chairs 2010
      average degree                      55                      77.7          54                      77.6
      average strength (s)                8590                    19590         7807                    22520
      average weight (s)                  159                     500           141                     674
      average number of events per edge   3.44                    8             3.37                    12
      •  Conf chairs interact with more distinct people (larger average degree)
      •  Conf chairs spend more time in F2F interaction (almost three times as much as a random participant)
  • 36. Networking with online and offline ‘friends’
      Characteristics                     all users   coauthors   Facebook friends   Twitter followers
      average contact duration (s)        42          75          63                 72
      average edge weight (s)             141         4470        830                1010
      average number of events per edge   3.37        60          13                 14
      •  Individuals sharing an online or professional social link meet much more often than other individuals
      •  Average number of encounters, and total time spent in interaction, is highest for co-authors
      F2F contacts with Facebook & Twitter friends were respectively 50% and 71% longer, and 286% and 315% more frequent, than with others
      Participants spent 79% more time per F2F contact with their co-authors, and met them 1680% more times than they met non co-authors
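
The percentages quoted on slide 36 follow directly from the table rows for average contact duration and events per edge. A short worked check, using only the table's numbers (the `percent_more` helper is just a name chosen here):

```python
def percent_more(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value / baseline - 1) * 100


# (average contact duration in s, average events per edge) from the table
all_users = (42, 3.37)
groups = {
    "coauthors": (75, 60),
    "Facebook friends": (63, 13),
    "Twitter followers": (72, 14),
}

results = {}
for name, (duration, events) in groups.items():
    results[name] = (
        round(percent_more(duration, all_users[0])),
        round(percent_more(events, all_users[1])),
    )
    print(name, results[name])
```

Each pair is (percent longer contacts, percent more events per edge) relative to the all-users column.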
  • 37. Twitterers vs Non-Twitterers
      •  Time spent in conference rooms
        –  Twitter users spent on average 11.4% more time in the conf rooms than non-Twitter users (mean is 26% higher)
      •  Number of people met F2F during the conference
        –  Twitter users met on average 9% more people F2F (mean 8% higher)
      •  Duration of F2F contacts
        –  Twitter users spent on average 63% more time in F2F contact than non-Twitter users (mean is 20% higher)
  • 38. Behaviour of individuals – micro level analysis
      [Figure: H-index, F2F Degree and F2F Strength per user, with annotated groups of users]
  • 39. Behaviour analysis Jeffrey Chan, Conor Hayes, and Elizabeth Daly. Decomposing discussion forums using common user roles. In Proc. Web Science Conf. (WebSci10), Raleigh, NC: US, 2010
  • 41. Encoding Rules in Ontologies with SPIN
  • 42. Approach for inferring User Roles
      –  Features: structural, social network, reciprocity, persistence, participation
      –  Feature levels change with the dynamics of the community
      –  Associate Roles with a collection of feature-to-level Mappings, e.g. in-degree -> high, out-degree -> high
      –  Run our rules over each user’s features and derive the role composition
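
The steps on slide 42 can be sketched in a few lines. The slides encode the rules with SPIN; the version below swaps in plain Python purely for illustration. The tertile binning, the role names in `ROLE_RULES`, and both helper functions are assumptions, not the authors' definitions.

```python
def to_levels(feature_values):
    """Map each user's raw value to 'low', 'mid' or 'high' by tertile rank.

    Tertile binning is an assumption; the slide only says that levels
    follow the dynamics of the community.
    """
    users = sorted(feature_values, key=feature_values.get)
    n = len(users)
    levels = {}
    for rank, user in enumerate(users):
        if rank < n / 3:
            levels[user] = "low"
        elif rank < 2 * n / 3:
            levels[user] = "mid"
        else:
            levels[user] = "high"
    return levels


# Hypothetical role rules: role -> required feature-to-level mapping.
ROLE_RULES = {
    "popular participant": {"in_degree": "high", "out_degree": "high"},
    "lurker": {"in_degree": "low", "out_degree": "low"},
}


def role_composition(features):
    """features: {feature_name: {user: value}} -> {role: share of users}."""
    levels = {name: to_levels(values) for name, values in features.items()}
    users = list(next(iter(features.values())))
    counts = {role: 0 for role in ROLE_RULES}
    for user in users:
        for role, rule in ROLE_RULES.items():
            if all(levels[feature][user] == level for feature, level in rule.items()):
                counts[role] += 1
    return {role: count / len(users) for role, count in counts.items()}


# Made-up feature values for three users.
features = {
    "in_degree": {"u1": 50, "u2": 3, "u3": 20},
    "out_degree": {"u1": 40, "u2": 1, "u3": 15},
}
composition = role_composition(features)
print(composition)
```

Because levels are recomputed from the community's current distribution, the same raw in-degree can map to a different level in a different time window, which is the point made on the slide.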
  • 43. Data from Boards.ie
      •  Forum 246 (Commuting and Transport): Demonstrates a clear increase in activity over time.
      •  Forum 388 (Rugby): Exhibits periodic increase and decrease in activity and hence it provides good examples of healthy/unhealthy evolutions.
      •  Forum 411 (Mobile Phones and PDAs): Increase in activity over time with some fluctuation, i.e. reduction and increase over various time windows.
      •  Time period: 2004-01 to 2006-12
  • 44. Results
      [Figure panels: Commuting and Transport; Rugby; Mobile Phones and PDAs]
      •  Correlation of individual features in each of the three forums
  • 45. Results
      [Figure panels: (a) Forum 246: Commuting and Transport; (b) Forum 388: Rugby; (c) Forum 411: Mobile Phones and PDAs]
      •  Variation in behaviour composition & activity
      •  Behaviour composition in/stability influences forum activity
  • 46. Prediction analysis – preliminary results!
      •  Predicting rise/fall in post submission numbers
      •  Binary classification
      •  Features: Community composition, roles and percentages of users associated with each
      Forum   P       R       F1      ROC
      246     0.799   0.769   0.780   0.800
      388     0.603   0.615   0.605   0.775
      411     0.765   0.692   0.714   0.617
      All     0.583   0.667   0.607   0.466
      •  Cross-community predictions are less reliable than individual community analysis due to the idiosyncratic behaviour observed in each individual community
  • 47. Rise and fall of social networks
  • 48. Predicting engagement
      •  Which posts will receive a reply?
        –  What are the most influential features here?
      •  How much discussion will it generate?
        –  What are the key factors of lengthy discussions?
  • 49. Common online community features
      Paper excerpt: user attributes - describing the reputation of the user - and attributes of a post’s content - generally referred to as content features. In Table 1 we define user and content features and study their influence on the discussion “continuation”.
      Table 1. User and Content Features
      User Features
        In Degree: Number of followers of U
        Out Degree: Number of users U follows
        List Degree: Number of lists U appears on. Lists group users by topic
        Post Count: Total number of posts the user has ever posted
        User Age: Number of minutes from user join date
        Post Rate: Posting frequency of the user, $PostCount / UserAge$
      Content Features
        Post length: Length of the post in characters
        Complexity: Cumulative entropy of the unique words in post p, of total word length n and $p_i$ the frequency of each word, $\frac{1}{\lambda}\sum_{i \in [1,n]} p_i(\log\lambda - \log p_i)$
        Uppercase count: Number of uppercase words
        Readability: Gunning fog index using average sentence length (ASL) [7] and the percentage of complex words (PCW), $0.4(ASL + PCW)$
        Verb Count: Number of verbs
        Noun Count: Number of nouns
        Adjective Count: Number of adjectives
        Referral Count: Number of @user
        Time in the day: Normalised time in the day measured in minutes
        Informativeness: Terminological novelty of the post wrt other posts. The cumulative tfIdf value of each term t in post p, $\sum_{t \in p} tfidf(t, p)$
        Polarity: Cumulation of polar term weights in p (using Sentiwordnet3 lexicon) normalised by polar terms count, $\frac{Po + Ne}{|terms|}$
      •  How do all these features influence activity generation in an online community?
        –  Such knowledge leads to better use and management of the community
      Paper excerpt: 4.2 Experiments. Experiments are intended to test the performance of different classification models in identifying seed posts. Therefore we used four classifiers: discriminative classifiers Perceptron and SVM, the generative classifier Naive Bayes and the …
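
A few of the features in Table 1 are simple enough to sketch directly. The code below is a hedged illustration, not the authors' feature extractor: it assumes that λ is the total number of words in the post and p_i the count of the i-th unique word (under that reading the complexity formula reduces to the Shannon entropy of the post's word distribution), and it takes ASL and PCW as given inputs rather than computing them. Whitespace tokenisation and all function names are choices made here.

```python
from collections import Counter
from math import log


def post_rate(post_count, user_age_minutes):
    """Posting frequency: PostCount / UserAge."""
    return post_count / user_age_minutes


def complexity(post):
    """(1 / lambda) * sum_i p_i * (log(lambda) - log(p_i)).

    Assumed reading: lambda is the total number of words in the post and
    p_i is the count of the i-th unique word.
    """
    counts = Counter(post.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c * (log(total) - log(c)) for c in counts.values()) / total


def gunning_fog(avg_sentence_length, pct_complex_words):
    """Readability: 0.4 * (ASL + PCW)."""
    return 0.4 * (avg_sentence_length + pct_complex_words)


print(complexity("spam spam spam"))        # a single repeated word
print(complexity("rfid tags rfid tags"))   # two equally frequent words
print(gunning_fog(10, 5))
```

A post that repeats one word scores zero, and the score grows as the vocabulary becomes more evenly spread, which is the "dispersion" the feature is meant to capture.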
  • 50. Experiment for identifying Twitter seed posts
      •  Twitter data on the Haiti earthquake, and the Union Address
      Dataset         Users    Tweets   Seeds   Non-seeds   Replies
      Haiti           44,497   65,022   1,405   60,686      2,931
      Union Address   66,300   80,272   7,228   55,169      17,875
      •  Evaluated a binary classification task
        –  Is this post a seed post or not?
  • 51. Identifying seeds with different type of features
      Paper excerpt: … first report on the results obtained from our model selection phase, before moving onto our results from using the best model with the top-k features.
      Table 3. Results from the classification of seed posts using varying feature sets and classification models
                          (a) Haiti Dataset               (b) Union Address Dataset
      Features  Model     P      R      F1     ROC        P      R      F1     ROC
      User      Perc      0.794  0.528  0.634  0.727      0.658  0.697  0.677  0.673
                SVM       0.843  0.159  0.267  0.566      0.510  0.946  0.663  0.512
                NB        0.948  0.269  0.420  0.785      0.844  0.086  0.157  0.707
                J48       0.906  0.679  0.776  0.822      0.851  0.722  0.782  0.830
      Content   Perc      0.875  0.077  0.142  0.606      0.467  0.698  0.560  0.457
                SVM       0.552  0.727  0.627  0.589      0.650  0.589  0.618  0.638
                NB        0.721  0.638  0.677  0.769      0.762  0.212  0.332  0.649
                J48       0.685  0.705  0.695  0.711      0.740  0.533  0.619  0.736
      All       Perc      0.794  0.528  0.634  0.726      0.630  0.762  0.690  0.672
                SVM       0.483  0.996  0.651  0.502      0.499  0.990  0.664  0.506
                NB        0.962  0.280  0.434  0.852      0.874  0.212  0.341  0.737
                J48       0.824  0.775  0.798  0.836      0.890  0.810  0.848  0.877
      Paper excerpt: 4.3 Results. Our findings from Table 3 demonstrate the effectiveness of using solely user features for identifying seed posts. In both the Haiti and Union Address datasets training a classification model using user features shows improved performance over the same models trained using content features. In the case of the Union …
      •  User features are most important in Twitter
      •  But combining user & content features gives best results
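
The F1 column of Table 3 is the harmonic mean of the P and R columns. As a quick worked check on three rows copied from the table (the `f1_score` helper is just a name chosen here):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from Table 3.
rows = {
    "Haiti, User, Perc": (0.794, 0.528),
    "Haiti, User, J48": (0.906, 0.679),
    "Union Address, All, J48": (0.890, 0.810),
}
computed = {name: round(f1_score(p, r), 3) for name, (p, r) in rows.items()}
for name, value in computed.items():
    print(f"{name}: F1 = {value:.3f}")
```

The three results (0.634, 0.776 and 0.848) match the F1 values reported for those rows.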
  • 52. Impact of different features in Twitter
      •  What features have the highest impact on identification of seed posts?
      •  Rank features by information gain ratio wrt seed post class label
      Paper excerpt: … which we found to be 0.674 indicating a good correlation between the two lists and their respective ranks.
      Table 4. Features ranked by Information Gain Ratio wrt Seed Post class label. The feature name is paired with its IG in brackets.
      Rank   Haiti                             Union Address
      1      user-list-degree (0.275)          user-list-degree (0.319)
      2      user-in-degree (0.221)            content-time-in-day (0.152)
      3      content-informativeness (0.154)   user-in-degree (0.133)
      4      user-num-posts (0.111)            user-num-posts (0.104)
      5      content-time-in-day (0.089)       user-post-rate (0.075)
      6      user-post-rate (0.075)            user-out-degree (0.056)
      7      content-polarity (0.064)          content-referral-count (0.030)
      8      user-out-degree (0.040)           user-age (0.015)
      9      content-referral-count (0.038)    content-polarity (0.015)
      10     content-length (0.020)            content-length (0.010)
      11     content-readability (0.018)       content-complexity (0.004)
      12     user-age (0.015)                  content-noun-count (0.002)
      13     content-uppercase-count (0.012)   content-readability (0.001)
      14     content-noun-count (0.010)        content-verb-count (0.001)
      15     content-adj-count (0.005)         content-adj-count (0.0)
      16     content-complexity (0.0)          content-informativeness (0.0)
      17     content-verb-count (0.0)          content-uppercase-count (0.0)
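
The ranking criterion named on slide 52, information gain ratio, can be illustrated for a feature that has already been discretised into bins. This is a minimal sketch under that assumption; how the original study binned continuous features is not stated on the slide, and the toy data below is made up.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split entropy."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [label for f, label in zip(feature_values, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: a binned feature and seed / non-seed labels (illustrative only).
labels = ["seed", "seed", "non-seed", "non-seed"]
separating = gain_ratio(["high", "high", "low", "low"], labels)
uninformative = gain_ratio(["high", "low", "high", "low"], labels)
print(separating, uninformative)
```

Dividing by the split entropy penalises features that fragment the data into many bins, which is why gain ratio is often preferred over raw information gain when ranking features.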
  • 53. Positive/negative impact of features
      •  What is the correlation between seed posts and features?
      [Figure panels: Haiti; Union Address]
      Fig. 3. Contributions of top-5 features to identifying Non-seeds (N) and Seeds (S). Upper plots are for the Haiti dataset and the lower plots are for the Union Address dataset.
  • 54. Predicting discussion activity on Twitter
      •  Reply rates:
        –  Haiti 1-74 responses, Union Address 1-75 responses
      •  Compare rankings
        –  Ground truth vs predicted
      •  Experiments
        –  Using Haiti and Union Address datasets
        –  Evaluate predicted rank k where k = {1, 5, 10, 20, 50, 100}
        –  Support Vector Regression with user, content, user+content features
      Dataset         Training size   Test size   Test Vol Mean   Test Vol SD
      Haiti           980             210         1.664           3.017
      Union Address   5,067           1,161       1.761           2.342
  • 55. Predicting discussion activity on Twitter
      [Figure panels: Haiti dataset; Union Address dataset]
      •  Content features are key for top ranks
      •  User features more important for higher ranks
  • 56. Identifying seed posts in Boards.ie
      •  Used the same features as before
        –  User features
          •  In-degree, out-degree, post count, user age, post rate
        –  Content features
          •  Post Length, complexity, readability, referral count, time in day, informativeness, polarity
      •  New features designed to capture user affinity
        –  Forum Entropy
          •  Concentration of forum activity
          •  Higher entropy = large forum spread
        –  Forum Likelihood
          •  Likelihood of forum post given user history
          •  Combines post history with incoming data
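
The two focus features introduced on slide 56 might be computed roughly as follows. The slide does not give the formulas, so this is only one plausible reading: forum entropy as the Shannon entropy of a user's posts across forums, and forum likelihood as a plain relative frequency. Both function bodies and the example history are assumptions.

```python
from collections import Counter
from math import log2


def forum_entropy(post_forums):
    """Entropy of a user's posts across forums (higher = wider spread)."""
    total = len(post_forums)
    return -sum((c / total) * log2(c / total) for c in Counter(post_forums).values())


def forum_likelihood(post_forums, forum):
    """Share of the user's past posts made in `forum`.

    A plain relative frequency, used as an assumed stand-in for the
    likelihood measure named on the slide.
    """
    if not post_forums:
        return 0.0
    return post_forums.count(forum) / len(post_forums)


# Forum ids of one user's posts in the window before a new post (made up).
history = [246, 246, 246, 388]
print(forum_entropy(history))
print(forum_likelihood(history, 246))
```

A user who posts in a single forum has zero entropy, while a user spread evenly over many forums has high entropy, matching the slide's "higher entropy = large forum spread".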
  • 57. Experiment for identifying seed posts
      •  Used all posts from Boards.ie in 2006
      •  Built features using a 6-month window prior to seed post date
      Posts       Seeds    Non-Seeds   Replies     Users
      1,942,030   90,765   21,800      1,829,465   29,908
      •  Evaluated a binary classification task
        –  Is this post a seed post or not?
        –  Precision, Recall, F1 and Accuracy
        –  Tested: user, content, focus features, and their combinations
  • 58. Identifying seeds with different type of features
      TABLE II. Results from the classification of seed posts using varying feature sets and classification models
      Features          Model         P       R       F1      ROC
      User              SVM           0.775   0.810   0.774   0.581
                        Naive Bayes   0.691   0.767   0.719   0.540
                        Max Ent       0.776   0.806   0.722   0.556
                        J48           0.778   0.809   0.734   0.582
      Content           SVM           0.739   0.804   0.729   0.511
                        Naive Bayes   0.730   0.794   0.740   0.616
                        Max Ent       0.758   0.806   0.730   0.678
                        J48           0.795   0.822   0.783   0.617
      Focus             SVM           0.649   0.805   0.719   0.500
                        Naive Bayes   0.710   0.737   0.722   0.588
                        Max Ent       0.649   0.805   0.719   0.586
                        J48           0.649   0.805   0.719   0.500
      User + Content    SVM           0.790   0.808   0.727   0.509
                        Naive Bayes   0.712   0.772   0.732   0.593
                        Max Ent       0.767   0.807   0.734   0.671
                        J48           0.795   0.821   0.779   0.675
      User + Focus      SVM           0.776   0.810   0.776   0.583
                        Naive Bayes   0.699   0.778   0.724   0.585
                        Max Ent       0.771   0.806   0.722   0.607
                        J48           0.777   0.810   0.742   0.617
      Content + Focus   SVM           0.750   0.805   0.729   0.511
                        Naive Bayes   0.732   0.787   0.746   0.658
                        Max Ent       0.762   0.807   0.731   0.692
                        J48           0.798   0.823   0.787   0.662
      All               SVM           0.791   0.808   0.727   0.510
                        Naive Bayes   0.724   0.780   0.740   0.637
                        Max Ent       0.768   0.808   0.733   0.688
                        J48           0.798   0.824   0.792   0.692
  • 59. Positive/negative impact of features on Boards.ie
      •  What are the most important features for predicting seed posts?
      TABLE III. Reduction in F1 levels as individual features are dropped from the J48 classifier
      Feature Dropped    F1
      -                  0.815
      Post Count         0.815
      In-Degree          0.811*
      Out-Degree         0.811*
      User Age           0.807***
      Post Rate          0.815
      Forum Entropy      0.815
      Forum Likelihood   0.798***
      Post Length        0.810**
      Complexity         0.811**
      Readability        0.802***
      Referral Count     0.793***
      Time in Day        0.810**
      Informativeness    0.801***
      Polarity           0.808***
      Signif. codes: p-value < 0.001 ***, 0.01 **, 0.05 *, 0.1 .
      •  Correlations:
        –  Referral counts (non-seeds)
        –  Forum likelihood (seeds)
        –  Informativeness (non-seeds)
        –  Readability (seeds)
        –  User age (non-seeds)
      Paper excerpt: … hyperlinks (e.g., ads and spams). This contrasts with work in Twitter which found that tweets containing many links were …
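
The drop-one-feature procedure behind Table III can be sketched generically. The code below is an outline under stated assumptions: `evaluate` stands for whatever trains and scores the classifier (the slides use J48 and F1), and the `toy_f1` scorer with its `CONTRIBUTION` values is entirely made up so that the sketch runs on its own. It does not reproduce the study's numbers or its significance tests.

```python
def drop_one_ablation(features, evaluate):
    """Return {dropped_feature: score}, with the full-set score under key None.

    `evaluate` is any callable that trains and scores a model on the given
    feature list, for example cross-validated F1.
    """
    scores = {None: evaluate(list(features))}
    for feature in features:
        remaining = [f for f in features if f != feature]
        scores[feature] = evaluate(remaining)
    return scores


# Stand-in scorer with made-up contributions; a real run would train the
# classifier on the remaining features instead.
CONTRIBUTION = {"referral_count": 0.02, "forum_likelihood": 0.015, "post_count": 0.0}


def toy_f1(feature_list):
    return round(0.78 + sum(CONTRIBUTION[f] for f in feature_list), 3)


scores = drop_one_ablation(list(CONTRIBUTION), toy_f1)
baseline = scores[None]
for feature, score in scores.items():
    if feature is not None:
        print(feature, score, round(baseline - score, 3))
```

Features whose removal leaves the score unchanged contribute nothing beyond the others, while the largest drops point to the most important features.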
  • 60. Predicting Discussion Activity in Boards.ie
      •  What impact do features have on discussion length?
        –  Assessed Linear Regression model with focus and content features
        –  Forum Likelihood (pos)
        –  Content Length (+/neutral)
        –  Complexity (pos)
        –  Readability (+/neutral)
        –  Referral Count (neg)
        –  Time in Day (+/neutral)
        –  Informativeness (-/neutral)
        –  Polarity (neg)
  • 61. Stay tuned
      •  More communities
        –  SAP, IBM, StackOverflow, Reddit
        –  Compare impact of features on their dynamics
      •  Better behaviour analysis
        –  Fewer features, more forums/communities, more graphs!
        –  Healthy? posts, reciprocation, discussions, sentiment mixture
      •  Churn analysis
        –  Correlation of features/behaviour to ‘bounce rate’ (WebSci11 best paper)
      •  Intervention!
        –  Opportunities and mechanisms to influence behaviour
  • 62. Upcoming events
      –  Social Object Networks, IEEE Social Computing 2011, October 9-10, Boston, USA. http://ir.ii.uam.es/socialobjects2011/ Deadline: August 5, 2011
      –  Intelligent Web Services Meet Social Computing, AAAI Spring Symposium 2012, March 26-28, Stanford, California. http://vitvar.com/events/aaai-ss12 Deadline: October 7, 2011
  • 63. Acknowledgement
      My social semantics team: Sofia Angeletou (Research Associate); Matthew Rowe (Research Associate)
      Live Social Semantics team: Ciro Cattuto (ISI, Turin); Wouter van Den Broeck (ISI, Turin); Alain Barrat (CPT Marseille & ISI); Martin Szomszor (CeRC, City University, UK); Gianluca Correndo (Uni Southampton); Ivan Cantador (UAM, Madrid)
      STI International; ESWC09/10 & HT09 chairs and organisers; All LSS participants