Predicting Discussions on the Social Semantic Web
Matthew Rowe, Sofia Angeletou and Harith Alani

Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
Mass of Social Data
Social content is now published at a staggering rate…




Predicting Discussions on the Social Semantic Web    1
Social Data Publication Rates
• ~600 Tweets per second [1]
• ~700 Facebook status updates per second [1]
• Spinn3r dataset collected from Jan – Feb 2011 [2]
     – 133 million blog posts
     – 5.7 million forum posts
     – 231 million social media posts



[1] http://searchengineland.com/by-the-numbers-twitter-vs-facebook-vs-google-buzz-36709
[2] http://icwsm.org/data/index.php




The New Information Era




But… Analysis is Limited
• Market Analysts
    – What are people saying about my products?
• Opinion Mining
    – How are people perceiving a given subject or
      topic?
• eGovernment Policy Makers
    – How is a policy or law received by the public?
    – How can I maximise feedback to my content?




Attention Economics
• Given all this data…
    – How do we decide on what information to focus on?
    – How do we know what posts will evolve into discussions?

• Attention Economics (Goldhaber, 1997)
• Need to understand key indicators of high-
  attention discussions

Discussions on Twitter
• Twitter is used as a medium to:
    – Share opinions and ideas
    – Engage in discussions
         • Discussing events
         • Debating topics
• Identifying online discussions enables:
    –   Up-to-date public opinion
    –   Observation of topics of interest
    –   Gauging the popularity of government policies
    –   Fine-grained customer support


Predicting Discussions
• Pre-empt discussions on the Social Web:
    1. Identifying seed posts
         •    i.e. posts that start a discussion
         •    Will a given post start a discussion?
         •    What are the key features of seed posts?
    2. Predicting discussion activity levels
         •    What is the level of discussion that a seed post
              will generate?
         •    What are the key factors of lengthy discussions?




The Need for Semantics
• For predictions we require statistical features
    – User features
    – Content features


• Features provided using differing schemas by
  different platforms
    – How to overcome heterogeneity?
• Currently, no ontologies capture such features




Behaviour Ontology




                                                    www.purl.org/NET/oubo/0.23/

Features
The likelihood of posts eliciting replies depends upon popularity, a highly subjective term influenced by external factors. Properties influencing popularity include user attributes - describing the reputation of the user - and attributes of a post's content - generally referred to as content features. In Table 1 we define user and content features and study their influence on the discussion "continuation".

Table 1. User and Content Features

User Features
Feature           Description                                                                  Value
In Degree         Number of followers of U                                                     #
Out Degree        Number of users U follows                                                    #
List Degree       Number of lists U appears on. Lists group users by topic                     #
Post Count        Total number of posts the user has ever posted                               #
User Age          Number of minutes from user join date                                        #
Post Rate         Posting frequency of the user                                                $\frac{PostCount}{UserAge}$

Content Features
Feature           Description                                                                  Value
Post length       Length of the post in characters                                             #
Complexity        Cumulative entropy of the unique words in post p, $\lambda$, of total        $\frac{1}{\lambda}\sum_{i \in [1,n]} p_i(\log\lambda - \log p_i)$
                  word length n and $p_i$ the frequency of each word
Uppercase count   Number of uppercase words                                                    #
Readability       Gunning fog index [7] using average sentence length (ASL)                    $0.4(ASL + PCW)$
                  and the percentage of complex words (PCW)
Verb Count        Number of verbs                                                              #
Noun Count        Number of nouns                                                              #
Adjective Count   Number of adjectives                                                         #
Referral Count    Number of @user                                                              #
Time in the day   Normalised time in the day measured in minutes                               #
Informativeness   Terminological novelty of the post wrt other posts.                          $\sum_{t \in p} \mathit{tfidf}(t, p)$
                  The cumulative tfIdf value of each term t in post p
Polarity          Cumulation of polar term weights in p (using Sentiwordnet 3                  $\frac{Po + Ne}{|terms|}$
                  lexicon) normalised by polar terms count
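
Three of the derived features in Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes that λ is the total number of words in the post, that p_i is the raw count of the i-th unique word, and that ASL and PCW are computed elsewhere and passed in. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def post_rate(post_count: int, user_age_minutes: float) -> float:
    """Post Rate = Post Count / User Age (posts per minute since joining)."""
    if user_age_minutes <= 0:
        raise ValueError("user age must be positive")
    return post_count / user_age_minutes


def complexity(words: Sequence[str]) -> float:
    """Cumulative entropy: (1/lam) * sum_i p_i * (log(lam) - log(p_i)).

    Assumption: lam is the total number of words in the post and p_i is
    the raw count of the i-th unique word.
    """
    lam = len(words)
    if lam == 0:
        return 0.0
    counts = Counter(words)
    return sum(p * (math.log(lam) - math.log(p)) for p in counts.values()) / lam


def readability(avg_sentence_length: float, pct_complex_words: float) -> float:
    """Gunning fog index: 0.4 * (ASL + PCW), with PCW given as a percentage."""
    return 0.4 * (avg_sentence_length + pct_complex_words)
```

Under these assumptions a post that repeats a single word has complexity 0, and a post with two equally frequent words has complexity log 2.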




4.2 Experiments
Experiments are intended to test the performance of different classification models [...]
Identifying Seed Posts
• Experiments
    – Haiti and Union Address Datasets
    – Divided each dataset up using 70/20/10 split for training/validation/testing
    – Evaluated a binary classification task
        • Is this post a seed post or not?
        • Precision, Recall, F1 and Area under ROC
        • Tested: user, content, user+content features
    – Tested Perceptron, SVM, Naïve Bayes and J48

[...] different motives which do not necessarily designate a response to the initial message. Therefore, we only investigate explicit replies to messages. To gather our discussions, and our seed posts, we iteratively move up the reply chain - i.e., from reply to parent post - until we reach the seed post in the discussion. We define this process as dataset enrichment, which is performed by querying Twitter's REST API using the in reply to id of the parent post, and moving one step at a time up the reply chain. This same approach has been employed successfully in work by [12] to gather a large-scale conversation dataset from Twitter.

Table 2. Statistics of the datasets used for experiments
Dataset         Users    Tweets   Seeds   Non-Seeds   Replies
Haiti           44,497   65,022   1,405   60,686      2,931
Union Address   66,300   80,272   7,228   55,169      17,875

Table 2 shows the statistics that explain our collected datasets. One can observe the difference in conversational tweets between the two corpora, where the Haiti dataset contains fewer seed posts as a percentage than the Union dataset, and therefore fewer replies. However, as we explain in a later section, this does not correlate with a higher discussion volume in the former dataset. We convert the collected datasets from their proprietary JSON formats into triples, annotated using concepts from our above behaviour ontology; this enables our features to be derived by querying our datasets using basic SPARQL queries.
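
The reply-chain walk used for dataset enrichment can be sketched as below. This is an illustration only: the original work queries Twitter's REST API at each step, whereas this sketch assumes the parent links are already held in an in-memory dictionary. The name `find_seed` and the post ids are hypothetical.

```python
from typing import Dict, List, Optional


def find_seed(post_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a reply up to its seed post, one step at a time.

    parent_of maps a post id to the id of the post it replies to
    (None, or a missing key, means no parent is known).
    Returns the chain [post_id, ..., seed_id].
    """
    chain = [post_id]
    seen = {post_id}
    while True:
        parent = parent_of.get(chain[-1])
        if parent is None:      # no parent recorded: current post is the seed
            return chain
        if parent in seen:      # guard against malformed, cyclic data
            raise ValueError(f"cycle detected at post {parent}")
        chain.append(parent)
        seen.add(parent)


# Hypothetical ids: c replies to b, b replies to a, and a is a seed post.
replies = {"c": "b", "b": "a", "a": None}
print(find_seed("c", replies))  # ['c', 'b', 'a']
```

The last element of the returned chain is the seed post, and the length of the chain minus one is the depth of the reply.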
Identifying Seed Posts
[...] the F1 score - and used this model together with the best performing features, using a ranking heuristic, to classify posts contained within our test split. We first report on the results obtained from our model selection phase, before moving onto our results from using the best model with the top-k features.

Table 3. Results from the classification of seed posts using varying feature sets and classification models

                   (a) Haiti Dataset              (b) Union Address Dataset
Features  Model    P      R      F1     ROC       P      R      F1     ROC
User      Perc     0.794  0.528  0.634  0.727     0.658  0.697  0.677  0.673
          SVM      0.843  0.159  0.267  0.566     0.510  0.946  0.663  0.512
          NB       0.948  0.269  0.420  0.785     0.844  0.086  0.157  0.707
          J48      0.906  0.679  0.776  0.822     0.851  0.722  0.782  0.830
Content   Perc     0.875  0.077  0.142  0.606     0.467  0.698  0.560  0.457
          SVM      0.552  0.727  0.627  0.589     0.650  0.589  0.618  0.638
          NB       0.721  0.638  0.677  0.769     0.762  0.212  0.332  0.649
          J48      0.685  0.705  0.695  0.711     0.740  0.533  0.619  0.736
All       Perc     0.794  0.528  0.634  0.726     0.630  0.762  0.690  0.672
          SVM      0.483  0.996  0.651  0.502     0.499  0.990  0.664  0.506
          NB       0.962  0.280  0.434  0.852     0.874  0.212  0.341  0.737
          J48      0.824  0.775  0.798  0.836     0.890  0.810  0.848  0.877
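
As a reminder of how the P, R and F1 columns relate, the sketch below computes the three measures from confusion counts and recomputes F1 from a reported precision and recall. It assumes F1 is the harmonic mean of precision and recall, which the table values appear consistent with up to rounding. The confusion counts are hypothetical and are not taken from the experiments.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (seed post) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# J48 with all features on the Union Address dataset: P = 0.890, R = 0.810.
print(round(f1_score(0.890, 0.810), 3))  # 0.848

# Hypothetical confusion counts for a seed / non-seed classifier.
print(precision_recall_f1(tp=81, fp=10, fn=19))
```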


4.3 Results
Our findings from Table 3 demonstrate the effectiveness of using solely user features for identifying seed posts. In both the Haiti and Union Address datasets [...]
Identifying Seed Posts
• What are the most important features?

[...] which we found to be 0.674 indicating a good correlation between the two lists and their respective ranks.

Table 4. Features ranked by Information Gain Ratio wrt Seed Post class label. The feature name is paired with its IG in brackets.
Rank   Haiti                              Union Address
1      user-list-degree (0.275)           user-list-degree (0.319)
2      user-in-degree (0.221)             content-time-in-day (0.152)
3      content-informativeness (0.154)    user-in-degree (0.133)
4      user-num-posts (0.111)             user-num-posts (0.104)
5      content-time-in-day (0.089)        user-post-rate (0.075)
6      user-post-rate (0.075)             user-out-degree (0.056)
7      content-polarity (0.064)           content-referral-count (0.030)
8      user-out-degree (0.040)            user-age (0.015)
9      content-referral-count (0.038)     content-polarity (0.015)
10     content-length (0.020)             content-length (0.010)
11     content-readability (0.018)        content-complexity (0.004)
12     user-age (0.015)                   content-noun-count (0.002)
13     content-uppercase-count (0.012)    content-readability (0.001)
14     content-noun-count (0.010)         content-verb-count (0.001)
15     content-adj-count (0.005)          content-adj-count (0.0)
16     content-complexity (0.0)           content-informativeness (0.0)
17     content-verb-count (0.0)           content-uppercase-count (0.0)
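
For readers unfamiliar with the ranking criterion, the following is a minimal sketch of Information Gain Ratio for a discrete feature against a class label. It is not the authors' tooling. It assumes the feature has already been discretised (numeric features such as in-degree would need binning first), and the feature values and labels in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical binned list-degree values and seed / non-seed labels.
labels = ["seed", "seed", "non", "non"]
print(gain_ratio(["high", "high", "low", "low"], labels))  # 1.0
print(gain_ratio(["high", "low", "high", "low"], labels))  # 0.0
```

A feature that separates the classes perfectly scores 1.0, and one that carries no information about the label scores 0.0, which matches the range of values seen in Table 4.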



Identifying Seed Posts
• What is the correlation between seed posts and features?

Fig. 3. Contributions of top-5 features to identifying Non-seeds (N) and Seeds (S). Upper plots are for the Haiti dataset and the lower plots are for the Union Address [...]

Identifying Seed Posts
• Can we identify seed posts using the top-k
  features?




Predicting Discussion Activity
• From identified seed posts:
    – Can we predict the level of discussion activity?
    – How much activity will a post generate?


• [Wang & Groth, 2010] learns a regression
  model, and reports on coefficients
    – Identifying relationship between features


• We do something different:
    – Predict the volume of the discussion


Predicting Discussion Activity




Predicting Discussion Activity
• Compare rankings
    – Ground truth vs predicted
• Experiments
    – Using Haiti and Union Address datasets
    – Evaluation measure: Normalised Discounted Cumulative Gain
        • Assessing nDCG@k where k={1,5,10,20,50,100}
    – Tested Support Vector Regression with:
        • user, content, user+content features

[...] features. Using the predicted values for each post we can then form a ranking, which is comparable to a ground truth ranking within our data. This provides discussion activity levels, where posts are ordered by their expected volume. This approach also enables contextual predictions where a post can be compared with existing posts that have produced lengthy debates.

5.1 Experiments
Datasets. For our experiments we used the same datasets as in the previous section: tweets collected from the Haiti crisis and the Union Address speech. We maintain the same splits as before - training/validation/testing with a 70/20/10 split - but without using the test split. Instead we train the regression models using the seed posts in the training split and then test the prediction accuracy using the seed posts in the validation split - seed posts in the validation set are identified using the J48 classifier trained using both user + content features. Table 5 describes our datasets for clarification.

Table 5. Discussion volumes and distributions of our datasets
Dataset         Train Size   Test Size   Test Vol Mean   Test Vol SD
Haiti           980          210         1.664           3.017
Union Address   5,067        1,161       1.761           2.342
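
The comparison of a predicted ranking against the ground truth ranking with nDCG@k can be sketched as follows. This uses one common formulation of DCG (gain divided by log2 of rank + 1), which is an assumption here since the exact variant is not stated. The volumes in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k = sum over ranks i = 1..k of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_volumes: Sequence[float],
              predicted_volumes: Sequence[float], k: int) -> float:
    """Rank posts by predicted volume, score the ranking by true volume."""
    order = sorted(range(len(true_volumes)),
                   key=lambda i: predicted_volumes[i], reverse=True)
    ranked = [true_volumes[i] for i in order]
    ideal = sorted(true_volumes, reverse=True)   # ground truth ranking
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical discussion volumes for four seed posts.
true_volumes = [3, 1, 0, 2]
good_prediction = [0.9, 0.2, 0.1, 0.5]   # same ordering as the ground truth
bad_prediction = [0.1, 0.5, 0.9, 0.2]    # reversed ordering

print(ndcg_at_k(true_volumes, good_prediction, k=4))  # 1.0
print(ndcg_at_k(true_volumes, bad_prediction, k=4))   # below 1.0
```

Only the ordering of the predictions matters for this measure, so a regression model can score well even when its absolute volume estimates are off.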


Predicting Discussion Activity
[...] features. Although different coefficients are learnt for each dataset, for major features with greater weights the signs remain the same. In the case of the list-degree of the user, which yielded high IGR during classification, there is a similar positive association with the discussion volume - the same is also true for the in-degree and the out-degree of the user. This indicates that a constant increase in the combination of a user's in-degree, out-degree, and list-degree will lead to increased discussion volumes. Out-degree plays an important role by enabling the seed post author to see posted responses - given that the larger the out-degree the greater the reception of information from other users.

Table 6. Coefficients of user features learnt using Support Vector Regression over the two Datasets. Coefficients are rounded to 4 dp.

         user-num-posts   user-out-degree   user-in-degree   user-list-degree   user-age   user-post-rate
Haiti    -0.0019          +0.001            +0.0016          +0.0046            +0.0001    +0.0001
Union    -0.0025          +0.0114           +0.0025          +0.0154            -0.0003    -0.0002
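
The claim that the signs agree for the major features can be checked directly against Table 6. The short sketch below copies the coefficients from the table and lists which features share a sign across the two datasets. The variable names are illustrative only.

```python
# Coefficients copied from Table 6 (rounded to 4 dp in the source).
coefficients = {
    "user-num-posts":   {"Haiti": -0.0019, "Union": -0.0025},
    "user-out-degree":  {"Haiti": 0.0010,  "Union": 0.0114},
    "user-in-degree":   {"Haiti": 0.0016,  "Union": 0.0025},
    "user-list-degree": {"Haiti": 0.0046,  "Union": 0.0154},
    "user-age":         {"Haiti": 0.0001,  "Union": -0.0003},
    "user-post-rate":   {"Haiti": 0.0001,  "Union": -0.0002},
}

# A positive product means both datasets learnt the same sign.
same_sign = [name for name, c in coefficients.items()
             if c["Haiti"] * c["Union"] > 0]
different_sign = [name for name in coefficients if name not in same_sign]

print(same_sign)
print(different_sign)
```

The two features whose signs differ, user-age and user-post-rate, also have the smallest magnitudes in the table, which is in line with the statement about major features.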




6 Conclusions
Findings
• User reputation and standing are crucial for
    – eliciting a response
    – starting a discussion
• Greater broadcast capability = greater
  likelihood of response
    – More listeners = more discussion
• Activity levels influenced by out-degree
    – Allows the poster to see responses from
      ‘respected’ peers




Conclusions
• Pre-empt discussions to empower
    – Market analysts
    – Opinion mining
    – eGovernment policy makers
• Behaviour ontology
    – Captures impact across platforms
• Approach accurately predicts:
    – Which posts will yield a reply, and;
    – The level of discussion activity




Current and Future Work
• Experiments over a forum dataset
    – Content features >> user features
    – Different platform dynamics
• Extend experiments to a random Twitter
  dataset
• Extension to behaviour ontology
    – Captures concentration
    – i.e. focus of a user on specific topics
• Categorising users by role
    – Based on observed behaviour


Questions?
people.kmi.open.ac.uk/rowe
m.c.rowe@open.ac.uk
@mattroweshow

Public Politician Profiles on Facebook and the Gap of Authenticity
 
Projektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher BundestagProjektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher Bundestag
 
Digital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social NetworksDigital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social Networks
 
Socialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionSocialcom2011 discussionactivityprediction
Socialcom2011 discussionactivityprediction
 
1st WeGov Workshop Agenda
1st WeGov Workshop Agenda1st WeGov Workshop Agenda
1st WeGov Workshop Agenda
 
1st WeGov Workshop description
1st WeGov Workshop description1st WeGov Workshop description
1st WeGov Workshop description
 
1st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 20111st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 2011
 

Kürzlich hochgeladen

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Kürzlich hochgeladen (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 

Eswc2011 socialweb-discussionpredictions

  • 1. Predicting Discussions on the Social Semantic Web Matthew Rowe, Sofia Angeletou and Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
  • 2. Mass of Social Data
       Social content is now published at a staggering rate…
  • 3. Social Data Publication Rates
       • ~600 Tweets per second [1]
       • ~700 Facebook status updates per second [1]
       • Spinn3r dataset collected from Jan – Feb 2011 [2]
           – 133 million blog posts
           – 5.7 million forum posts
           – 231 million social media posts
       [1] http://searchengineland.com/by-the-numbers-twitter-vs-facebook-vs-google-buzz-36709
       [2] http://icwsm.org/data/index.php
  • 4. The New Information Era
  • 5. But… Analysis is Limited
       • Market Analysts
           – What are people saying about my products?
       • Opinion Mining
           – How are people perceiving a given subject or topic?
       • eGovernment Policy Makers
           – How is a policy or law received by the public?
           – How can I maximise feedback to my content?
  • 6. Attention Economics
       • Given all this data…
           How do we decide on what information to focus on?
           How do we know what posts will evolve into discussions?
       • Attention Economics (Goldhaber, 1997)
       • Need to understand key indicators of high-attention discussions
  • 7. Discussions on Twitter
       • Twitter is used as a medium to:
           – Share opinions and ideas
           – Engage in discussions
               • Discussing events
               • Debating topics
       • Identifying online discussions enables:
           – Up-to-date public opinion
           – Observation of topics of interest
           – Gauging the popularity of government policies
           – Fine-grained customer support
  • 8. Predicting Discussions
       • Pre-empt discussions on the Social Web:
           1. Identifying seed posts
               • i.e. posts that start a discussion
               • Will a given post start a discussion?
               • What are the key features of seed posts?
           2. Predicting discussion activity levels
               • What is the level of discussion that a seed post will generate?
               • What are the key factors of lengthy discussions?
  • 9. The Need for Semantics
       • For predictions we require statistical features
           – User features
           – Content features
       • Features provided using differing schemas by different platforms
           – How to overcome heterogeneity?
       • Currently, no ontologies capture such features
  • 10. Behaviour Ontology
       www.purl.org/NET/oubo/0.23/
  • 11. Features
       Excerpt:
       The likelihood of posts eliciting replies depends upon popularity, a highly subjective term influenced by external factors. Properties influencing popularity include user attributes - describing the reputation of the user - and attributes of a post's content - generally referred to as content features. In Table 1 we define user and content features and study their influence on the discussion "continuation".

       Table 1. User and Content Features

       User Features
         In Degree:        Number of followers of U (#)
         Out Degree:       Number of users U follows (#)
         List Degree:      Number of lists U appears on. Lists group users by topic (#)
         Post Count:       Total number of posts the user has ever posted (#)
         User Age:         Number of minutes from user join date (#)
         Post Rate:        Posting frequency of the user: Post Count / User Age

       Content Features
         Post length:      Length of the post in characters (#)
         Complexity:       Cumulative entropy of the unique words in post p of total word length n, with p_i the frequency of each word: $\frac{1}{\lambda} \sum_{i \in [1,n]} p_i (\log \lambda - \log p_i)$
         Uppercase count:  Number of uppercase words (#)
         Readability:      Gunning fog index using average sentence length (ASL) [7] and the percentage of complex words (PCW): $0.4\,(ASL + PCW)$
         Verb Count:       Number of verbs (#)
         Noun Count:       Number of nouns (#)
         Adjective Count:  Number of adjectives (#)
         Referral Count:   Number of @user (#)
         Time in the day:  Normalised time in the day measured in minutes (#)
         Informativeness:  Terminological novelty of the post wrt other posts. The cumulative tfIdf value of each term t in post p: $\sum_{t \in p} tfidf(t, p)$
         Polarity:         Cumulation of polar term weights in p (using Sentiwordnet 3 lexicon) normalised by polar terms count: $\frac{Po + Ne}{|terms|}$

       4.2 Experiments
       Experiments are intended to test the performance of different classification mod-
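As a rough illustration of two content features from Table 1, the sketch below computes the complexity (word entropy) and the Gunning fog readability score of a post. It is only a sketch under stated assumptions: it tokenises on whitespace, takes λ to be the total number of words and p_i the raw count of each unique word, and treats a word as "complex" when a crude vowel-group count finds three or more syllables. None of these choices is specified on the slide, and `count_syllables` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def complexity(post: str) -> float:
    """Word entropy: (1/lam) * sum_i p_i * (log(lam) - log(p_i))."""
    words = post.lower().split()
    lam = len(words)  # assumption: lambda is the total number of words
    if lam == 0:
        return 0.0
    counts = Counter(words)  # assumption: p_i is the raw count of unique word i
    return sum(c * (math.log(lam) - math.log(c)) for c in counts.values()) / lam


def count_syllables(word: str) -> int:
    """Crude syllable estimate: the number of vowel groups (hypothetical helper)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(post: str) -> float:
    """Readability = 0.4 * (ASL + PCW), following the formula in Table 1."""
    sentences = [s for s in re.split(r"[.!?]+", post) if s.strip()]
    words = re.findall(r"[A-Za-z']+", post)
    if not sentences or not words:
        return 0.0
    asl = len(words) / len(sentences)  # average sentence length in words
    complex_words = [w for w in words if count_syllables(w) >= 3]
    pcw = 100.0 * len(complex_words) / len(words)  # percentage of complex words
    return 0.4 * (asl + pcw)
```

A post that repeats a single word has complexity 0, while a post of all-distinct words reaches the maximum, log of the word count.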
  • 12. Identifying Seed Posts
       • Experiments
           – Haiti and Union Address Datasets
           – Divided each dataset up using 70/20/10 split for training/validation/testing
           – Evaluated a binary classification task
               • Is this post a seed post or not?
               • Precision, Recall, F1 and Area under ROC
               • Tested: user, content, user+content features
           – Tested Perceptron, SVM, Naïve Bayes and J48

       Excerpt:
       … different motives which do not necessarily designate a response to the initial message. Therefore, we only investigate explicit replies to messages. To gather our discussions, and our seed posts, we iteratively move up the reply chain - i.e., from reply to parent post - until we reach the seed post in the discussion. We define this process as dataset enrichment; it is performed by querying Twitter's REST API using the in reply to id of the parent post, and moving one step at a time up the reply chain. This same approach has been employed successfully in work by [12] to gather a large-scale conversation dataset from Twitter.

       Table 2. Statistics of the datasets used for experiments

       Dataset        Users   Tweets  Seeds  Non-Seeds  Replies
       Haiti          44,497  65,022  1,405  60,686     2,931
       Union Address  66,300  80,272  7,228  55,169     17,875

       Table 2 shows the statistics that explain our collected datasets. One can observe the difference in conversational tweets between the two corpora, where the Haiti dataset contains fewer seed posts as a percentage than the Union dataset, and therefore fewer replies. However, as we explain in a later section, this does not correlate with a higher discussion volume in the former dataset. We convert the collected datasets from their proprietary JSON formats into triples, annotated using concepts from our above behaviour ontology; this enables our features to be derived by querying our datasets using basic SPARQL queries.
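The dataset enrichment step described above, walking from a reply up to its seed post one parent at a time, can be sketched as follows. This is an illustration rather than the authors' code: `fetch_parent_id` is a hypothetical stand-in for whatever lookup returns a post's parent id (for example, a call to Twitter's REST API), and here it is backed by an in-memory dictionary with made-up ids so that the example runs offline.

```python
from typing import Callable, Dict, List, Optional


def find_seed(post_id: str,
              fetch_parent_id: Callable[[str], Optional[str]]) -> List[str]:
    """Return the chain [post_id, parent, ..., seed], moving one step at a time."""
    chain = [post_id]
    seen = {post_id}
    while True:
        parent = fetch_parent_id(chain[-1])
        # Stop when the post has no parent (it is the seed) or on a cycle.
        if parent is None or parent in seen:
            return chain
        chain.append(parent)
        seen.add(parent)


# Toy data (made up): post id -> id of the post it replies to, None for a seed.
replies: Dict[str, Optional[str]] = {"t1": None, "t2": "t1", "t3": "t2", "t4": None}

chain = find_seed("t3", replies.get)
seed = chain[-1]  # the last element of the chain is the seed post
```

Keeping the lookup behind a callable means the same loop works against a live API, a cache, or test data.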
  • 13. Identifying Seed Posts
       Excerpt:
       … the F1 score - and used this model together with the best performing features, using a ranking heuristic, to classify posts contained within our test split. We first report on the results obtained from our model selection phase, before moving onto our results from using the best model with the top-k features.

       Table 3. Results from the classification of seed posts using varying feature sets and classification models

       (a) Haiti Dataset
       Features  Model  P      R      F1     ROC
       User      Perc   0.794  0.528  0.634  0.727
       User      SVM    0.843  0.159  0.267  0.566
       User      NB     0.948  0.269  0.420  0.785
       User      J48    0.906  0.679  0.776  0.822
       Content   Perc   0.875  0.077  0.142  0.606
       Content   SVM    0.552  0.727  0.627  0.589
       Content   NB     0.721  0.638  0.677  0.769
       Content   J48    0.685  0.705  0.695  0.711
       All       Perc   0.794  0.528  0.634  0.726
       All       SVM    0.483  0.996  0.651  0.502
       All       NB     0.962  0.280  0.434  0.852
       All       J48    0.824  0.775  0.798  0.836

       (b) Union Address Dataset
       Features  Model  P      R      F1     ROC
       User      Perc   0.658  0.697  0.677  0.673
       User      SVM    0.510  0.946  0.663  0.512
       User      NB     0.844  0.086  0.157  0.707
       User      J48    0.851  0.722  0.782  0.830
       Content   Perc   0.467  0.698  0.560  0.457
       Content   SVM    0.650  0.589  0.618  0.638
       Content   NB     0.762  0.212  0.332  0.649
       Content   J48    0.740  0.533  0.619  0.736
       All       Perc   0.630  0.762  0.690  0.672
       All       SVM    0.499  0.990  0.664  0.506
       All       NB     0.874  0.212  0.341  0.737
       All       J48    0.890  0.810  0.848  0.877

       4.3 Results
       Our findings from Table 3 demonstrate the effectiveness of using solely user features for identifying seed posts. In both the Haiti and Union Address datasets …
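The P, R and F1 columns of Table 3 are related by the usual definition of F1 as the harmonic mean of precision and recall. The sketch below shows that relationship and recomputes F1 for one row (Haiti, user features, Perceptron) from its reported P and R. The confusion counts in the usage line of the test are made up, since the slides report only the final scores.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Scores for the positive ('seed') class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Recompute F1 from the P and R reported for Haiti / User / Perc in Table 3.
haiti_user_perc_f1 = f1_score(0.794, 0.528)
```

Because the table rounds each score to three decimals, a recomputed F1 can differ from the reported one in the last digit.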
  • 14. Identifying Seed Posts
       • What are the most important features?

       Excerpt:
       … which we found to be 0.674 indicating a good correlation between the two lists and their respective ranks.

       Table 4. Features ranked by Information Gain Ratio wrt Seed Post class label. The feature name is paired with its IG in brackets.

       Rank  Haiti                            Union Address
       1     user-list-degree (0.275)         user-list-degree (0.319)
       2     user-in-degree (0.221)           content-time-in-day (0.152)
       3     content-informativeness (0.154)  user-in-degree (0.133)
       4     user-num-posts (0.111)           user-num-posts (0.104)
       5     content-time-in-day (0.089)      user-post-rate (0.075)
       6     user-post-rate (0.075)           user-out-degree (0.056)
       7     content-polarity (0.064)         content-referral-count (0.030)
       8     user-out-degree (0.040)          user-age (0.015)
       9     content-referral-count (0.038)   content-polarity (0.015)
       10    content-length (0.020)           content-length (0.010)
       11    content-readability (0.018)      content-complexity (0.004)
       12    user-age (0.015)                 content-noun-count (0.002)
       13    content-uppercase-count (0.012)  content-readability (0.001)
       14    content-noun-count (0.010)       content-verb-count (0.001)
       15    content-adj-count (0.005)        content-adj-count (0.0)
       16    content-complexity (0.0)         content-informativeness (0.0)
       17    content-verb-count (0.0)         content-uppercase-count (0.0)
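Table 4 ranks features by Information Gain Ratio with respect to the seed / non-seed label. A minimal sketch of that measure for a discrete feature is given below. It assumes the feature values have already been discretised (the slides do not say how continuous features such as in-degree were binned), and the example data in the test is made up.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of the feature about the labels, divided by split info."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the posts whose feature takes this value.
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the feature's own distribution
    return info_gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalises features that fragment the data into many small groups, which plain information gain would otherwise favour.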
  • 15. Identifying Seed Posts
       • What is the correlation between seed posts and features?

       Fig. 3. Contributions of top-5 features to identifying Non-seeds (N) and Seeds (S). Upper plots are for the Haiti dataset and the lower plots are for the Union Address
  • 16. Identifying Seed Posts
       • Can we identify seed posts using the top-k features?
  • 17. Predicting Discussion Activity
       • From identified seed posts:
           – Can we predict the level of discussion activity?
           – How much activity will a post generate?
       • [Wang & Groth, 2010] learns a regression model, and reports on coefficients
           – Identifying relationship between features
       • We do something different:
           – Predict the volume of the discussion
  • 18. Predicting Discussion Activity
  • 19. Predicting Discussion Activity
       • Compare rankings
           – Ground truth vs predicted
       • Experiments
           – Using Haiti and Union Address datasets
           – Evaluation measure: Normalised Discounted Cumulative Gain
               • Assessing nDCG@k where k ∈ {1, 5, 10, 20, 50, 100}
           – Tested Support Vector Regression with:
               • user, content, user+content features

       Excerpt:
       … features. Using the predicted values for each post we can then form a ranking, which is comparable to a ground truth ranking within our data. This provides discussion activity levels, where posts are ordered by their expected volume. This approach also enables contextual predictions where a post can be compared with existing posts that have produced lengthy debates.

       5.1 Experiments
       Datasets. For our experiments we used the same datasets as in the previous section: tweets collected from the Haiti crisis and the Union Address speech. We maintain the same splits as before - training/validation/testing with a 70/20/10 split - but without using the test split. Instead we train the regression models using the seed posts in the training split and then test the prediction accuracy using the seed posts in the validation split - seed posts in the validation set are identified using the J48 classifier trained using both user+content features. Table 5 describes our datasets for clarification.

       Table 5. Discussion volumes and distributions of our datasets

       Dataset        Train Size  Test Size  Test Vol Mean  Test Vol SD
       Haiti          980         210        1.664          3.017
       Union Address  5,067       1,161      1.761          2.342
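The evaluation above compares a predicted ranking of seed posts with the ground-truth ranking using nDCG@k. A minimal sketch of that measure follows. It assumes the gain of a post is its actual discussion volume and uses the common log2 rank discount; the slides do not state which gain or discount function was used, and the volumes in the test are made up.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (rank 1 has discount log2(2))."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(predicted: Sequence[float], actual: Sequence[float], k: int) -> float:
    """Rank posts by predicted volume and score that ranking against actual volumes."""
    # Indices of posts ordered by predicted discussion volume, largest first.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    ranked_gains = [actual[i] for i in order]
    ideal_gains = sorted(actual, reverse=True)  # the ground-truth ranking
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

Only the order induced by the predictions matters here, so a regression model whose absolute volume estimates are off can still achieve a high nDCG if it orders the posts correctly.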
  • 20. Predicting Discussion Activity
       Excerpt:
       … features. Although different coefficients are learnt for each dataset, for major features with greater weights the signs remain the same. In the case of the list-degree of the user, which yielded high IGR during classification, there is a similar positive association with the discussion volume - the same is also true for the in-degree and the out-degree of the user. This indicates that a constant increase in the combination of a user's in-degree, out-degree, and list-degree will lead to increased discussion volumes. Out-degree plays an important role by enabling the seed post author to see posted responses - given that the larger the out-degree the greater the reception of information from other users.

       Table 6. Coefficients of user features learnt using Support Vector Regression over the two Datasets. Coefficients are rounded to 4 dp.

       Dataset  user-num-posts  user-out-degree  user-in-degree  user-list-degree  user-age  user-post-rate
       Haiti    -0.0019         +0.001           +0.0016         +0.0046           +0.0001   +0.0001
       Union    -0.0025         +0.0114          +0.0025         +0.0154           -0.0003   -0.0002

       6 Conclusions
  • 21. Findings
       • User reputation and standing is crucial
           – eliciting a response
           – starting a discussion
       • Greater broadcast capability = greater likelihood of response
           – More listeners = more discussion
       • Activity levels influenced by out-degree
           – Allow the poster to see response from 'respected' peers
  • 22. Conclusions
       • Pre-empt discussions to empower
           – Market analysts
           – Opinion mining
           – eGovernment policy makers
       • Behaviour ontology
           – Captures impact across platforms
       • Approach accurately predicts:
           – Which posts will yield a reply, and;
           – The level of discussion activity
  • 23. Current and Future Work
       • Experiments over a forum dataset
           – Content features >> user features
           – Different platform dynamics
       • Extend experiments to a random Twitter dataset
       • Extension to behaviour ontology
           – Captures concentration
           – i.e. focus of a user on specific topics
       • Categorising users by role
           – Based on observed behaviour

Editor's notes

  1. Web Science talk!!!
  2. Provision of a range of applications. Large-scale uptake of smart phones.
  3. How can I track discussions?
  4. What features are correlated with discussions?
  5. User features more important than content features. Significant at alpha = 0.01. User reputation therefore more important than the quality of content!
  6. Over training split! Pearson correlation: 0.674 = Fair agreement
  7. Ran experiments using J48 with all features, testing on the test split
  8. Fitted Kolmogorov-Smirnov goodness of fit test
  9. For Haiti: Content features more important than user features at higher ranks. User features show greater performance as ranks are increased. Union: User features outperform content for all ranks apart from top. Indicates the differences in the dynamics of the data.
  10. Attention Economics!!