Ignorance isn't Bliss: An Empirical Analysis of
      Attention Patterns in Online Communities
Claudia Wagner, Matthew Rowe, Markus Strohmaier and Harith Alani
                                          Amsterdam, 16.4.2012
with…
Matthew Rowe, Markus Strohmaier, Harith Alani
                                             Motivation




Which factors impact how much attention a post gets?

We use the number of replies as a proxy measure of attention
Research Questions


Which factors impact the attention level a post
gets in certain community forums?


How do these factors differ between individual
community forums?
                                         Methodology
    Empirical study of attention patterns in 20
    randomly selected forums
    Two-stage approach
      Differentiate between threadstarter posts that got at
      least one reply (seed posts) and threadstarter posts
      which got no replies at all (non-seed posts)
      Predict the level of attention that seed posts will
      generate - i.e. the number of replies
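
The two stages can be sketched as follows. This is only an illustration of the split described above, not the authors' code; the `Post` record and its `reply_count` field are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record for a threadstarter post; field names are assumptions.
    post_id: int
    reply_count: int


def split_seed_posts(posts):
    """Stage 1: separate seed posts (>= 1 reply) from non-seed posts (0 replies)."""
    seeds = [p for p in posts if p.reply_count >= 1]
    non_seeds = [p for p in posts if p.reply_count == 0]
    return seeds, non_seeds


def rank_by_attention(seeds):
    """Stage 2 target: seed posts ordered by number of replies, most replies first."""
    return sorted(seeds, key=lambda p: p.reply_count, reverse=True)


posts = [Post(1, 0), Post(2, 7), Post(3, 1), Post(4, 0)]
seeds, non_seeds = split_seed_posts(posts)
ranking = rank_by_attention(seeds)
```

Only the seed posts enter the second stage, so the ranking model never sees threads that received no replies.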
Dataset
Boards.ie, the most popular Irish message board
725 forums
Years 2005 and 2006
Feature Engineering
Aim
  Identify the features that impact upon seeding a
  discussion
  Identify features associated with seed posts that
  generate the most attention


Five Feature Groups
User Features
  user account age, post count, in-degree, out-degree, post rate
Content Features
  post length, complexity, readability, link count, time in day,
  informativeness, polarity
Title Features
  Length, question marks, linguistic dimensions (LIWC)
Focus Features
  Forum entropy, forum likelihood, topic entropy, topic likelihood, topic
  distance
Community Features
  Topical community fit, topical community distance, evolution score,
  inequity score
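
As an illustration of one focus feature, forum entropy can be read as the Shannon entropy of a user's posts over forums. The slides do not give the formula, so this reading, the function name and the example forum labels are assumptions.

```python
import math
from collections import Counter


def forum_entropy(forum_ids):
    """Shannon entropy (in bits) of a user's post distribution over forums.

    Assumed definition: low values mean the user concentrates on few forums,
    high values mean the user's activity is spread across many forums.
    """
    counts = Counter(forum_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A user who only posts in one forum versus one who posts evenly in four.
focused = forum_entropy(["golf"] * 10)
spread = forum_entropy(["golf", "jobs", "astronomy", "spanish"])
```

Under this reading a focused user scores 0 bits and a user spread evenly over four forums scores 2 bits.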
Feature Computation
For each threadstarter post published in one of the
20 randomly selected forums in 2006 we
computed our 28 features

                                          m1




                                6 month
  Fit LDA model with
  standard parameters:
  T=50, beta=0.01, alpha=50/T
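
With T=50 topics, the symmetric document-topic prior works out to alpha = 50/50 = 1.0. The snippet below only spells out that arithmetic; the dictionary keys are illustrative names, not the parameters of any particular LDA library.

```python
def lda_hyperparameters(num_topics=50, beta=0.01):
    """Return the hyperparameters quoted on the slide for a given topic count.

    alpha follows the alpha = 50 / T heuristic; key names are hypothetical.
    """
    alpha = 50 / num_topics
    return {"num_topics": num_topics, "alpha": alpha, "beta": beta}


params = lda_hyperparameters()
```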
Seed Post Identification
                                      Experiment
     Identify Posts which got replies (Binary
     Classification Task)
       Split data of each forum into train and test data
       (80/20)
       Train a logistic regression classifier with each feature
       group in isolation and all features combined
       Compare performance by using F1 score and the
       Matthews correlation coefficient (MCC)
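
Both evaluation measures can be computed from the confusion matrix of the binary seed / non-seed task. The sketch below uses made-up labels purely for illustration; it is not the authors' evaluation code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 is chance level."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: 1 = seed post, 0 = non-seed post.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
```

Unlike F1, MCC also takes true negatives into account, which makes it a useful complement when seed and non-seed posts are unevenly distributed.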
Seed Post Identification
                                                Results
For these 9 forums our classifiers outperform the random baseline:




Astronomy & Space: a classifier trained with content features alone
performs best
Spanish: a classifier trained with title features alone performs best
Seed Post Identification
                                  Feature Impact
     Analyze impact of individual features rather than
     groups
       Interpret statistically significant coefficients of the best
       performing feature group learned by the logistic
       regression model
       Rank the features of the best performing feature
       group using the Information Gain Ratio (IGR) as a
       ranking criterion
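
The Information Gain Ratio divides the information gain of a feature by the entropy of the feature's own value distribution. The sketch below assumes the feature has already been discretised (the slides do not say how continuous features were binned), and the bucket names are made up.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Made-up data: 1 = seed post, 0 = non-seed post.
labels = [1, 1, 0, 0]
informative = information_gain_ratio(["short", "short", "long", "long"], labels)
uninformative = information_gain_ratio(["a", "b", "a", "b"], labels)
```

A feature that separates the classes perfectly scores 1, while a feature that is independent of the class scores 0.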
Seed Post Identification
                                  Observations
     In the Spanish community, title length is the most
     important feature (IGR=0.558, coef=-0.326)
     Posts with long titles are less likely to get replies
     In the Bank & Insurance forum, short but complex
     posts authored by newbies are most
     likely to get replies
       Content length coef=-0.017, p< 0.05
       Topic distance coef=2.890, p<0.01
       Complexity has highest IGR (IGR=0.354)
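
In a logistic regression a coefficient is the change in log-odds per unit increase of the feature, so exp(coef) is the factor by which the odds of getting a reply change. The slides do not say whether features were standardised, so what counts as "one unit" is an assumption here.

```python
import math


def odds_multiplier(coef, delta=1.0):
    """Factor by which the odds change when the feature increases by delta."""
    return math.exp(coef * delta)


# Coefficients quoted on the slides.
title_length = odds_multiplier(-0.326)    # Spanish forum
content_length = odds_multiplier(-0.017)  # Bank & Insurance forum
topic_distance = odds_multiplier(2.890)   # Bank & Insurance forum
```

A multiplier below 1 (negative coefficient) lowers the odds of a reply, and a multiplier above 1 (positive coefficient) raises them.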
Seed Post Identification
                                  Observations
     Number of links has a negative impact in the
     Work & Jobs and Golf forums, but a positive impact in the
     Astronomy & Space forum


     Purpose of community
       Links have a positive impact in content and
       information driven communities
       Links have a negative impact in other communities
Seed Post Identification
                                 Observations
     Some communities require posts to fit to the topics
     they usually discuss (e.g., Golf) while others are
     more open to diverse topics (e.g., Work & Jobs)


     Specificity of community’s subject
       Subject of Work & Jobs forum is very general → high
       topical community distance has a positive impact
       Subject of Golf forum is very specific → high
       community distance has a negative impact
Activity Level Prediction
                                     Experiment
     Identify the features that were correlated with
     lengthy discussions
     Rank posts according to their attention level
     Evaluate our predicted rank using normalized
     Discounted Cumulative Gain (nDCG) at varying
     rank positions i.e. top-k where k={1, 5, 10, 20, 50,
     100}
     nDCG = DCG of the predicted ranking divided by
     the DCG of the actual ranking
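
A minimal sketch of nDCG at rank k, using the reply count as the gain and a log2 position discount. Several DCG variants exist and the slides do not say which one was used, so the exact formula here is an assumption.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(reply_counts_in_predicted_order, k):
    """DCG of the predicted ranking divided by the DCG of the actual ranking.

    The argument lists the true reply counts in the order the model ranked
    the posts; the actual ranking is that list sorted by reply count.
    """
    ideal = sorted(reply_counts_in_predicted_order, reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(reply_counts_in_predicted_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up reply counts for four seed posts.
perfect = ndcg_at_k([12, 7, 3, 1], k=3)
swapped = ndcg_at_k([7, 12, 3, 1], k=3)
```

A perfectly ordered prediction scores 1, and swapping the two most-replied posts lowers the score because the larger gain is discounted more heavily.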
Activity Level Prediction
                                                       Results
     [Figure: averaged normalised Discounted Cumulative Gain (nDCG)]
     A value of 1 indicates that the predicted ranking of posts perfectly matched their
     real ranking.
Activity Level Prediction
                                           Results




For the Astronomy & Space community content features were best
for identifying seed posts and are also best for ranking posts
according to the attention level they will generate.
Activity Level Prediction
                                             Results




Golf forum (343)
Combination of all features worked best for identifying seed posts.
Focus features alone are best for ranking posts.
Activity Level Prediction
                                             Results




Bank & Insurance forum (544)
Combination of all features worked best for identifying seed posts.
Community features alone are best for ranking posts.
Activity Level Prediction
                                         Summary
     Factors that impact discussion initiation often
     differ from the factors that impact discussion
     length
       e.g. for the Golf community
         Seed Posts = all features
         Activity level = focus features
Activity Level Prediction
                                        Summary
     Factors that are associated with lengthy
     discussion tend to be different for different
     communities


     The title length is the only feature which has a
     slightly significant positive impact across several
     communities on the number of replies a post gets
       Work & Jobs forum: title length coef=0.034, p<0.01
       Satellite forum: title length coef=0.030, p<0.05
                                   Conclusions (1)
     Different community forums exhibit interesting
     differences in terms of how attention is generated


     Most attention patterns which we identified are
     local and community-specific


     “Global” patterns may highly depend on
     composition of dataset
                                     Conclusions (2)
     Same features that have a positive impact on
     the start of discussions in one community can
     have a negative impact in another community


       Example: number of links
         Negative impact in most communities
         Positive impact in information and content driven
         communities
                                     Conclusions (3)
     Purpose of community and specificity of
     community’s subject may impact their reply
     behavior
       Communities which have a supportive purpose are
       most likely driven by different factors than
       communities with an informational purpose.
       Communities around very specific topics require posts
       to fit to the topical focus. Communities around more
       general topics do not have this requirement.
                      Limitations & Future Work
     Correlation versus Causality
       We cannot answer the “what would have happened if”
       question with our approach
       Controlled experiments where platform is manipulated


     Most attention patterns are local. But how local?
       Can we automatically identify the context in which
       attention patterns may hold?
Attention patterns tend to be local and community-specific.
          Ignoring communities’ idiosyncrasies isn’t bliss.
                                     Experimental Setup




                                             THANK YOU

                              claudia.wagner@joanneum.at
                                 http://claudiawagner.info




src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Factors Impacting Attention in Online Forums Vary by Community

  • 1. Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities Claudia Wagner, Matthew Rowe, Markus Strohmaier and Harith Alani Amsterdam, 16.4.2012
  • 2. with… Matthew Rowe, Markus Strohmaier, Harith Alani
  • 3. Motivation: Which factors impact how much attention a post gets? We use the number of replies as a proxy measurement of attention.
  • 4. Research Questions: Which factors impact the attention level a post gets in certain community forums? How do these factors differ between individual community forums?
  • 5. Methodology: Empirical study of attention patterns in 20 randomly selected forums. Two-stage approach: (1) differentiate between thread-starter posts that got at least one reply (seed posts) and thread-starter posts that got no replies at all (non-seed posts); (2) predict the level of attention that seed posts will generate, i.e. the number of replies.
  • 6. Dataset: Boards.ie, the most popular Irish message board. 725 forums. Years 2005 and 2006.
  • 7. 7
  • 8. Feature Engineering: Aim: identify the features that impact upon seeding a discussion; identify the features associated with seed posts that generate the most attention. Five feature groups.
  • 9. Five Feature Groups: User features: user account age, post count, in-degree, out-degree, post rate. Content features: post length, complexity, readability, link count, time in day, informativeness, polarity. Title features: length, question marks, linguistic dimensions (LIWC). Focus features: forum entropy, forum likelihood, topic entropy, topic likelihood, topic distance. Community features: topical community fit, topical community distance, evolution score, inequity score.
  • 10. Feature Computation: For each thread-starter post published in one of the 20 randomly selected forums in 2006, we computed our 28 features using a 6-month window before the post was published. We fit an LDA model with standard parameters T=50, beta=0.01, alpha=50/T.
  • 11. Seed Post Identification, Experiment: Identify posts that got replies (binary classification task). Split the data of each forum into train and test data (80/20). Train a logistic regression classifier with each feature group in isolation and with all features combined. Compare performance using the F1 score and the Matthews correlation coefficient (MCC).
  • 12. Seed Post Identification, Results: For these 9 forums our classifiers outperform the random baseline. Astronomy & Space: a classifier trained with content features alone performs best. Spanish: a classifier trained with title features alone performs best.
  • 13. Seed Post Identification, Feature Impact: Analyze the impact of individual features rather than groups. Interpret the statistically significant coefficients of the best-performing feature group learned by the logistic regression model. Rank the features of the best-performing feature group using the Information Gain Ratio (IGR) as the ranking criterion.
  • 14. Seed Post Identification, Observations: In the Spanish community, title length is the most important feature (IGR=0.558, coef=-0.326): posts with long titles are less likely to get replies. In the Bank & Insurance forum, short but complex posts authored by newbies are most likely to get replies: content length coef=-0.017, p<0.05; topic distance coef=2.890, p<0.01; complexity has the highest IGR (IGR=0.354).
  • 15. Seed Post Identification, Observations: The number of links has a negative impact in the Work & Jobs and Golf forums, but a positive impact in the Astronomy & Space forum. Purpose of community: links have a positive impact in content- and information-driven communities; links have a negative impact in other communities.
  • 16. Seed Post Identification, Observations: Some communities require posts to fit the topics they usually discuss (e.g., Golf), while others are more open to diverse topics (e.g., Work & Jobs). Specificity of the community's subject: the subject of the Work & Jobs forum is very general, so high topical community distance has a positive impact; the subject of the Golf forum is very specific, so high community distance has a negative impact.
  • 17. Activity Level Prediction, Experiment: Identify the features that were correlated with lengthy discussions. Rank posts according to their attention level. Evaluate our predicted ranking using normalized Discounted Cumulative Gain (nDCG) at varying rank positions, i.e. top-k where k={1, 5, 10, 20, 50, 100}. nDCG = DCG of the predicted ranking divided by the DCG of the actual ranking.
  • 18. Activity Level Prediction, Results: Averaged normalised discounted cumulative gain. A value of 1 indicates that the predicted ranking of posts perfectly matched their real ranking.
  • 19. Activity Level Prediction, Results: For the Astronomy & Space community, content features were best for identifying seed posts and are also best for ranking posts according to the attention level they will generate.
  • 20. Activity Level Prediction, Results: Golf forum (343): the combination of all features worked best for identifying seed posts. Focus features alone are best for ranking posts.
  • 21. Activity Level Prediction, Results: Bank & Insurance forum (544): the combination of all features worked best for identifying seed posts. Community features alone are best for ranking posts.
  • 22. Activity Level Prediction, Summary: Factors that impact discussion initiation often differ from the factors that impact discussion length, e.g. for the Golf community: seed posts = all features; activity level = focus features.
  • 23. Activity Level Prediction, Summary: Factors that are associated with lengthy discussions tend to be different for different communities. Title length is the only feature that has a slightly significant positive impact across several communities on the number of replies a post gets: Work & Jobs forum, title length coef=0.034 and p<0.01; Satellite forum, title length coef=0.030 and p<0.05.
  • 24. Conclusions (1): Different community forums exhibit interesting differences in terms of how attention is generated. Most attention patterns that we identified are local and community-specific. “Global” patterns may depend highly on the composition of the dataset.
  • 25. Conclusions (2): The same features that have a positive impact on the start of discussions in one community can have a negative impact in another community. Example: number of links. Negative impact in most communities; positive impact in information- and content-driven communities.
  • 26. Conclusions (3): The purpose of a community and the specificity of its subject may impact its reply behavior. Communities that have a supportive purpose are most likely driven by different factors than communities with an informational purpose. Communities around very specific topics require posts to fit the topical focus. Communities around more general topics do not have this requirement.
  • 27. Limitations & Future Work: Correlation versus causality: we cannot answer the “what would have happened if” question with our approach; this would need controlled experiments where the platform is manipulated. Most attention patterns are local. But how local? Can we automatically identify the context in which attention patterns may hold?
  • 28. Attention patterns tend to be local and community-specific. Ignoring communities’ idiosyncrasies isn’t bliss. Experimental Setup. Thank you. claudia.wagner@joanneum.at http://claudiawagner.info src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
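
The seed-post experiment (slide 11) compares classifiers by F1 score and Matthews correlation coefficient. As a minimal sketch of how these two measures are computed from binary predictions, assuming labels are encoded as 1 (seed post) and 0 (non-seed post); the function name `f1_and_mcc` is hypothetical and not taken from the talk:

```python
import math


def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary labels, 1 = seed post, 0 = non-seed post.

    Illustrative helper only; the name and the label encoding are assumptions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # F1 is the harmonic mean of precision and recall.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC lies in [-1, +1]; by convention it is set to 0 when undefined.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc
```

Unlike F1, MCC also uses the true negatives, which is why it stays informative when seed and non-seed posts are of very different sizes.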

Editor's Notes

  1. We randomly selected 20 forums that did not have low activity levels. One can see that the set of forums which we selected is very diverse and includes communities around very specific topics such as Golf or Astronomy & Space, communities around geographical locations such as Ripp of Ireland, and communities around very general topics such as Work & Jobs.
  2. Since we were interested in exploring different factors, we had to develop feature groups which represent the factors that may impact users' reply behavior. We created 5 different groups of features which try to explain factor groups that may potentially impact users' communication behavior in certain community forums. For example, if user features are important in a forum for predicting which posts will get replies, then in this forum it is more important who says something than what is said. That means discussion would be driven by social factors rather than topical factors. On the other hand, if content features are most important in a forum, then posts need to show certain content characteristics in order to get replies. Focus features are in a sense also user features, but describe the topical and forum focus of a user. For some forums it might be necessary that a user has a strong topical focus (i.e. is likely to be an expert) in order to stimulate discussions, while in other forums novices might be more likely to get replies. Community features describe relations between a post or its author and the community, e.g. a post might only get replies if it fits the interests of the community, or a user might be more likely to get replies if he has contributed to the community a lot (inequity theory).
  4. We computed those features for every thread-starter post published in 2006 by using a 6-month window previous to when the post was published.
  5. MCC is a balanced measure of the quality of binary classification and can be used even if the classes are of very different sizes. The MCC measure returns a value between -1 and +1; 0 is no better than random prediction. The F1 score is frequently used by the IR community, while the MCC is used by ML people.
  6. For 11 forums our classifier did not outperform (but only matched) the performance of the baseline. We assume that this happens because most of these 11 forums are rather inactive forums. Another potential explanation is that the discussion behaviour of these communities is in part rather random and/or driven by other, external factors which we could not take into account in our study. For example, the discussion behaviour of the communities around specific locations or regions might be impacted by spatial properties of users, while the discussion behaviour of the community around the Television forum seems to be mainly driven by external events (e.g. the start of a new series). In most cases a combination of all features achieves the highest performance.
  7. Besides the overall classification performance, we were also interested in analyzing the impact of individual features.
  8. When analyzing the individual features we made a couple of interesting observations, such as
  9. NDCG would be 1 if we predict the real ranking position of a post. The measure penalizes elements that appear lower down although they should be higher up.
  10. Best results for the Spanish forum. Worst results for 544 (Banking & Insurance & Pensions).
  11. This indicates that it is important that a post’s content has certain characteristics (e.g. contains only few links) and fits the topical interests of the community in order to start a discussion. But afterwards it is important that the author of a post has a certain topical and/or forum focus in order to stimulate a lengthy discussion in this forum.
  12. This indicates that for starting lengthy discussions in this forum it is important that the author of a post has topical and/or forum focus.
  13. This indicates that in this forum posts which fit the topical interests of the community have the potential to start lengthy discussions.
  14. To summarize, our second experiment shows that
  15. So let me start concluding my talk. What we learned from our empirical study was that...
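
The nDCG evaluation described in note 9 and on slide 17 can be sketched as follows. The talk defines nDCG only as the DCG of the predicted ranking divided by the DCG of the actual ranking; the log2 position discount, the use of the raw reply count as gain, and the names `dcg` and `ndcg_at_k` are assumptions made for this illustration:

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k gains.

    Assumes the common log2 position discount; the talk does not state one.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted_scores, reply_counts, k):
    """nDCG@k: DCG of the predicted ranking over DCG of the actual ranking.

    predicted_scores[i] is the model's score for post i and
    reply_counts[i] is the number of replies that post really received.
    """
    # Order posts by predicted score, best first, and read off the true gains.
    order = sorted(range(len(reply_counts)),
                   key=lambda i: predicted_scores[i], reverse=True)
    predicted_gains = [reply_counts[i] for i in order]
    # The actual (ideal) ranking sorts posts by their real reply counts.
    ideal_gains = sorted(reply_counts, reverse=True)
    ideal = dcg(ideal_gains, k)
    return dcg(predicted_gains, k) / ideal if ideal else 0.0
```

With this definition a post with many replies that is ranked too low contributes less than it would in the ideal ordering, which is the penalty described in note 9.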