Ignorance isn't Bliss: An Empirical Analysis of
      Attention Patterns in Online Communities
Claudia Wagner, Matthew Rowe, Markus Strohmaier and Harith Alani
                                          Amsterdam, 16.4.2012
with Matthew Rowe, Markus Strohmaier and Harith Alani
                                             Motivation




Which factors impact how much attention a post gets?

We use the number of replies as a proxy measurement of attention
Research Questions


Which factors impact the attention level a post
gets in certain community forums?


How do these factors differ between individual
community forums?
                                         Methodology
    Empirical study of attention patterns in 20
    randomly selected forums
    Two-stage approach
      Differentiate between threadstarter posts that got at
      least one reply (seed posts) and threadstarter posts
      which got no replies at all (non-seed posts)
      Predict the level of attention that seed posts will
      generate - i.e. the number of replies
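The two-stage approach above can be sketched as follows. This is a minimal illustration, not the authors' code: the `ThreadStarter` record and both function names are hypothetical, and stage 2 is shown only as sorting by the true reply count (the target ranking), not as a prediction model.

```python
from dataclasses import dataclass


@dataclass
class ThreadStarter:
    # Hypothetical record for a thread-starter post.
    post_id: str
    reply_count: int


def split_seed_posts(posts):
    """Stage 1 labels: seed posts got at least one reply, non-seed posts got none."""
    seeds = [p for p in posts if p.reply_count >= 1]
    non_seeds = [p for p in posts if p.reply_count == 0]
    return seeds, non_seeds


def rank_by_attention(seeds):
    """Stage 2 target: seed posts ordered by number of replies, most replies first."""
    return sorted(seeds, key=lambda p: p.reply_count, reverse=True)


posts = [
    ThreadStarter("a", 0),
    ThreadStarter("b", 3),
    ThreadStarter("c", 1),
    ThreadStarter("d", 0),
]
seeds, non_seeds = split_seed_posts(posts)
ranked = rank_by_attention(seeds)
```

Stage 1 is a binary classification problem over all thread starters; stage 2 only considers the posts that stage 1 labels as seeds.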
Dataset
Most popular Irish Message Boards, Boards.ie
725 Forums
Year 2005 and 2006
Feature Engineering
Aim
  Identify the features that impact upon seeding a
  discussion
  Identify features associated with seed posts that
  generate the most attention


Five Feature Groups
User Features
  user account age, post count, in-degree, out-degree, post rate
Content Features
  post length, complexity, readability, link count, time in day,
  informativeness, polarity
Title Features
  Length, question marks, linguistic dimensions (LIWC)
Focus Features
  Forum entropy, forum likelihood, topic entropy, topic likelihood, topic
  distance
Community Features
  Topical community fit, topical community distance, evolution score,
  inequity score
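As an illustration of one focus feature, the sketch below computes a forum entropy. The slides only name the feature, so the definition used here is an assumption: the Shannon entropy of the distribution of a user's posts over forums. The function name and the sample forum names are hypothetical.

```python
import math
from collections import Counter


def forum_entropy(forum_ids):
    """Shannon entropy (in bits) of a user's post distribution over forums.

    Assumed definition: 0 means all posts are in one forum; higher values
    mean the user's activity is spread over more forums.
    """
    counts = Counter(forum_ids)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


focused = forum_entropy(["golf"] * 8)                     # all posts in one forum
spread = forum_entropy(["golf", "golf", "jobs", "jobs"])  # evenly split over two
```

Under this assumed definition, a user who posts only in one forum scores 0 bits and a user split evenly over two forums scores 1 bit.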
Feature Computation
For each threadstarter post published in one of the
20 randomly selected forums in 2006 we
computed our 28 features

  [Figure: timeline marking m1 and a 6-month window]
  Fit an LDA model with standard parameters:
  T=50, beta=0.01, alpha=50/T
Seed Post Identification
                                      Experiment
     Identify Posts which got replies (Binary
     Classification Task)
       Split data of each forum into train and test data
       (80/20)
       Train a logistic regression classifier with each feature
       group in isolation and all features combined
       Compare performance by using F1 score and the
       Matthews correlation coefficient (MCC)
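The two evaluation measures named above can be computed from a confusion matrix. The sketch below uses the standard definitions of F1 and MCC; the helper names and the toy labels are illustrative and not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = seed post)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy test-set labels (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
matthews = mcc(tp, tn, fp, fn)
```

Unlike F1, MCC also accounts for true negatives, so it stays informative when seed and non-seed posts are unevenly distributed.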
                                                Results
For these 9 forums our classifiers outperform the random baseline:




Astronomy & Space: a classifier trained with content features alone
performs best
Spanish: a classifier trained with title features alone performs best
                                  Feature Impact
     Analyze impact of individual features rather than
     groups
       Interpret statistically significant coefficients of the best
       performing feature group learned by the logistic
       regression model
       Rank the features of the best performing feature
       group using the Information Gain Ratio (IGR) as a
       ranking criterion
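A minimal sketch of ranking by Information Gain Ratio, using the common definition (information gain divided by the split information of the feature). The slides do not state the exact variant or how continuous features were discretized, so the function below assumes already-discretized feature values; all names and sample data are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain_ratio(feature_values, labels):
    """IGR = (H(labels) - H(labels | feature)) / H(feature)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]  # 1 = seed post, 0 = non-seed post
informative = information_gain_ratio(["short", "short", "long", "long"], labels)
uninformative = information_gain_ratio(["a", "b", "a", "b"], labels)
```

Features can then be sorted by their IGR value, highest first, to obtain a ranking.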
                                  Observations
     In the Spanish community, title length is the most
     important feature (IGR=0.558, coef=-0.326)
     Posts with long titles are less likely to get replies
     In the Bank & Insurance forum short but complex
     posts which are authored by newbies are most
     likely to get replies
       Content length coef=-0.017, p<0.05
       Topic distance coef=2.890, p<0.01
       Complexity has highest IGR (IGR=0.354)
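One way to read the logistic regression coefficients above is as odds ratios: a one-unit increase in a feature multiplies the odds of getting a reply by exp(coef), holding the other features fixed. The sketch below applies this to the reported coefficients. The slides do not say how the features were scaled, so what "one unit" means for each feature is an open assumption.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in a feature."""
    return math.exp(coef * delta)


# Coefficients as reported on the slides.
title_length_or = odds_ratio(-0.326)    # Spanish forum, title length
content_length_or = odds_ratio(-0.017)  # Bank & Insurance forum, content length
topic_distance_or = odds_ratio(2.890)   # Bank & Insurance forum, topic distance
```

An odds ratio below 1 corresponds to a negative coefficient (replies become less likely), and a ratio above 1 to a positive one.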
                                  Observations
     Number of links has a negative impact in the Work & Jobs
     and Golf forums, but a positive impact in the
     Astronomy & Space forum


     Purpose of community
       Links have a positive impact in content and
       information driven communities
       Links have a negative impact in other communities
                                 Observations
     Some communities require posts to fit to the topics
     they usually discuss (e.g., Golf) while others are
     more open to diverse topics (e.g., Work & Jobs)


     Specificity of community’s subject
       Subject of the Work & Jobs forum is very general → high
       topical community distance has a positive impact
       Subject of the Golf forum is very specific → high
       topical community distance has a negative impact
Activity Level Prediction
                                     Experiment
     Identify the features that were correlated with
     lengthy discussions
     Rank posts according to their attention level
     Evaluate our predicted rank using normalized
     Discounted Cumulative Gain (nDCG) at varying
     rank positions i.e. top-k where k={1, 5, 10, 20, 50,
     100}
     nDCG = DCG of the predicted ranking divided by
     the DCG of the actual ranking
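A minimal sketch of nDCG at rank k, using the reply count as the gain and the common log2(rank + 1) discount. The slides do not specify the gain or discount function, so both are assumptions; function names and sample values are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """DCG over the first k items: sum of rel_i / log2(i + 1), ranks starting at 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, reply_counts, k):
    """DCG of the predicted ranking divided by the DCG of the actual ranking."""
    # Order posts by predicted score, highest first.
    order = sorted(range(len(reply_counts)),
                   key=lambda i: predicted_scores[i], reverse=True)
    predicted_rels = [reply_counts[i] for i in order]
    # The actual (ideal) ranking orders posts by their true reply counts.
    ideal_rels = sorted(reply_counts, reverse=True)
    ideal = dcg_at_k(ideal_rels, k)
    return dcg_at_k(predicted_rels, k) / ideal if ideal > 0 else 0.0


reply_counts = [5, 1, 3]
perfect = ndcg_at_k([0.9, 0.1, 0.5], reply_counts, k=3)   # matches the true order
reversed_ = ndcg_at_k([0.1, 0.9, 0.5], reply_counts, k=3)  # worst possible order
```

Evaluating at several cut-offs, such as the k values listed above, shows how well the top of the predicted ranking matches the posts that actually received the most replies.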
                                                       Results
     [Chart: averaged normalised Discounted Cumulative Gain (nDCG)]
     A value of 1 indicates that the predicted ranking of posts perfectly matches their
     real ranking.
                                           Results




For the Astronomy & Space community content features were best
for identifying seed posts and are also best for ranking posts
according to the attention level they will generate.
                                             Results




Golf forum (343)
Combination of all features worked best for identifying seed posts.
Focus features alone are best for ranking posts.
                                             Results




Bank & Insurance forum (544)
Combination of all features worked best for identifying seed posts.
Community features alone are best for ranking posts.
                                         Summary
     Factors that impact discussion initiation often
     differ from the factors that impact discussion
     length
       e.g. for the Golf community
         Seed Posts = all features
         Activity level = focus features
                                        Summary
     Factors that are associated with lengthy
     discussion tend to be different for different
     communities


     Title length is the only feature that has a small but
     statistically significant positive impact on the number
     of replies a post gets across several communities
       Work & Jobs forum: title length coef=0.034, p<0.01
       Satellite forum: title length coef=0.030, p<0.05
                                   Conclusions (1)
     Different community forums exhibit interesting
     differences in terms of how attention is generated


     Most attention patterns which we identified are
     local and community-specific


     “Global” patterns may depend strongly on the
     composition of the dataset
                                     Conclusions (2)
     The same features that have a positive impact on
     the start of discussions in one community can
     have a negative impact in another community


       Example: number of links
         Negative impact in most communities
         Positive impact in information and content driven
         communities
                                     Conclusions (3)
     Purpose of community and specificity of
     community’s subject may impact their reply
     behavior
       Communities which have a supportive purpose are
       most likely driven by different factors than
       communities with an informational purpose.
       Communities around very specific topics require posts
       to fit to the topical focus. Communities around more
       general topics do not have this requirement.
                      Limitations & Future Work
     Correlation versus Causality
       We cannot answer the "what would have happened if"
       question with our approach
       Controlled experiments where platform is manipulated


     Most attention patterns are local. But how local?
       Can we automatically identify the context in which
       attention patterns may hold?
Attention patterns tend to be local and community-specific.
          Ignoring communities’ idiosyncrasies isn’t bliss.
                                     Experimental Setup




                                             THANK YOU

                              claudia.wagner@joanneum.at
                                 http://claudiawagner.info




src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Weitere ähnliche Inhalte

Ähnlich wie Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities

Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsMatthew Rowe
 
Socialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionSocialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionWeGov project
 
Understanding Software Cohesion Metrics: Experimental Assessment of Conceptua...
Understanding Software Cohesion Metrics:Experimental Assessment of Conceptua...Understanding Software Cohesion Metrics:Experimental Assessment of Conceptua...
Understanding Software Cohesion Metrics: Experimental Assessment of Conceptua...Bruno C. da Silva
 
KASW'08 - Invited Talk
KASW'08 - Invited TalkKASW'08 - Invited Talk
KASW'08 - Invited TalkRalf Klamma
 
Standards and Standardization - A Research Project
Standards and Standardization - A Research ProjectStandards and Standardization - A Research Project
Standards and Standardization - A Research ProjectSandeep Purao
 
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...Paris Open Source Summit
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?krisztianbalog
 
Microsoft power point makingsenseofsensemaker
Microsoft power point   makingsenseofsensemakerMicrosoft power point   makingsenseofsensemaker
Microsoft power point makingsenseofsensemakerGlobalGiving
 
How to conduct a social network analysis: A tool for empowering teams and wor...
How to conduct a social network analysis: A tool for empowering teams and wor...How to conduct a social network analysis: A tool for empowering teams and wor...
How to conduct a social network analysis: A tool for empowering teams and wor...Jeromy Anglim
 
06 styles and_greenfield_design
06 styles and_greenfield_design06 styles and_greenfield_design
06 styles and_greenfield_designMajong DevJfu
 
Multilevel Collaboration between Software Developers and the Impact of Proxim...
Multilevel Collaboration between Software Developers and the Impact of Proxim...Multilevel Collaboration between Software Developers and the Impact of Proxim...
Multilevel Collaboration between Software Developers and the Impact of Proxim...Dawn Foster
 
Automatic Identification of Best Answers in Online Enquiry Communities
Automatic Identification of Best Answers in Online Enquiry CommunitiesAutomatic Identification of Best Answers in Online Enquiry Communities
Automatic Identification of Best Answers in Online Enquiry CommunitiesGregoire Burel
 
Conf 2012-empirikom3
Conf 2012-empirikom3Conf 2012-empirikom3
Conf 2012-empirikom3Clay Spinuzzi
 
Graph Algorithms for Developers
Graph Algorithms for DevelopersGraph Algorithms for Developers
Graph Algorithms for DevelopersNeo4j
 
DevOps Gamification Workshop at JTEL Summer School 2015
DevOps Gamification Workshop at JTEL Summer School 2015DevOps Gamification Workshop at JTEL Summer School 2015
DevOps Gamification Workshop at JTEL Summer School 2015IstvanKoren
 
Boundary Spanning Leadership Integrated with Network Development Webinar
Boundary Spanning Leadership Integrated with Network Development WebinarBoundary Spanning Leadership Integrated with Network Development Webinar
Boundary Spanning Leadership Integrated with Network Development WebinarLeadership Learning Community
 

Ähnlich wie Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities (20)

Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community Forums
 
Socialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionSocialcom2011 discussionactivityprediction
Socialcom2011 discussionactivityprediction
 
Understanding Software Cohesion Metrics: Experimental Assessment of Conceptua...
Understanding Software Cohesion Metrics:Experimental Assessment of Conceptua...Understanding Software Cohesion Metrics:Experimental Assessment of Conceptua...
Understanding Software Cohesion Metrics: Experimental Assessment of Conceptua...
 
Crowdsourced Placemaking
Crowdsourced PlacemakingCrowdsourced Placemaking
Crowdsourced Placemaking
 
KASW'08 - Invited Talk
KASW'08 - Invited TalkKASW'08 - Invited Talk
KASW'08 - Invited Talk
 
Standards and Standardization - A Research Project
Standards and Standardization - A Research ProjectStandards and Standardization - A Research Project
Standards and Standardization - A Research Project
 
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
 
Microsoft power point makingsenseofsensemaker
Microsoft power point   makingsenseofsensemakerMicrosoft power point   makingsenseofsensemaker
Microsoft power point makingsenseofsensemaker
 
How to conduct a social network analysis: A tool for empowering teams and wor...
How to conduct a social network analysis: A tool for empowering teams and wor...How to conduct a social network analysis: A tool for empowering teams and wor...
How to conduct a social network analysis: A tool for empowering teams and wor...
 
06 styles and_greenfield_design
06 styles and_greenfield_design06 styles and_greenfield_design
06 styles and_greenfield_design
 
Multilevel Collaboration between Software Developers and the Impact of Proxim...
Multilevel Collaboration between Software Developers and the Impact of Proxim...Multilevel Collaboration between Software Developers and the Impact of Proxim...
Multilevel Collaboration between Software Developers and the Impact of Proxim...
 
Automatic Identification of Best Answers in Online Enquiry Communities
Automatic Identification of Best Answers in Online Enquiry CommunitiesAutomatic Identification of Best Answers in Online Enquiry Communities
Automatic Identification of Best Answers in Online Enquiry Communities
 
Sakai 3, version 8
Sakai 3, version 8Sakai 3, version 8
Sakai 3, version 8
 
Conf 2012-empirikom3
Conf 2012-empirikom3Conf 2012-empirikom3
Conf 2012-empirikom3
 
Os Long
Os LongOs Long
Os Long
 
Graph Algorithms for Developers
Graph Algorithms for DevelopersGraph Algorithms for Developers
Graph Algorithms for Developers
 
DevOps Gamification Workshop at JTEL Summer School 2015
DevOps Gamification Workshop at JTEL Summer School 2015DevOps Gamification Workshop at JTEL Summer School 2015
DevOps Gamification Workshop at JTEL Summer School 2015
 
Mythrealities
MythrealitiesMythrealities
Mythrealities
 
Boundary Spanning Leadership Integrated with Network Development Webinar
Boundary Spanning Leadership Integrated with Network Development WebinarBoundary Spanning Leadership Integrated with Network Development Webinar
Boundary Spanning Leadership Integrated with Network Development Webinar
 

Mehr von Claudia Wagner

Measuring Gender Inequality in Wikipedia
Measuring Gender Inequality in WikipediaMeasuring Gender Inequality in Wikipedia
Measuring Gender Inequality in WikipediaClaudia Wagner
 
Slam about "Discrimination and Inequalities in socio-computational systems"
Slam about "Discrimination and Inequalities in socio-computational systems"Slam about "Discrimination and Inequalities in socio-computational systems"
Slam about "Discrimination and Inequalities in socio-computational systems"Claudia Wagner
 
It's a Man's Wikipedia?
It's a Man's Wikipedia? It's a Man's Wikipedia?
It's a Man's Wikipedia? Claudia Wagner
 
Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Claudia Wagner
 
When politicians talk: Assessing online conversational practices of political...
When politicians talk: Assessing online conversational practices of political...When politicians talk: Assessing online conversational practices of political...
When politicians talk: Assessing online conversational practices of political...Claudia Wagner
 
WWW2014 Semantic Stability in Social Tagging Streams
WWW2014 Semantic Stability in Social Tagging StreamsWWW2014 Semantic Stability in Social Tagging Streams
WWW2014 Semantic Stability in Social Tagging StreamsClaudia Wagner
 
Welcome 1st Computational Social Science Workshop 2013 at GESIS
Welcome 1st Computational Social Science Workshop 2013 at GESISWelcome 1st Computational Social Science Workshop 2013 at GESIS
Welcome 1st Computational Social Science Workshop 2013 at GESISClaudia Wagner
 
Spatio and Temporal Dietary Patterns
Spatio and Temporal Dietary PatternsSpatio and Temporal Dietary Patterns
Spatio and Temporal Dietary PatternsClaudia Wagner
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience shortClaudia Wagner
 
The Impact of Socialbots in Online Social Networks
The Impact of Socialbots in Online Social NetworksThe Impact of Socialbots in Online Social Networks
The Impact of Socialbots in Online Social NetworksClaudia Wagner
 
It’s not in their tweets: Modeling topical expertise of Twitter users
It’s not in their tweets: Modeling topical expertise of Twitter users It’s not in their tweets: Modeling topical expertise of Twitter users
It’s not in their tweets: Modeling topical expertise of Twitter users Claudia Wagner
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
Knowledge Acquisition from Social Awareness Streams
Knowledge Acquisition from Social Awareness StreamsKnowledge Acquisition from Social Awareness Streams
Knowledge Acquisition from Social Awareness StreamsClaudia Wagner
 
The wisdom in Tweetonomies
The wisdom in TweetonomiesThe wisdom in Tweetonomies
The wisdom in TweetonomiesClaudia Wagner
 

Mehr von Claudia Wagner (18)

Measuring Gender Inequality in Wikipedia
Measuring Gender Inequality in WikipediaMeasuring Gender Inequality in Wikipedia
Measuring Gender Inequality in Wikipedia
 
Slam about "Discrimination and Inequalities in socio-computational systems"
Slam about "Discrimination and Inequalities in socio-computational systems"Slam about "Discrimination and Inequalities in socio-computational systems"
Slam about "Discrimination and Inequalities in socio-computational systems"
 
It's a Man's Wikipedia?
It's a Man's Wikipedia? It's a Man's Wikipedia?
It's a Man's Wikipedia?
 
Food and Culture
Food and CultureFood and Culture
Food and Culture
 
Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014Datascience Introduction WebSci Summer School 2014
Datascience Introduction WebSci Summer School 2014
 
When politicians talk: Assessing online conversational practices of political...
When politicians talk: Assessing online conversational practices of political...When politicians talk: Assessing online conversational practices of political...
When politicians talk: Assessing online conversational practices of political...
 
WWW2014 Semantic Stability in Social Tagging Streams
WWW2014 Semantic Stability in Social Tagging StreamsWWW2014 Semantic Stability in Social Tagging Streams
WWW2014 Semantic Stability in Social Tagging Streams
 
Welcome 1st Computational Social Science Workshop 2013 at GESIS
Welcome 1st Computational Social Science Workshop 2013 at GESISWelcome 1st Computational Social Science Workshop 2013 at GESIS
Welcome 1st Computational Social Science Workshop 2013 at GESIS
 
Spatio and Temporal Dietary Patterns
Spatio and Temporal Dietary PatternsSpatio and Temporal Dietary Patterns
Spatio and Temporal Dietary Patterns
 
Eswc2013 audience short
Eswc2013 audience shortEswc2013 audience short
Eswc2013 audience short
 
The Impact of Socialbots in Online Social Networks
The Impact of Socialbots in Online Social NetworksThe Impact of Socialbots in Online Social Networks
The Impact of Socialbots in Online Social Networks
 
It’s not in their tweets: Modeling topical expertise of Twitter users
It’s not in their tweets: Modeling topical expertise of Twitter users It’s not in their tweets: Modeling topical expertise of Twitter users
It’s not in their tweets: Modeling topical expertise of Twitter users
 
Socialbots www2012
Socialbots www2012Socialbots www2012
Socialbots www2012
 
SDOW (ISWC2011)
SDOW (ISWC2011)SDOW (ISWC2011)
SDOW (ISWC2011)
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
Topic Models
Topic ModelsTopic Models
Topic Models
 
Knowledge Acquisition from Social Awareness Streams
Knowledge Acquisition from Social Awareness StreamsKnowledge Acquisition from Social Awareness Streams
Knowledge Acquisition from Social Awareness Streams
 
The wisdom in Tweetonomies
The wisdom in TweetonomiesThe wisdom in Tweetonomies
The wisdom in Tweetonomies
 

Kürzlich hochgeladen

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities

  • 1. Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities Claudia Wagner, Matthew Rowe, Markus Strohmaier and Harith Alani Amsterdam, 16.4.2012
  • 2. with… Matthew Rowe Markus Strohmaier Harith Alani
  • 3. Motivation: Which factors impact how much attention a post gets? We use the number of replies as a proxy measurement of attention.
  • 4. Research Questions Which factors impact the attention level a post gets in certain community forums? How do these factors differ between individual community forums?
  • 5. Methodology: Empirical study of attention patterns in 20 randomly selected forums. Two-stage approach: (1) differentiate between threadstarter posts that got at least one reply (seed posts) and threadstarter posts that got no replies at all (non-seed posts); (2) predict the level of attention that seed posts will generate, i.e. the number of replies.
  • 6. Dataset Most popular Irish Message Boards, Boards.ie 725 Forums Year 2005 and 2006
  • 7. 7
  • 8. Feature Engineering Aim Identify the features that impact upon seeding a discussion Identify features associated with seed posts that generate the most attention Five Feature Groups
  • 9. Five Feature Groups. User features: user account age, post count, in-degree, out-degree, post rate. Content features: post length, complexity, readability, link count, time in day, informativeness, polarity. Title features: length, question marks, linguistic dimensions (LIWC). Focus features: forum entropy, forum likelihood, topic entropy, topic likelihood, topic distance. Community features: topical community fit, topical community distance, evolution score, inequity score.
  • 10. Feature Computation: For each threadstarter post published in one of the 20 randomly selected forums in 2006, we computed our 28 features over a 6-month window. Fit LDA model with standard parameters T=50, beta=0.01, alpha=50/T.
  • 11. Seed Post Identification, Experiment: Identify posts which got replies (binary classification task). Split the data of each forum into train and test data (80/20). Train a logistic regression classifier with each feature group in isolation and with all features combined. Compare performance using the F1 score and the Matthews correlation coefficient (MCC).
  • 12. Seed Post Identification, Results: For these 9 forums our classifiers outperform the random baseline. Astronomy & Space: a classifier trained with content features alone performs best. Spanish: a classifier trained with title features alone performs best.
  • 13. Seed Post Identification, Feature Impact: Analyze the impact of individual features rather than groups. Interpret statistically significant coefficients of the best performing feature group learned by the logistic regression model. Rank the features of the best performing feature group using the Information Gain Ratio (IGR) as a ranking criterion.
  • 14. Seed Post Identification, Observations: In the Spanish community the title length is the most important feature (IGR=0.558, coef=-0.326): posts with long titles are less likely to get replies. In the Bank & Insurance forum, short but complex posts authored by newbies are most likely to get replies: content length coef=-0.017, p<0.05; topic distance coef=2.890, p<0.01; complexity has the highest IGR (IGR=0.354).
  • 15. Seed Post Identification, Observations: The number of links has a negative impact in the Work & Jobs and Golf forums, but a positive impact in the Astronomy & Space forum. Purpose of community: links have a positive impact in content- and information-driven communities; links have a negative impact in other communities.
  • 16. Seed Post Identification, Observations: Some communities require posts to fit the topics they usually discuss (e.g., Golf) while others are more open to diverse topics (e.g., Work & Jobs). Specificity of the community’s subject: the subject of the Work & Jobs forum is very general → high topical community distance has a positive impact; the subject of the Golf forum is very specific → high community distance has a negative impact.
  • 17. Activity Level Prediction, Experiment: Identify the features that were correlated with lengthy discussions. Rank posts according to their attention level. Evaluate our predicted rank using normalized Discounted Cumulative Gain (nDCG) at varying rank positions, i.e. top-k where k={1, 5, 10, 20, 50, 100}. nDCG = DCG of the predicted ranking divided by DCG of the actual ranking.
  • 18. Activity Level Prediction, Results: Averaged normalised Discounted Cumulative Gain. A value of 1 indicates that the predicted ranking of posts perfectly matched their real ranking.
  • 19. Activity Level Prediction, Results: For the Astronomy & Space community, content features were best for identifying seed posts and are also best for ranking posts according to the attention level they will generate.
  • 20. Activity Level Prediction, Results: Golf forum (343). A combination of all features worked best for identifying seed posts. Focus features alone are best for ranking posts.
  • 21. Activity Level Prediction, Results: Bank & Insurance forum (544). A combination of all features worked best for identifying seed posts. Community features alone are best for ranking posts.
  • 22. Activity Level Prediction, Summary: Factors that impact discussion initiation often differ from the factors that impact discussion length, e.g. for the Golf community: seed posts = all features, activity level = focus features.
  • 23. Activity Level Prediction, Summary: Factors that are associated with lengthy discussions tend to be different for different communities. The title length is the only feature which has a slightly significant positive impact on the number of replies a post gets across several communities. Work & Jobs forum: title length coef=0.034, p<0.01. Satellite forum: title length coef=0.030, p<0.05.
  • 24. Conclusions (1): Different community forums exhibit interesting differences in terms of how attention is generated. Most attention patterns which we identified are local and community-specific. “Global” patterns may highly depend on the composition of the dataset.
  • 25. Conclusions (2): The same features that have a positive impact on the start of discussions in one community can have a negative impact in another community. Example: number of links. Negative impact in most communities; positive impact in information- and content-driven communities.
  • 26. Conclusions (3): The purpose of a community and the specificity of its subject may impact its reply behavior. Communities which have a supportive purpose are most likely driven by different factors than communities with an informational purpose. Communities around very specific topics require posts to fit the topical focus. Communities around more general topics do not have this requirement.
  • 27. Limitations & Future Work: Correlation versus causality. We cannot answer the “what would have happened if” question with our approach; that would need controlled experiments where the platform is manipulated. Most attention patterns are local. But how local? Can we automatically identify the context in which attention patterns may hold?
  • 28. Attention patterns tend to be local and community-specific. Ignoring communities’ idiosyncrasies isn’t bliss. THANK YOU claudia.wagner@joanneum.at http://claudiawagner.info Image source: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
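
Slide 11 compares the seed-post classifiers by F1 score and Matthews correlation coefficient (MCC). The following is a minimal sketch of both measures computed from binary labels, not the authors' evaluation code. The function name `f1_and_mcc` and the example labels are invented for illustration, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
import math

def f1_and_mcc(y_true, y_pred):
    """Return (F1, MCC) for binary labels, where 1 = seed post, 0 = non-seed post."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention: 0 when undefined
    return f1, mcc

# Hypothetical imbalanced forum: 9 seed posts, 1 non-seed post,
# and a classifier that simply predicts "seed" for everything.
labels = [1] * 9 + [0]
always_seed = [1] * 10
f1, mcc = f1_and_mcc(labels, always_seed)
print(f"F1={f1:.3f}  MCC={mcc:.3f}")
```

In this made-up case the trivial classifier gets a high F1 while its MCC is 0, which may be one reason to report both measures when the classes differ a lot in size.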

Editor's notes

  1. We randomly selected 20 forums that did not have low activity levels. One can see that the set of forums which we selected is very diverse and includes communities around very specific topics such as Golf or Astronomy & Space, communities around geographical locations such as Ripp of Ireland, and communities around very general topics such as Work & Jobs.
  2. Since we were interested in exploring different factors, we had to develop feature groups which represent the factors that may impact users' reply behavior. We created 5 different groups of features which try to capture the groups of factors that may potentially impact users' communication behavior in certain community forums. For example, if user features are important in a forum for predicting which posts will get replies, then that means that in this forum it is more important who says something than what is said. That means discussion would be driven by social factors rather than topical factors. On the other hand, if content features are most important in a forum, then that means that posts need to show certain content characteristics in order to get replies. Focus features are in a sense also user features, but describe the topical and forum focus of a user. For some forums it might be necessary that a user has a strong topical focus (i.e. is likely to be an expert) in order to stimulate discussions, while in other forums novices might be more likely to get replies. Community features describe relations between a post or its author and the community, e.g. a post might only get replies if it fits the interests of the community, or a user might be more likely to get replies if he has contributed to the community a lot (inequity theory).
  4. We computed those features for every threadstarter post published in 2006 by using a 6-month window previous to when the post was published.
  5. MCC is a balanced measure of the quality of binary classification and can be used even if the classes are of very different sizes. The MCC measure returns a value between -1 and +1; 0 is no better than random prediction. The F1 score is frequently used by the IR community, while the MCC is used by ML people.
  6. For 11 forums our classifier did not outperform (but only matched) the performance of the baseline. We assume that this happens because most of these 11 forums are rather inactive. Another potential explanation is that the discussion behaviour of these communities is in part rather random and/or driven by other, external factors which we could not take into account in our study. For example, the discussion behaviour of communities around specific locations or regions might be impacted by spatial properties of users, while the discussion behaviour of the community around the Television forum seems to be mainly driven by external events (e.g. start of a new series). In most cases a combination of all features achieves the highest performance.
  7. Besides the overall classification performance, we were also interested in analyzing the impact of individual features.
  8. When analyzing the individual features we made a couple of interesting observations, such as
  9. nDCG would be 1 if we predict the real ranking position of a post. The measure penalizes elements that appear lower down although they should be higher up.
  10. Best results for the Spanish forum. Worst results for 544 (Banking & Insurance & Pensions).
  11. This indicates that it is important that a post’s content has certain characteristics (e.g. contains only few links) and fits the topical interests of the community in order to start a discussion. But afterwards it is important that the author of a post has a certain topical and/or forum focus in order to stimulate a lengthy discussion in this forum.
  12. This indicates that for starting lengthy discussions in this forum it is important that the author of a post has topical and/or forum focus.
  13. This indicates that in this forum posts which fit the topical interests of the community have the potential to start lengthy discussions.
  14. To summarize, our second experiment shows that
  15. So let me start concluding my talk. What we learned from our empirical study was that...
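
Slide 17 and note 9 describe nDCG as the DCG of the predicted ranking divided by the DCG of the actual ranking. The sketch below shows one common way to compute nDCG at rank k. It assumes the widely used log2 position discount and uses a post's reply count as its gain; the slides do not state the exact DCG formula, so both choices are assumptions, and the function names and sample numbers are invented for illustration.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (assumed log2 discount)."""
    return sum(gain / math.log2(pos + 1)
               for pos, gain in enumerate(gains[:k], start=1))

def ndcg_at_k(predicted_scores, reply_counts, k):
    """DCG of the predicted ranking divided by DCG of the actual (ideal) ranking."""
    # Order posts by predicted score, highest first, then read off their real reply counts.
    order = sorted(range(len(reply_counts)),
                   key=lambda i: predicted_scores[i], reverse=True)
    predicted_gains = [reply_counts[i] for i in order]
    ideal_gains = sorted(reply_counts, reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(predicted_gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical seed posts with their real reply counts and two sets of model scores.
replies = [10, 3, 0, 7]
good_scores = [0.9, 0.4, 0.1, 0.7]   # ranks posts in the same order as their reply counts
bad_scores = [0.1, 0.4, 0.9, 0.7]    # puts the post with no replies first

print(ndcg_at_k(good_scores, replies, k=4))
print(ndcg_at_k(bad_scores, replies, k=4))
```

A ranking that matches the real order scores 1, and a ranking that places heavily replied posts lower down scores below 1, in line with note 9.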