This document discusses analyzing user behavior in online communities to understand community evolution and health. It first examines which user, content and focus features generate attention to posts across four social web systems, then proposes modeling user behavior with an ontology and identifying community roles from behavior features. Roles such as elitist correlate with specific feature levels, such as a low in-degree ratio. Analyzing how role compositions change over time could provide insights into community health and enable prediction of community changes. The results differ across social web systems, suggesting that each system has its own microculture in how behavior and roles compose its communities.
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
1. Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Dr Matthew Rowe
Knowledge Media Institute, The Open University,
Milton Keynes, United Kingdom
http://people.kmi.open.ac.uk/rowe | http://www.matthew-rowe.com
2. Web 1.0
• Web of documents
• Web presence constrained to HTML ‘experts’
• Fixed categories
• Static content
http://www.flickr.com/photos/complexify/97303317/
Using Behaviour Analysis to Detect Cultural Aspects in 1
Social Web Systems
3. Web 2.0
• Data access through APIs
• Collective Intelligence
• User generated content
• Web presence for all
• Tagging
http://www.flickr.com/photos/9119028@N05/591163479
4. A Social Web
A Social Web System is an online platform that
offers a useful service, normally for free, to
users, through which they can interact and network
http://mmt.me.uk/slides/deri20110401/images/walledgardens.jpg
7. Δs of Social Web Systems
• Social Web Systems differ in their:
– Domain
• Flickr = photos
• Facebook = social networking
• Twitter = microblogging
– Audience
• SAP Community Network = programmers
• Slashdot = technology enthusiasts
• How else do they differ?
• What are the Δs?
8. The Utility of Behaviour Analysis
• WeGov
– Investigating the role of social networks in eGovernment
– Enabling:
• Tracking of political discussions and topics
• Injection of policy content to maximise exposure
• ROBUST
– Risk and opportunity management in online communities
– Enabling:
• Assessment of user churn in online communities
• Community evolution prediction
• Monitoring of community health
• Behaviour analysis is required to understand:
– What behaviour drives content creation
– How behaviour is associated with community evolution
9. Thesis: Microcultures
Social Web Systems contain micro-cultures
that differ in terms of
a) user behaviour
b) how attention is generated
c) role compositions in such systems
10. Outline
• Analysis 1: Generating Attention
– Understanding Attention Factors
– Approach
– Experiments
– Findings
• Analysis 2: Behaviour Role Compositions
– Analysing Community Evolution
– Approach
– Experiments
– Findings
• Microcultures: Evidence
12. Shared Content
• Social Web Systems are now used to:
– Ask questions
– Post opinions and ideas
– Discuss events and current issues
• Content analysis in online communities is attractive for:
– Market analysis
– Brand consensus and product opinion
• Social network analytics in the US is predicted to reach
$1 billion by 2014 (Forrester 2009)
• Masses of data are now being published in social web systems:
– Facebook has more than 60 million status updates per day (Facebook statistics 2010)
14. The Need for Analysis
• Analysts need to know which piece of content will generate
the most activity
– i.e. the most auspicious or influential
– Helps focus the attention of human and computerised
analysts
• What to track?
• Need to understand the effect features (community and
content) have on attention to content
Which features are key to stimulating activity?
How do these features influence activity length?
How do Social Web Systems differ in how attention is generated?
15. Approach: Attention Prediction
• Two-stage approach to predict attention to content:
1. Identify seed posts
• E.g. thread starters on a message board
• Will a given post start a discussion?
• What are the properties that seed posts exhibit?
– What parameters tend to trigger a discussion?
2. Predict discussion activity levels
• From the identified seed posts
• What is the level of activity that a seed post will
generate?
• What features correlate with heightened activity?
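The two stages above can be sketched as a filter followed by a ranking. This is a minimal illustration, not the authors' implementation: `is_seed` and `predict_activity` are hypothetical stand-ins for the trained seed-post classifier and the activity-level model, and the `Post` type is assumed for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Post:
    post_id: str
    author: str
    text: str


def two_stage_rank(
    posts: List[Post],
    is_seed: Callable[[Post], bool],
    predict_activity: Callable[[Post], float],
) -> List[Tuple[str, float]]:
    """Stage 1 keeps posts predicted to start a discussion; stage 2 ranks them
    by predicted activity level, highest first."""
    seeds = [p for p in posts if is_seed(p)]
    scored = [(p.post_id, predict_activity(p)) for p in seeds]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins (hypothetical): questions are "seeds", longer posts "attract more replies".
posts = [
    Post("a", "u1", "Anyone tried the new bus route?"),
    Post("b", "u2", "Thanks!"),
    Post("c", "u3", "Why is the 46A always late in the morning?"),
]
ranking = two_stage_rank(
    posts,
    is_seed=lambda p: p.text.endswith("?"),
    predict_activity=lambda p: float(len(p.text.split())),
)
print(ranking)  # [('c', 9.0), ('a', 6.0)]
```

Keeping the two models as separate callables mirrors the slide's split: the seed classifier can be evaluated on its own (Experiment 1) before its output feeds the activity ranking (Experiment 2).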
16. Features
Which features are key to stimulating activity?
• For each post, model: a) the author, b) the content and
c) the topical concentration of the author
• F1: User Features
– In-degree, out-degree: social network properties of the author
– Post count, age, post rate: participation information of the author
• F2: Content Features
– Post length, referral count, time in day: surface features of the
post
– Complexity: cumulative entropy of terms in the post
– Readability: Gunning Fog index of the post
– Informativeness: TF-IDF measure of terms within the post
– Polarity: average sentiment of terms in the post
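The F2 measures can be approximated with standard text statistics. The sketch below is an assumption-laden illustration, not the exact definitions used in the study: complexity is taken as the Shannon entropy of the post's term distribution, the Gunning Fog index uses a crude vowel-group syllable counter, informativeness sums tf × idf against a reference corpus, and polarity averages scores from a caller-supplied lexicon (a hypothetical stand-in for whichever sentiment lexicon was actually used).

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def complexity(text: str) -> float:
    """Shannon entropy (nats) of the post's term-frequency distribution."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


def syllables(word: str) -> int:
    # Crude heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def gunning_fog(text: str) -> float:
    """0.4 * (words per sentence + 100 * share of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))


def informativeness(text: str, corpus: List[str]) -> float:
    """Sum of tf * idf over the post's terms, with idf = log(N / df) over the corpus."""
    docs = [set(tokenize(d)) for d in corpus]
    score = 0.0
    for term, tf in Counter(tokenize(text)).items():
        df = sum(1 for d in docs if term in d)
        if df:  # terms unseen in the corpus are skipped
            score += tf * math.log(len(docs) / df)
    return score


def polarity(text: str, lexicon: Dict[str, float]) -> float:
    """Mean lexicon score of the post's terms; terms missing from the lexicon score 0."""
    words = tokenize(text)
    return sum(lexicon.get(w, 0.0) for w in words) / len(words) if words else 0.0
```

A post that repeats one word has complexity 0, while a post of all-distinct words reaches the maximum, log of its length; terms shared by every corpus document contribute nothing to informativeness.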
17. Features (2)
• F3: Focus Features
– Topic entropy: the concentration of the author across
community forums
• Higher entropy indicates a wider spread of forum activity
• More random distribution, less concentrated
– Topic Likelihood: the likelihood that a user posts in a
specific forum given their post history
• Measures the affinity that a user has with a given forum
• Lower likelihood indicates a user posting on an unfamiliar
topic
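Both focus features can be computed from the list of forums an author has posted in. This sketch assumes natural-log entropy and a maximum-likelihood estimate of P(forum | author) with no smoothing, so a forum the author has never posted in gets likelihood 0; the original study may have smoothed or normalised differently.

```python
import math
from collections import Counter
from typing import Dict, List


def forum_distribution(post_forums: List[str]) -> Dict[str, float]:
    """P(forum | author), estimated from the forums of the author's past posts."""
    counts = Counter(post_forums)
    total = sum(counts.values())
    return {forum: c / total for forum, c in counts.items()}


def topic_entropy(post_forums: List[str]) -> float:
    """Higher entropy = activity spread across more forums (less concentrated)."""
    return -sum(p * math.log(p) for p in forum_distribution(post_forums).values())


def topic_likelihood(post_forums: List[str], forum: str) -> float:
    """Affinity with `forum`; 0.0 means the author has never posted there."""
    return forum_distribution(post_forums).get(forum, 0.0)


history = ["Rugby", "Rugby", "Rugby", "Commuting and Transport"]
print(topic_likelihood(history, "Rugby"))  # 0.75
```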
18. Social Web Systems: Datasets
• Microblogging Platform: Twitter
– Collected a random subset over a 24-hour period
– Attention measure: length of @reply chain
• Community Message Board: Boards.ie
– Analysed all posts and forums in 2006
– Attention measure: number of posts in a thread
• Support Forum: SAP Community Network
– Attention measure: number of replies to a question
• News-sharing Platform: Digg
– Used previous dataset of ‘popular’ stories
– Attention measure: number of comments (and replies) to a story
19. Experiments
• Experiment 1: Identifying Seed Posts
– Will this post yield a reply?
– Experiment 1(a): Model Selection
• Which model performs best?
– Experiment 1(b): Feature Assessment
• How do features correlate with seed posts?
– Datasets: Twitter and Boards.ie
• Experiment 2: Activity Level Prediction
– What is the level of activity that seed posts yield?
– Experiment 2(a): Model Selection
– Experiment 2(b): Feature Assessment
• How do features correlate with heightened attention?
– Datasets: Twitter, Boards.ie, SCN and Digg
21. Results: 1(a) Model Selection
• Which model performs best?
(Figure panels: Twitter; Boards.ie)
22. Results: 1(b) Feature Assessment
• How do features correlate with seed posts?
23. Results: 1(b) Feature Assessment
(Figure panels: Twitter; Boards.ie)
25. Activity Distribution
(Figure panels: Twitter; Boards.ie; SCN; Digg)
1. Predict a ranking
2. Compare ranking against ground truth
3. Measure using Normalised Discounted Cumulative Gain (nDCG) at varying ranks (k)
• k={1,5,10,20,50,100}
4. Best model: highest nDCG averaged over k
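The evaluation steps above can be sketched as follows. It assumes the common DCG formulation with linear gains (each post's observed activity level, such as thread length) discounted by log2(rank + 1); the gain function used in the original experiments may differ, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    # Ranks are 1-based, so the item at list index i is discounted by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted_ids: List[str], true_activity: Dict[str, float], k: int) -> float:
    """nDCG@k of a predicted ranking, using each post's observed activity as its gain."""
    gains = [true_activity.get(pid, 0.0) for pid in predicted_ids]
    ideal = sorted(true_activity.values(), reverse=True)  # ground-truth ranking
    ideal_dcg = dcg(ideal, k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(
    predicted_ids: List[str],
    true_activity: Dict[str, float],
    ks: Sequence[int] = (1, 5, 10, 20, 50, 100),
) -> float:
    """Average nDCG over the cut-offs listed on the slide; higher is better."""
    return sum(ndcg_at_k(predicted_ids, true_activity, k) for k in ks) / len(ks)
```

Comparing `mean_ndcg` across candidate models and keeping the highest corresponds to step 4.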
26. Results: 2(a) Model Selection
• Which model performs best?
27. Results: 2(b) Feature Assessment
• How do features correlate with heightened attention?
28. Results: 2(b) Feature Assessment
• How do features correlate with heightened attention?
• Heightened Activity on Twitter =
• Shorter posts
• Denser vocabulary
• Fewer hyperlinks
• Earlier in the day!
29. Results: 2(b) Feature Assessment
• How do features correlate with heightened attention?
• Heightened Activity on Boards.ie =
• Concentrated topics
• Longer posts
• Wider vocabulary
• Fewer referrals
• Negative sentiment
30. Results: 2(b) Feature Assessment
• How do features correlate with heightened attention?
• Heightened Activity on SCN =
• Less author participation
• Author contacted fewer people
• Author contacted by many people
• Longer posts
• Wider vocabulary
• More hyperlinks
31. Results: 2(b) Feature Assessment
• How do features correlate with heightened attention?
• Heightened Activity on Digg =
• Concentrated topics
• Longer posts
• Later in the day
• Familiar community terms
32. Generating Attention: Findings
How do Social Web Systems differ in how attention is generated?
• Commonalities
– Fewer hyperlinks for Microblogging platforms and discussion message
boards
– Use language familiar to the community
– Negative content yields more activity
– Activity distribution
What drives attention in one system is not the same as another
• Idiosyncrasies
– More hyperlinks on support forums
– Lower topic affinity on news-sharing system
– Models differ: a) best performing, b) coefficients:
• Content: Twitter
• User: Boards.ie, SCN
• Focus: Digg
Anticipating Discussion Activity on Community Forums. M Rowe, S Angeletou and H Alani. The
Third IEEE International Conference on Social Computing. Boston, USA. (2011)
34. Online Communities in Social Web Systems
• Social Web Systems support online communities to
function and grow, enabling:
– Idea generation
– Customer support
– Problem solving
• Managing and hosting communities can be
– Expensive
– Time-consuming
• Social Web Systems represent large investments; therefore, they must:
– flourish and remain active
– remain… ‘healthy’
35. Increased Community Activity
What did the community look like at this point?
36. Decreased Community Activity
What were the conditions at this point?
37. The Need to Assess Behaviour
• How can we gauge community health?
– Post Count?
– Communication/Interaction?
– Behaviour?
• Domination of one behaviour could lead to churn
– Preece, 2000
• Behaviour in online communities is influenced by the roles that users assume
– Preece, 2001
• To provide health insights we need to monitor behaviour over
time
– Combined with basic health metrics (e.g. post count)
• Enabling detection of how behaviour differs between systems
38. Modelling, Representing and Tracking Behaviour: How?
• Users exhibit different behaviour in different contexts:
– How can we model user behaviour and represent its change over
time?
• According to [Chan et al., 2010], users can be classified by their community role:
– What behaviour correlates with community roles?
– How can we label users as the system changes?
• Communities evolve and change over time:
– Is there a correlation between community composition and
health?
– Can we predict community changes based on composition data?
How do Social Web Systems differ in terms of behaviour?
39. Behaviour Ontology
• How can we model user behaviour and represent its change over
time?
http://purl.org/net/oubo/0.3
40. Behaviour Features
• In-degree Ratio
– Proportion of users that reply to user u_i
• Posts Replied Ratio
– Proportion of posts by u_i that yield a reply
• Thread Initiation Ratio
– Proportion of threads started by u_i
• Bi-directional Threads Ratio
– Proportion of threads where u_i is involved in a reciprocal action
• Bi-directional Neighbours Ratio
– Proportion of u_i’s neighbours with whom a reciprocal action has taken place
• Average Posts per Thread
– Mean number of posts in the threads that u_i has participated in
• Standard Deviation of Posts per Thread
– Standard deviation of posts in the threads that u_i has posted in
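The seven features can be computed from reply structure alone. This is a minimal sketch under stated assumptions, not the published definitions: each thread is a list of posts whose first element is the thread starter, `reply_to` indexes the post being replied to, the in-degree ratio is normalised by the number of other users, thread-level ratios are normalised by the threads u_i participated in, self-replies are ignored, and the spread of posts per thread uses the population standard deviation.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Post:
    author: str
    reply_to: Optional[int] = None  # index (within the thread) of the post replied to


Thread = List[Post]  # thread[0] is the thread starter


def reply_edges(thread: Thread) -> Set[Tuple[str, str]]:
    """(replier, replied-to author) pairs within one thread, ignoring self-replies."""
    return {
        (p.author, thread[p.reply_to].author)
        for p in thread
        if p.reply_to is not None and p.author != thread[p.reply_to].author
    }


def ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def behaviour_features(user: str, threads: List[Thread]) -> Dict[str, float]:
    users = {p.author for t in threads for p in t}
    mine = [t for t in threads if any(p.author == user for p in t)]
    edges = set().union(*(reply_edges(t) for t in threads))
    repliers = {a for a, b in edges if b == user}  # users who replied to `user`
    replied = {b for a, b in edges if a == user}   # users `user` replied to

    own_posts = replied_posts = bidirectional_threads = 0
    for t in mine:
        # Indices of posts that received a reply from a different author.
        answered = {p.reply_to for p in t
                    if p.reply_to is not None and p.author != t[p.reply_to].author}
        own = [i for i, p in enumerate(t) if p.author == user]
        own_posts += len(own)
        replied_posts += sum(1 for i in own if i in answered)
        e = reply_edges(t)
        if any((user, v) in e and (v, user) in e for v in users):
            bidirectional_threads += 1

    sizes = [len(t) for t in mine]
    return {
        "in_degree_ratio": ratio(len(repliers), len(users) - 1),
        "posts_replied_ratio": ratio(replied_posts, own_posts),
        "thread_initiation_ratio": ratio(sum(t[0].author == user for t in mine), len(mine)),
        "bidirectional_threads_ratio": ratio(bidirectional_threads, len(mine)),
        "bidirectional_neighbours_ratio": ratio(len(repliers & replied), len(repliers | replied)),
        "avg_posts_per_thread": statistics.mean(sizes) if sizes else 0.0,
        "std_posts_per_thread": statistics.pstdev(sizes) if sizes else 0.0,
    }
```

For example, if ann starts a thread, bob replies and ann answers bob, then ann replies in a thread started by cat, ann has in-degree ratio 0.5 (one of two other users replied to her) and one reciprocal thread out of two.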
41. Behaviour Roles
Elitist
Grunt
Joining Conversationalist
Popular Initiator
Popular Participant
Supporter
Taciturn
Ignored
Jeffrey Chan, Conor Hayes, and Elizabeth Daly. Decomposing discussion forums using common user roles. In Proc. Web Science Conf. (WebSci10), Raleigh, NC: US, 2010.
42. Behaviour Roles (2)
• What behaviour correlates with community roles?
Table 1. Roles and the feature-to-level mappings
Role | Feature | Level
Elitist | In-Degree Ratio | low
Elitist | Bi-directional Threads Ratio | high
Elitist | Bi-directional Neighbours Ratio | low
Grunt | Bi-directional Threads Ratio | med
Grunt | Bi-directional Neighbours Ratio | med
Grunt | Average Posts per Thread | low
Grunt | STD of Posts per Thread | low
Joining Conversationalist | Thread Initiation Ratio | low
Joining Conversationalist | Average Posts per Thread | high
Joining Conversationalist | STD of Posts per Thread | high
Popular Initiator | In-Degree Ratio | high
Popular Initiator | Thread Initiation Ratio | high
Popular Participant | In-Degree Ratio | high
Popular Participant | Thread Initiation Ratio | low
Popular Participant | Average Posts per Thread | med
Popular Participant | STD of Posts per Thread | med
Supporter | In-Degree Ratio | med
Supporter | Bi-directional Threads Ratio | med
Supporter | Bi-directional Neighbours Ratio | med
Taciturn | Bi-directional Threads Ratio | low
Taciturn | Bi-directional Neighbours Ratio | low
Taciturn | Average Posts per Thread | low
Taciturn | STD of Posts per Thread | low
Ignored | Posts Replied Ratio | low
43. Constructing and Applying Rules
• How can we label users as the system changes?
• Features: structural, social network, reciprocity, persistence, participation
• Feature levels change with the dynamics of the community
• Based on related work, we associate roles with a collection of feature-to-level mappings
– e.g. in-degree -> high, out-degree -> high
• Run rules over each user’s features and derive the community role composition
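Applying the Table 1 rules can be sketched as follows. The tertile-based thresholds are an assumption standing in for the dynamic feature-level derivation (levels are recomputed from each time window's feature distribution, so they shift with the community); taking the first matching rule in table order and labelling unmatched users `Unclassified` are also assumptions, not the published procedure. Feature keys are illustrative names for the Table 1 features.

```python
import statistics
from collections import Counter
from typing import Dict, Tuple

# Feature-to-level rules transcribed from Table 1; dict order is table order.
RULES: Dict[str, Dict[str, str]] = {
    "Elitist": {"in_degree_ratio": "low", "bidirectional_threads_ratio": "high",
                "bidirectional_neighbours_ratio": "low"},
    "Grunt": {"bidirectional_threads_ratio": "med", "bidirectional_neighbours_ratio": "med",
              "avg_posts_per_thread": "low", "std_posts_per_thread": "low"},
    "Joining Conversationalist": {"thread_initiation_ratio": "low",
                                  "avg_posts_per_thread": "high", "std_posts_per_thread": "high"},
    "Popular Initiator": {"in_degree_ratio": "high", "thread_initiation_ratio": "high"},
    "Popular Participant": {"in_degree_ratio": "high", "thread_initiation_ratio": "low",
                            "avg_posts_per_thread": "med", "std_posts_per_thread": "med"},
    "Supporter": {"in_degree_ratio": "med", "bidirectional_threads_ratio": "med",
                  "bidirectional_neighbours_ratio": "med"},
    "Taciturn": {"bidirectional_threads_ratio": "low", "bidirectional_neighbours_ratio": "low",
                 "avg_posts_per_thread": "low", "std_posts_per_thread": "low"},
    "Ignored": {"posts_replied_ratio": "low"},
}


def to_level(value: float, cuts: Tuple[float, float]) -> str:
    low_cut, high_cut = cuts
    if value <= low_cut:
        return "low"
    return "high" if value > high_cut else "med"


def match_role(levels: Dict[str, str]) -> str:
    """First role (in table order) whose every feature-level condition holds."""
    for role, rule in RULES.items():
        if all(levels.get(f) == lvl for f, lvl in rule.items()):
            return role
    return "Unclassified"


def assign_roles(features: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Recompute tertile cut points from this window's users, then apply the rules.
    Assumes at least two users and the same feature keys for every user."""
    names = next(iter(features.values())).keys()
    cuts = {}
    for f in names:
        low_cut, high_cut = statistics.quantiles([u[f] for u in features.values()], n=3)
        cuts[f] = (low_cut, high_cut)
    return {
        user: match_role({f: to_level(v, cuts[f]) for f, v in feats.items()})
        for user, feats in features.items()
    }


def composition(roles: Dict[str, str]) -> Dict[str, float]:
    """Percentage of users assigned to each role, e.g. {'Elitist': 20.0, ...}."""
    counts = Counter(roles.values())
    return {role: 100 * c / len(roles) for role, c in counts.items()}
```

Because the cut points come from the current window's distribution, the same raw feature value can map to different levels as the community changes.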
44. Composition vs Activity
• Is there a correlation between community composition and
health?
• Community Message board: Boards.ie
– All posts used from 2004 – 2006
– Selected 3 forums for analysis
• F246: Commuting and Transport
• F388: Rugby
• F411: Mobile Phones and PDAs
• Support Forum: Tiddlywiki
– Software development forum used by BT’s development team
• Measured at 12-week increments:
– Forum composition (% of roles)
• E.g. 20% elitists, 10% grunts, etc
– Number of posts
45. Correlation Results (1): Boards.ie
Forum 246 – Commuting and Transport
46. Correlation Results (2): Boards.ie
(Figure panels: Forum 246 – Commuting and Transport; Forum 388 – Rugby; Forum 411 – Mobile Phones and PDAs)
47. Correlation Results: Tiddlywiki
48. Evolution Results (1): Boards.ie
Forum 246 – Commuting and Transport
49. Evolution Results (2): Boards.ie
(Figure panels: Forum 246 – Commuting and Transport; Forum 388 – Rugby; Forum 411 – Mobile Phones and PDAs)
50. Evolution Results: Tiddlywiki
51. Predicting Community Health
• Can we predict community changes based on
composition data?
1. Activity Change Detection:
– Predict either an increase or decrease in activity
– Features: roles and percentages
– Class label: increase/decrease
– Performed 10-fold cross-validation with a J48 decision tree
2. Post Count Prediction:
– Predict post count from role composition
– Independent variables: roles and percentages
– Dependent variable: post count
– Induced a linear regression model and assessed its performance
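The activity-change setup can be sketched as instance construction plus a 10-fold split. Assumptions: each instance is the role composition of window t labelled by whether the post count rises in window t+1 (an unchanged count is treated as a decrease), and folds are assigned round-robin without shuffling or stratification. Training the J48 tree itself (Weka's C4.5 implementation) and the regression model are left out; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def change_instances(
    compositions: List[Dict[str, float]], post_counts: List[int], roles: List[str]
) -> Tuple[List[List[float]], List[str]]:
    """Features: role percentages in window t. Label: activity change into window t+1.
    An unchanged post count is (arbitrarily) labelled 'decrease'."""
    features, labels = [], []
    for t in range(len(post_counts) - 1):
        features.append([compositions[t].get(r, 0.0) for r in roles])
        labels.append("increase" if post_counts[t + 1] > post_counts[t] else "decrease")
    return features, labels


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Round-robin k-fold split: every index lands in exactly one test fold."""
    splits = []
    for i in range(k):
        test = list(range(i, n, k))
        if not test:  # fewer instances than folds
            continue
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        splits.append((train, test))
    return splits
```

Each `(train, test)` pair would be used to fit the classifier on the training instances and score it on the held-out ones; the same feature rows, paired with raw post counts instead of labels, serve as the regression inputs.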
52. Activity Change Detection
(Figure panels: Boards.ie; Tiddlywiki)
53. Post Count Prediction
(Figure panels: Boards.ie; Tiddlywiki)
54. Post Count Prediction
(Figure panels: Boards.ie; Tiddlywiki)
• Increased Community Activity on Boards.ie =
• More initiators
• More participants
• Fewer supporters
• Fewer ignored
55. Post Count Prediction
(Figure panels: Boards.ie; Tiddlywiki)
• Increased Community Activity on Tiddlywiki =
• More conversationalists
• More initiators
• Fewer supporters
• Fewer ignored
56. Clustering Communities by Composition
57. Behaviour Role Compositions: Findings
• How do Social Web Systems differ in terms of
behaviour?
• Commonalities
– No grunts in either system
– Increase in ignored users and supporters decreases health
– Increase in initiators increases activity
• Idiosyncrasies
– No elitists found on support-forum
– Conversationalists improve activity in certain cases
– Optimum behaviour compositions differ
Modelling and Analysis of User Behaviour in Online Communities. S Angeletou, M Rowe and H
Alani. International Semantic Web Conference. Bonn, Germany. (2011)
58. Thesis: Microcultures Recap
Social Web Systems contain micro-cultures
that differ in terms of
a) user behaviour
b) how attention is generated
c) role compositions in such systems
59. Microcultures: Evidence
• Social Web Systems contain micro-cultures that differ
in terms of
– a) User behaviour
• Non-existence of roles in certain communities
• Conversation behaviour important in certain communities
– b) How attention is generated
• Differences in optimum prediction models
• Factors differ in driving activity
– E.g. referrals, topic affinity
– c) Role compositions in such systems
• Intra- and inter-composition differences
60. Questions?
Web: http://people.kmi.open.ac.uk/rowe
http://www.matthew-rowe.com
Email: m.c.rowe@open.ac.uk
Twitter: @mattroweshow
Speaker Notes
Myriad social web systems exist; at the heart of such systems is the user, who drives action and content. These systems differ.
Solitary feature sets: content features produce the best predictive performance on both systems! All features: produce the best performance overall.
Twitter: time in day: no-reply zone; higher out-degree = more likely to get a reply. Boards.ie: higher referral counts correlated with non-seeds. Both: lower informativeness correlated with seeds (use language that is familiar to the platform’s users); readability lower for seeds, although harder to see.
Twitter: content quality once again important. Boards.ie: user features more important this time, unlike content before. SCN: similar to Boards.ie, user features most important. Digg: focus information of the user most important in this case. Best models: Twitter = content; Boards.ie = content and focus; SCN = user and content; Digg = content and focus (like Boards.ie).
In-degree ratio = concentration; posts replied ratio = popularity; thread initiation ratio = propensity to initiate discussions; bi-directional threads ratio = reciprocity and interaction; bi-directional neighbours ratio = reciprocity; average posts per thread = level of discussion; SD of posts per thread = variance of discussions.
Maintain a mapping between each feature and a level (low, mid, high); this enables dynamic derivation of the feature levels.
Increase in Elitists and Participants is associated with increased activity (users who communicate often with other users). Increase in Taciturns and Ignored is associated with decreased activity (Taciturns contribute little).
Common patterns across all three forums analysed. Certain roles are more important than others in differing communities: Conversationalists are important in Commuting and Transport and in Rugby, but not in Mobile Phones and PDAs, where conversation is not a driving factor. Supporters were found to negatively impact activity in forum 411, again because conversation is not a common action in that community: users are more interested in support.
Increase in Joining Conversationalists and Popular Participants correlates with increased activity; decrease in Supporters and Ignored users correlates with increased activity. No Elitists or Grunts! Correlations are lower: behaviour roles that fit well in one system are not the same as in another. Different behaviour.
Activity increases as the composition reaches a relatively stable setting, i.e. little variation and fluctuation in the roles.
Composition stability is associated with increased activity in forums 246 and 411; fluctuation in activity in the Rugby forum correlated with variation in roles.
No Elitists or Grunts!
Best results for 246: steady increase in activity over time. Worst results for 388: fluctuation in composition and activity makes it hard to perform predictions. Cross-community patterns are not reliable: idiosyncratic behaviour in each community.
895 (Celebrity & Showbiz) and 452 (For Sale: Computer Hardware). Outliers: 7 (After Hours), 47 (Motors), 151 (Soccer), 908 (Beer Guts and Receding Hair).